CN113836238A - Batch processing method and device for data commands - Google Patents


Info

Publication number: CN113836238A
Application number: CN202111164186.0A
Authority: CN (China)
Prior art keywords: data, command, cluster, commands, batch processing
Legal status: Pending (assumed by Google; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 何华峰 (He Huafeng), 刘宇霆 (Liu Yuting), 张鹏 (Zhang Peng)
Current and original assignee: Hangzhou Dt Dream Technology Co Ltd (listed assignee may be inaccurate)
Application filed by Hangzhou Dt Dream Technology Co Ltd
Priority to CN202111164186.0A
Publication of CN113836238A
Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor, of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 16/25: Integrating or interfacing systems involving database management systems
    • G06F 16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a batch processing method and apparatus for data commands, an electronic device, and a storage medium. The method includes: reading a set of data commands for a target database cluster from a command queue; determining, in the data command set, data command groups that respectively correspond to the cluster nodes in the database cluster; when any data command group contains a plurality of data commands, merging all the data commands in that group into a corresponding batch processing request based on a pipeline technique; and submitting the batch processing request to the target cluster node corresponding to that group, so that the target cluster node processes the data commands contained in the request in a batch.

Description

Batch processing method and device for data commands
Technical Field
The application relates to the field of databases, and in particular to a batch processing method and apparatus for data commands.
Background
A database is a core component of computer software, and a database cluster, which contains a plurality of cluster nodes, is one form of organizing a database. In the related art, when a management tool submits pending data commands to a database cluster, the cluster often supports only submitting and processing one data command at a time, which makes it difficult to meet efficiency requirements.
Disclosure of Invention
In view of the above, the present application provides a method and an apparatus that allow data commands to be processed in batches.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the present application, a batch processing method of data commands is provided, the method comprising:
reading a set of data commands for a target database cluster from a command queue;
determining, in the data command set, data command groups that respectively correspond to the cluster nodes in the database cluster;
when any data command group contains a plurality of data commands, merging all the data commands in that data command group into a corresponding batch processing request based on a pipeline technique;
and submitting the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node processes the data commands contained in the batch processing request in a batch.
According to a second aspect of the present application, there is provided an apparatus for batch processing of data commands, the apparatus comprising:
the reading unit is used for reading a data command set aiming at the target database cluster from the command queue;
the determining unit is used for determining data command groups respectively corresponding to all cluster nodes in the database cluster in the data command set;
the merging unit is used for merging, when any data command group contains a plurality of data commands, all the data commands in that data command group into a corresponding batch processing request based on a pipeline technique;
and the submitting unit is used for submitting the batch processing request to a target cluster node corresponding to any data command group so as to enable the target cluster node to perform batch processing on the data commands contained in the batch processing request.
According to a third aspect of the present application, there is provided an electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method as described in the embodiments of the first aspect above by executing the executable instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method as described in the embodiments of the first aspect above.
According to the above technical solution, data commands are stored in a command queue, the data command group corresponding to a given node in the database cluster is obtained from that queue, and the data commands in the same group are then merged into one batch processing request using a pipeline technique and submitted to the corresponding cluster node in a single batch. The data commands are thus distributed, merged into batch processing requests, and submitted to the corresponding cluster nodes in one round trip each. Compared with submitting each data command as soon as it is received, this reduces the number of data interactions, lowers the consumption of transmission resources, and improves the efficiency of processing data commands.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow chart illustrating a method for batch processing of data commands according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of a network architecture to which the data command batch processing method of an embodiment of the present application is applied;
FIG. 3 is a detailed flow diagram illustrating a method for batch processing of data commands according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device shown in accordance with an exemplary embodiment of the present application;
FIG. 5 is a block diagram illustrating a data command batch processing apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described below do not represent all embodiments consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
Next, examples of the present application will be described in detail.
FIG. 1 is a flow chart illustrating a method for batch processing of data commands according to an exemplary embodiment of the present application. As shown in fig. 1, the above method may include the steps of:
step 102: a set of data commands for the target database cluster is read from the command queue.
In an embodiment, the method is applied by a tool that can operate on a database, such as an ETL (Extract-Transform-Load) tool; other tools capable of operating a database may also be used, which is not limited by this application.
In one embodiment, the target database cluster includes a plurality of database nodes that cooperate to implement the basic functions of the database cluster. Externally, each database node may appear as a separate physical server, or as a distinct logical unit partitioned from the same server. Data commands may be stored in a command queue; a data command instructs a specific operation to be performed on the target database cluster. For example, a data command may be a data write command for writing data to the target database cluster, in which case the data to be written is carried in the command; or it may be a read command for reading specific data from the target database cluster. The application does not limit the types of data commands, and the data command set may contain a single type of data command or several types.
In one embodiment, the data commands in the command queue are arranged in chronological order, and the relative order of the data commands contained in a batch processing request is the same as their order in the command queue. The command queue maintains a first-in, first-out data structure: a data command stored earlier is placed before a data command stored later, and when data commands are read from the queue, the earliest stored command is read first, followed in turn by the commands behind it. The chronological order may be defined in different ways. For example, the data commands may be ordered by the time at which the queue received them. Alternatively, they may be ordered by their generation time or sending time; in this case the generation or sending time is recorded inside each data command, and when the ETL tool stores a data command into the command queue, it parses the command to obtain that time and inserts the commands into the queue in that order.
In one embodiment, when the number of data commands in the command queue reaches a preset threshold, reading a set of data commands for the target database cluster from the command queue. In order to prevent backlog of data due to an excessive number of data commands in the command queue, a preset threshold may be set according to the actual processing capacity.
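The queueing behavior described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the class and method names are made up for the example:

```python
from collections import deque

class CommandQueue:
    """First-in, first-out queue of data commands (illustrative sketch)."""

    def __init__(self, threshold=100):
        self._queue = deque()       # commands kept in arrival order
        self.threshold = threshold  # preset threshold that triggers a batch read

    def put(self, command):
        """Store a command; earlier commands stay ahead of later ones."""
        self._queue.append(command)

    def read_batch(self):
        """Return all queued commands once the threshold is reached, else None."""
        if len(self._queue) < self.threshold:
            return None
        batch = list(self._queue)   # FIFO order is preserved in the batch
        self._queue.clear()
        return batch
```

Draining the whole queue only once it holds `threshold` commands is one simple way to avoid both per-command submission and unbounded backlog.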
Step 104: and determining data command groups in the data command set, wherein the data command groups respectively correspond to all cluster nodes in the database cluster.
In an embodiment, since the data command is processed by the database cluster, in order to enable the nodes in the database cluster to share the data processing pressure, the pressure for processing the data command may be distributed to the nodes in the database cluster, so that the data command is divided into a plurality of data command groups, and each data command group has a cluster node corresponding to itself for processing the data command in the data command group.
Specifically, the data commands may be distributed by matching hash slots. Each cluster node in the database cluster manages its own range of hash slots, and the ETL tool may calculate the hash slot corresponding to any data command from the command's key value and assign the command to the cluster node whose slot range contains it. For example, assume 30000 hash slots are defined across the entire database cluster, which contains three cluster nodes: node A manages slots 0-10000, node B manages slots 10001-20000, and node C manages slots 20001-30000. The ETL tool can apply the CRC16 algorithm to the key value contained in any data command; if the calculated result is, say, 3000, it falls within the slot range managed by node A, so that data command should be assigned to the data command group corresponding to node A, to be processed by node A.
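The slot-matching step can be sketched as below. The hash computation is injected as a `slot_of` function (standing in for CRC16 modulo the slot count), and all names are illustrative rather than taken from the patent:

```python
def group_by_node(commands, node_ranges, slot_of):
    """Distribute data commands into per-node groups by hash slot.

    commands    : list of dicts, each carrying a "key"
    node_ranges : node name -> (low, high) inclusive hash-slot range
    slot_of     : maps a key to its hash slot (e.g. CRC16 mod slot count)
    Queue order is preserved inside every group.
    """
    groups = {node: [] for node in node_ranges}
    for cmd in commands:
        slot = slot_of(cmd["key"])
        for node, (low, high) in node_ranges.items():
            if low <= slot <= high:
                groups[node].append(cmd)
                break
    return groups
```

With the patent's example of 30000 slots and node A owning slots 0-10000, a command whose key hashes to slot 3000 lands in node A's group.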
In one case, earlier data commands are placed before later ones in the command queue, so the commands remain in chronological order. When data commands have dependency relationships, disturbing their order could leave the database cluster unable to process them, so the original order must be preserved when the commands are distributed into the data command groups. For example, suppose a data write command is generated first and a data read command is generated later, so the write command precedes the read command in the queue; if both commands are assigned to the same data command group, the write command must still precede the read command within that group.
Step 106: when any data command group contains a plurality of data commands, all the data commands in the any data command group are merged into a corresponding batch processing request based on the pipeline technology.
Step 108: and submitting the batch processing request to a target cluster node corresponding to any data command group so that the target cluster node performs batch processing on the data commands contained in the batch processing request.
In an embodiment, when a data command group contains multiple data commands, submitting each command to the corresponding cluster node individually would increase the number of interactions. All the data commands in the group can therefore be merged into one batch processing request based on a pipeline technique, so that the request contains every command in the group. With pipelining, the corresponding cluster node obtains all the data commands of the group from a single submitted batch processing request, and after it has processed all of them, it can merge the processing results into a single returned result rather than returning them over several rounds.
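A minimal sketch of the merge-and-batch-execute idea follows. An in-memory dict stands in for a cluster node's storage, and all names are illustrative assumptions, not the patent's API:

```python
def merge_into_batch(node, command_group):
    """Merge one group's commands into a single batch request (pipeline-style):
    one message, commands kept in queue order, one combined reply expected."""
    return {"node": node, "commands": list(command_group)}

def execute_batch(store, batch):
    """Node-side sketch: process every command in order and return ONE merged
    result list instead of replying once per command."""
    results = []
    for cmd in batch["commands"]:
        if cmd["op"] == "set":
            store[cmd["key"]] = cmd["value"]
            results.append("OK")
        elif cmd["op"] == "get":
            results.append(store.get(cmd["key"]))
    return results
```

The single merged result list mirrors the point above: one submission, one combined return, however many commands the group holds.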
In an embodiment, when the batch processing request is submitted to the target cluster node corresponding to any data command group, the target connection corresponding to the target cluster node may be acquired from a connection pool of the database cluster, and the batch processing request is submitted to the target cluster node by using the target connection. Connections in the connection pool are established when the database cluster is initialized, and the cluster nodes all have respective corresponding connections. In order to improve the submission efficiency, when submitting the batch processing request, the initialized connection in the connection pool can be directly borrowed, and the connection is utilized to submit the corresponding batch processing request.
In one embodiment, when any data command fails to execute, execution of the remaining unexecuted commands is stopped; the cluster nodes contained in the database cluster, and the hash slot range managed by each of them, are re-determined; the unexecuted data commands are then formed, according to the newly determined hash slot ranges, into new data command groups corresponding to the newly determined cluster nodes, and all the data commands in each new group are again merged into a corresponding batch processing request based on the pipeline technique. A data command may fail, for example, because a cluster node's storage space cannot accommodate the data to be written, or because a node is disconnected from the network. To keep the database cluster running normally in such cases, the number of available cluster nodes may decrease or increase, and since the total number of hash slots is fixed, a change in the number of nodes changes the allocation of hash slots. To adapt to such changes and keep the data processing flow uninterrupted, when any data command fails the cluster nodes stop executing the other commands; after the database cluster has re-allocated the hash slots, the cluster nodes and the slot ranges they manage are determined again, the unexecuted commands are redistributed into the data command groups of the current nodes according to the new allocation, and the newly generated batch processing requests are submitted using the pipeline technique.
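The recovery step can be sketched as follows. Here `refresh_ranges` stands in for querying the cluster's re-allocated slot assignment; everything else is likewise an illustrative assumption:

```python
def regroup_after_failure(batch_commands, failed_index, refresh_ranges, slot_of):
    """On the first failed command: stop, re-read the node -> (low, high)
    slot assignment, and regroup the unexecuted commands under it.

    batch_commands : commands of the failed batch, in original order
    failed_index   : index of the command that failed
    refresh_ranges : () -> {node: (low, high)} after slot re-allocation
    slot_of        : maps a key to its hash slot
    """
    pending = batch_commands[failed_index:]   # failed + not-yet-executed commands
    new_ranges = refresh_ranges()             # re-determined slot ranges
    groups = {node: [] for node in new_ranges}
    for cmd in pending:
        slot = slot_of(cmd["key"])
        for node, (low, high) in new_ranges.items():
            if low <= slot <= high:
                groups[node].append(cmd)      # original order preserved
                break
    return groups
```

Each returned group can then be merged into a fresh batch request and resubmitted, so a resharding event costs one regroup rather than an interrupted flow.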
In one embodiment, the above method may also be applied if it is desired to achieve batch synchronization of data between two databases. For example, when the data command is a data write command, the ETL tool may extract data from the source database in batch, and include the data to be written in the data write command, that is, each data write command includes one piece of complete data to be written, and then the cluster node corresponding to the data command may be calculated according to the key value of the data to be written. By using the method, the database cluster is a target database, and data in the source database can be written into the target database in batches, so that the efficiency of data synchronization is improved.
According to the above technical solution, data commands are stored in a command queue, the data command group corresponding to a given node in the database cluster is obtained from that queue, and the data commands in the same group are merged into one batch processing request using a pipeline technique and submitted to the corresponding cluster node in a single batch. The data commands are thus distributed, merged into batch processing requests, and submitted to the corresponding cluster nodes in one round trip each. Compared with submitting each data command as soon as it is received, this reduces the number of data interactions, lowers the consumption of transmission resources, and improves the efficiency of processing data commands. In addition, the application provides a fault handling mechanism for failures that occur while data commands are being processed: unprocessed data commands can be redistributed according to the updated set of nodes in the database cluster, which improves the fluency of command processing.
Fig. 2 is a schematic diagram of a network architecture to which the data command batch processing method of an embodiment of the present application is applied. As shown in fig. 2, ETL tool 22 may extract data in batches from source database 21 and write the data into database cluster 23, where database cluster 23 includes a plurality of cluster nodes (cluster node 1, cluster node 2, and so on).
Fig. 3 is a specific flowchart of a batch processing method for data commands according to an embodiment of the present application, and the following describes in detail the steps of the method in fig. 3 with reference to fig. 2:
step 302, extracting data to be written from a source database.
The ETL tool 22 may extract a batch of data to be written from the source database and write it into the database cluster 23. The type of source database is not limited by this application; for example, it may be a distributed database such as a database cluster, or a relational or non-relational database. It should be noted that this embodiment takes a redis cluster as the example of the database cluster.
At step 304, a data write command is generated and stored to the command queue.
The ETL tool 22 may generate data write commands from the data to be written, where each data write command contains one piece of the original data to be written, generally in key-value (Key-Value) form. The generated data write commands can be stored in a command queue, in which they are arranged in chronological order. The command queue maintains a first-in, first-out data structure: a data write command stored earlier is placed before one stored later, and when commands are subsequently read from the queue, the earliest stored command is read first, followed in turn by the commands behind it. The chronological order may be defined in different ways. For example, the data write commands may be ordered by the time at which the queue received them. Alternatively, they may be ordered by their generation time or sending time; in this case the generation or sending time is recorded inside each command, and when the ETL tool stores a data write command into the command queue, it parses the command to obtain that time and inserts the commands into the queue in that order.
In step 306, the set of data write commands is read.
And 308, determining a data writing command group corresponding to each cluster node.
In the above two steps, the ETL tool 22 may read the data write command set from the command queue and allocate the data write commands according to the hash slot ranges managed by the respective redis nodes in the redis cluster 23, with each redis node corresponding to one data write command group, namely the set of data write commands that the node needs to process. Specifically, the redis cluster 23 contains a plurality of redis nodes. Assume 16384 hash slots (slots) are defined across the entire redis cluster 23, each redis node managing a part of them, and assume the cluster contains 3 redis nodes: node A manages slots 0-5500, node B manages slots 5501-11000, and node C manages slots 11001-16383. The ETL tool can apply the CRC16 algorithm to the key (Key) of the data to be written in any data write command; if the calculated result is, say, 3000, it falls within the slot range managed by node A, so that data write command should be allocated to the data write command group corresponding to node A, to be processed by node A.
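For reference, redis cluster computes a key's slot as CRC16(key) mod 16384, where CRC16 is the XMODEM variant (polynomial 0x1021, initial value 0). A sketch, ignoring redis hash tags (the `{...}` key syntax):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0), the variant used by redis cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF  # keep the register 16 bits wide
    return crc

def key_slot(key: str, total_slots: int = 16384) -> int:
    """Hash slot of a key, as computed by CLUSTER KEYSLOT (hash tags not handled)."""
    return crc16_xmodem(key.encode()) % total_slots
```

The standard CRC16/XMODEM check value is `crc16_xmodem(b"123456789") == 0x31C3`, and the redis cluster tutorial's example key `"foo"` maps to slot 12182.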
At step 310, a batch processing request is generated.
In this step, taking node A as an example, ETL tool 22 may merge the data write commands assigned to this node and generate the batch processing request corresponding to node A. Specifically, the ETL tool 22 merges the data write commands so that the message of the resulting batch processing request contains all the data write commands the node needs to process.
Step 312, obtain connections from the connection pool.
In the redis cluster 23, interaction with a redis node is performed over a connection established between a Jedis object and that node. In general, each redis node corresponds to one Jedis object. To avoid repeatedly establishing connections during interaction, the redis cluster 23 may provide a connection pool that stores the connections between Jedis objects and redis nodes; these connections are initialized when the redis cluster 23 starts. In this step, the ETL tool 22 needs to submit the batch processing request to the corresponding redis node, so it can determine which connection to borrow from the pool according to the node the request corresponds to, and then obtain that connection from the pool for use in the submission of step 314. It is worth noting that the borrowed connection may be released after the commit completes and returned to the connection pool.
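The borrow-and-release flow can be illustrated with a toy pool. Real deployments would use the Jedis connection pool; the class below, including its string stand-ins for connections, is purely illustrative:

```python
class NodeConnectionPool:
    """One pre-initialized connection per cluster node, built at startup."""

    def __init__(self, nodes):
        # stand-in connection objects, created once at "cluster init"
        self._idle = {node: "conn-to-" + node for node in nodes}
        self._borrowed = {}

    def borrow(self, node):
        """Borrow the initialized connection instead of opening a new one."""
        conn = self._idle.pop(node)
        self._borrowed[node] = conn
        return conn

    def release(self, node):
        """Return the connection to the pool once the commit has completed."""
        self._idle[node] = self._borrowed.pop(node)
```

Borrowing the pre-built connection and restocking it after the commit is what lets every batch submission skip the connection-setup round trips.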
Step 314, submit the batch processing request to the database cluster.
In this step, the ETL tool 22 may submit the batch processing request to a corresponding redis node in the redis cluster 23 by using the borrowed connection, and the redis node responds to the data write command included in the batch processing request, writes the data to be written included in the batch processing request, and completes the batch write process of the data to be written.
Corresponding to the method embodiments, the present specification also provides an embodiment of an apparatus.
Fig. 4 is a schematic structural diagram of an electronic device hosting a data command batch processing apparatus according to an exemplary embodiment of the present application. Referring to fig. 4, at the hardware level, the electronic device includes a processor 402, an internal bus 404, a network interface 406, a memory 408, and a non-volatile memory 410, and may also include hardware required for other services. The processor 402 reads the corresponding computer program from the non-volatile memory 410 into the memory 408 and runs it, forming, at the logical level, an apparatus for batch processing of data commands. Of course, besides the software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or logic devices.
FIG. 5 is a block diagram illustrating a data command batch processing apparatus according to an exemplary embodiment of the present application. Referring to fig. 5, the apparatus includes a reading unit 502, a determining unit 504, a merging unit 506, and a committing unit 508, where:
a reading unit 502, configured to read a data command set for a target database cluster from a command queue;
a determining unit 504, configured to determine data command groups in the data command set, where the data command groups respectively correspond to cluster nodes in the database cluster;
a merging unit 506, configured to merge all data commands in any data command group into corresponding batch processing requests based on a pipeline technique when any data command group includes multiple data commands;
a submitting unit 508, configured to submit the batch processing request to a target cluster node corresponding to any data command group, so that the target cluster node performs batch processing on the data commands included in the batch processing request.
Optionally, the reading, from the command queue, a set of data commands for a target database cluster includes:
and when the number of the data commands in the command queue reaches a preset threshold value, reading a data command set aiming at the target database cluster from the command queue.
Optionally, the data commands in the command queue are sequentially arranged according to a time sequence, and a relative sequence between the data commands included in the batch processing request is the same as that of the data commands in the command queue.
Optionally, determining the data command groups in the data command set that respectively correspond to the cluster nodes in the database cluster includes:
calculating, for each data command in the data command set, the hash slot corresponding to that command from its key value;
determining the hash slot range managed by each cluster node contained in the database cluster; and
determining, from the data command set, the data command group corresponding to each cluster node, where the hash slots of the data commands assigned to any cluster node's group fall within the hash slot range managed by that node.
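For a Redis-style cluster, the key-to-slot calculation is CRC16(key) mod 16384, with hash tags honored. The sketch below assumes that documented scheme (the XMODEM CRC16 variant); `group_commands` and the node-range map are illustrative names:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant, polynomial 0x1021), the hash Redis
    Cluster uses for key-to-slot mapping."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Redis honors {hash tags}: if the key contains a non-empty {...},
    # only the substring inside the first pair is hashed, so related
    # keys can be forced onto the same slot (and hence the same node).
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

def group_commands(keys, node_ranges):
    """Assign each command (identified here by its key) to the cluster
    node whose managed slot range contains the command's hash slot."""
    groups = {node: [] for node in node_ranges}
    for key in keys:
        slot = hash_slot(key)
        for node, (lo, hi) in node_ranges.items():
            if lo <= slot <= hi:
                groups[node].append(key)
                break
    return groups
```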
Optionally, the apparatus further includes a fault handling unit 510, configured to:
when any data command fails to execute, stop executing the remaining unexecuted data commands;
re-determine the cluster nodes contained in the database cluster and the hash slot range managed by each cluster node; and
regroup the unexecuted data commands, according to the newly determined hash slot ranges, into new data command groups respectively corresponding to the newly determined cluster nodes, and then merge all data commands in each new group into a corresponding batch processing request based on the pipeline technique.
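The fault-handling path can be sketched as follows. `regroup_unexecuted` and the `refresh_topology` callback are hypothetical names, and `zlib.crc32` again stands in for the real key hash; the point is the regrouping logic after a topology refresh (e.g. when slots have migrated during a resharding):

```python
import zlib

SLOTS = 16384

def _slot(key: str) -> int:
    # Stand-in hash; a real cluster client would use CRC16 mod 16384.
    return zlib.crc32(key.encode()) % SLOTS

def regroup_unexecuted(remaining, refresh_topology):
    """Fault handling unit 510: after a command fails, stop, re-fetch
    the node -> slot-range map via `refresh_topology()`, and rebuild
    per-node command groups from the unexecuted tail so they can be
    pipelined again against the current topology."""
    new_ranges = refresh_topology()
    groups = {node: [] for node in new_ranges}
    for key in remaining:
        slot = _slot(key)
        for node, (lo, hi) in new_ranges.items():
            if lo <= slot <= hi:
                groups[node].append(key)
                break
    return {node: batch for node, batch in groups.items() if batch}
```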
Optionally, the apparatus further includes a connection obtaining unit 512, and submitting the batch processing request to the target cluster node corresponding to any data command group includes:
obtaining a target connection corresponding to the target cluster node from a connection pool of the database cluster; and
submitting the batch processing request to the target cluster node over the target connection.
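Per-node connection reuse might look like the following minimal sketch; `NodeConnectionPool` and its `connect` callback are illustrative names, not the patent's actual pool:

```python
class NodeConnectionPool:
    """Caches one connection per cluster node, created lazily via the
    supplied `connect(node)` factory, so repeated batch submissions to
    the same node reuse an established connection."""

    def __init__(self, connect):
        self._connect = connect
        self._conns = {}

    def get(self, node):
        if node not in self._conns:
            self._conns[node] = self._connect(node)
        return self._conns[node]
```

Reusing connections avoids paying the TCP (and authentication) handshake cost on every batch submission.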
Optionally, the data command is a data write command containing data to be written that was extracted from a source database;
the database cluster is the destination database, and the data write command is used to write the contained data into the destination database.
The implementation of the functions and roles of each unit in the above apparatus is described in detail in the implementation of the corresponding steps of the above method, and is not repeated here.
Since the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the relevant descriptions in the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this application, which a person of ordinary skill in the art can understand and implement without inventive effort.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium, such as a memory, comprising instructions executable by a processor of a data command batch processing apparatus to implement the method of any of the above embodiments. For example, the method may comprise:
reading a data command set for a target database cluster from a command queue; determining, in the data command set, data command groups respectively corresponding to the cluster nodes in the database cluster; when any data command group contains multiple data commands, merging all data commands in that group into a corresponding batch processing request based on the pipeline technique; and submitting the batch processing request to the target cluster node corresponding to that data command group, so that the target cluster node batch-processes the data commands contained in the request.
The non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc., which is not limited in this application.
The above description is only exemplary of the present application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A method for batch processing of data commands, the method comprising:
reading a set of data commands for a target database cluster from a command queue;
determining data command groups in the data command set, wherein the data command groups respectively correspond to all cluster nodes in the database cluster;
when any data command group contains a plurality of data commands, all the data commands in any data command group are combined into corresponding batch processing requests based on a pipeline technology;
and submitting the batch processing request to a target cluster node corresponding to any data command group so that the target cluster node performs batch processing on the data commands contained in the batch processing request.
2. The method of claim 1, wherein reading a set of data commands for a target database cluster from a command queue comprises:
and when the number of data commands in the command queue reaches a preset threshold value, reading a data command set for the target database cluster from the command queue.
3. The method of claim 1, wherein the data commands in the command queue are arranged sequentially in chronological order, and the relative order of the data commands included in the batch processing request is the same as their order in the command queue.
4. The method of claim 1, wherein determining the set of data commands in the set of data commands that respectively correspond to the cluster nodes in the database cluster comprises:
calculating a hash slot corresponding to the data command according to a key value of the data command in the data command set;
respectively determining the hash slot range managed by each cluster node contained in the database cluster;
and determining a data command group corresponding to each cluster node contained in the database cluster from the data command set, wherein the hash slot corresponding to the data command contained in the data command group distributed to any cluster node falls into the hash slot range managed by any cluster node.
5. The method of claim 4, further comprising:
when any data command fails to be executed, stopping continuously executing the data command which is not executed;
re-determining cluster nodes contained in the database cluster and the hash slot range managed by each cluster node;
and forming the data commands which are not executed into new data command groups respectively corresponding to the newly determined cluster nodes according to the newly determined hash slot range, and further combining all the data commands in the new data command groups into corresponding batch processing requests based on a pipeline technology.
6. The method of claim 1, wherein the submitting the batch processing request to the target cluster node corresponding to any data command group comprises:
acquiring target connection corresponding to the target cluster node from a connection pool of the database cluster;
and submitting the batch processing request to the target cluster node by using the target connection.
7. The method of claim 1,
the data command is a data writing command which comprises data to be written extracted from a source database;
the database cluster is a destination database, and the data writing command is used for writing the contained data to be written into the destination database.
8. An apparatus for batch processing of data commands, the apparatus comprising:
the reading unit is used for reading a data command set aiming at the target database cluster from the command queue;
the determining unit is used for determining data command groups respectively corresponding to all cluster nodes in the database cluster in the data command set;
the merging unit is used for merging all the data commands in any data command group into a corresponding batch processing request based on a pipeline technique when that data command group comprises a plurality of data commands;
and the submitting unit is used for submitting the batch processing request to a target cluster node corresponding to any data command group so as to enable the target cluster node to perform batch processing on the data commands contained in the batch processing request.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-7 by executing the executable instructions.
10. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, perform the steps of the method according to any one of claims 1-7.
CN202111164186.0A 2021-09-30 2021-09-30 Batch processing method and device for data commands Pending CN113836238A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111164186.0A CN113836238A (en) 2021-09-30 2021-09-30 Batch processing method and device for data commands


Publications (1)

Publication Number Publication Date
CN113836238A true CN113836238A (en) 2021-12-24

Family

ID=78967951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111164186.0A Pending CN113836238A (en) 2021-09-30 2021-09-30 Batch processing method and device for data commands

Country Status (1)

Country Link
CN (1) CN113836238A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117609178A (en) * 2023-10-08 2024-02-27 中信数字创新(上海)科技有限公司 Application program-oriented heterogeneous database compatible realization system


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426451A (en) * 2015-11-11 2016-03-23 深圳市华讯方舟科技有限公司 Key value pair-based data processing method and system
CN106202459A (en) * 2016-07-14 2016-12-07 华南师范大学 Relevant database storage performance optimization method under virtualized environment and system
WO2018161881A1 (en) * 2017-03-09 2018-09-13 腾讯科技(深圳)有限公司 Structuralized data processing method, data storage medium, and computer apparatus
CN108572970A (en) * 2017-03-09 2018-09-25 腾讯科技(深圳)有限公司 A kind of processing method and distributed processing system(DPS) of structural data
CN110910921A (en) * 2019-11-29 2020-03-24 深圳市国微电子有限公司 Command read-write method and device and computer storage medium
CN113220608A (en) * 2021-06-09 2021-08-06 湖南国科微电子股份有限公司 NVMe command processor and processing method thereof


Similar Documents

Publication Publication Date Title
CN111338766B (en) Transaction processing method and device, computer equipment and storage medium
CN108804112B (en) Block chain settlement processing method and system
US20230100223A1 (en) Transaction processing method and apparatus, computer device, and storage medium
CN107608773B (en) Task concurrent processing method and device and computing equipment
US10031935B1 (en) Customer-requested partitioning of journal-based storage systems
US10191932B2 (en) Dependency-aware transaction batching for data replication
CN106649828B (en) Data query method and system
CN108509462B (en) Method and device for synchronizing activity transaction table
CN110147407B (en) Data processing method and device and database management server
CN105517644B (en) Data partitioning method and equipment
CN108959510B (en) Partition level connection method and device for distributed database
US10831737B2 (en) Method and device for partitioning association table in distributed database
US9990392B2 (en) Distributed transaction processing in MPP databases
US11080261B2 (en) Hybrid concurrency control
US11449550B2 (en) Ad-hoc graph definition
CN104391913A (en) Database management method and device
CN105637489A (en) Asynchronous garbage collection in a distributed database system
US10235407B1 (en) Distributed storage system journal forking
CN113836238A (en) Batch processing method and device for data commands
CN114372060A (en) Data storage method, device, equipment and storage medium
CN111475279B (en) System and method for intelligent data load balancing for backup
CN116974983A (en) Data processing method, device, computer readable medium and electronic equipment
US10747627B2 (en) Method and technique of achieving extraordinarily high insert throughput
US10353920B2 (en) Efficient mirror data re-sync
US11816088B2 (en) Method and system for managing cross data source data access requests

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination