CN110716698B - Data fragment copy deployment method and device - Google Patents

Data fragment copy deployment method and device

Info

Publication number
CN110716698B
Authority
CN
China
Prior art keywords
server
fragment
node
data
disk
Prior art date
Legal status
Active
Application number
CN201910944239.7A
Other languages
Chinese (zh)
Other versions
CN110716698A
Inventor
马申跃
范亚平
Current Assignee
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN201910944239.7A
Publication of CN110716698A
Application granted
Publication of CN110716698B
Legal status: Active

Classifications

    • G06F3/0614 Improving the reliability of storage systems
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F3/0653 Monitoring storage devices or systems
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data fragment copy deployment method and device. The method is applied to a configuration server in a database cluster that comprises the configuration server, fragment servers and a zookeeper server, and includes: acquiring the remaining storage space of each fragment server in the database cluster; when a first storage sub-node is newly added under a first storage node in the zookeeper server, acquiring the data fragment identifier stored by the first storage sub-node and the identifier of the fragment server storing that data fragment, and selecting, from the other fragment servers, a target fragment server for storing a data fragment copy according to the remaining storage space of each fragment server; and storing the data fragment copy to the target fragment server so that the target fragment server allocates a port for the data fragment copy. The method enables automatic deployment of data fragment copies, improves deployment efficiency, and is less prone to error.

Description

Data fragment copy deployment method and device
Technical Field
The application relates to the technical field of databases, in particular to a data fragment copy deployment method and device.
Background
When storing mass data, a MongoDB database can divide the data into multiple data fragments through data fragmentation and store the fragments on multiple fragment servers, so that the mass data are held by a MongoDB cluster. On this basis, in order to provide redundancy protection for the data fragments, corresponding data fragment copies are created for the data fragments and stored on the fragment servers.
However, unlike systems such as HDFS, Kafka and Solr, the MongoDB database has no automatic distribution and deployment mechanism for data fragment copies. Each data fragment copy must be allocated a fragment server to store it and a port through which it can be operated after it is started, before the copy can be used by the MongoDB database.
At present, a fragment server and a port are mainly allocated to each data fragment copy manually, which is inefficient and error-prone.
Disclosure of Invention
In view of this, the present application provides a data fragment copy deployment method and apparatus.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the embodiments of the present application, a data shard copy deployment method is provided, which is applied to a configuration server in a database cluster including the configuration server and a shard server, where the database cluster further includes a zookeeper server, and the zookeeper server is connected to the configuration server and the shard server, respectively, and the method includes:
acquiring the remaining storage space of each fragment server in the database cluster;
when it is monitored that a first storage sub-node is newly added under a first storage node in the zookeeper server, acquiring the data fragment identifier stored by the first storage sub-node and the identifier of the fragment server storing that data fragment, and selecting, from the other fragment servers, a target fragment server for storing a data fragment copy according to the remaining storage space of each fragment server, where the first storage node is used to store data fragment information, the first storage node comprises at least one first storage sub-node, and each first storage sub-node corresponds to one data fragment;
storing the data fragment copy to the target fragment server, and registering the identifier of the data fragment copy to a target second storage sub-node of a second storage node in the zookeeper server, where the target second storage sub-node is a pre-designated second storage sub-node that the target fragment server is responsible for monitoring, so that when the target fragment server monitors registration information of a newly added data fragment copy on the target second storage sub-node, it allocates a port for the data fragment copy, the port being used for operating the data fragment copy; the second storage node is used to store fragment server information, the second storage node comprises at least one second storage sub-node, and each second storage sub-node corresponds to one fragment server.
According to a second aspect of the embodiments of the present application, a data shard copy deployment method is provided, which is applied to a shard server in a database cluster including a configuration server and a shard server, where the database cluster further includes a zookeeper server, and the zookeeper server is connected to the configuration server and the shard server respectively, and the method includes:
monitoring a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the fragment server;
when the fragmentation server monitors registration information of a data fragmentation copy newly added to a target second storage sub node, a port is allocated to the data fragmentation copy, and the port is used for operating the data fragmentation copy.
According to a third aspect of the embodiments of the present application, there is provided a data shard copy deployment apparatus, applied to a configuration server in a database cluster including the configuration server and a shard server, where the database cluster further includes a zookeeper server, and the zookeeper server is connected to the configuration server and the shard server respectively, and the apparatus includes:
the acquisition unit is used for acquiring the storage space surplus of each sharded server in the database cluster;
the data fragment storage system comprises a selection unit and a fragment server, wherein the selection unit is used for acquiring a data fragment identifier stored by a first storage sub-node and a fragment server identifier for storing a data fragment when the first storage sub-node in a zookeeper server is newly added, and selecting a target fragment server for storing a data fragment copy from other fragment servers according to the storage space surplus of each fragment server, wherein the first storage node is used for storing data fragment information, the first storage node comprises at least one first storage sub-node, and each first storage sub-node corresponds to one data fragment;
and the storage unit is used for storing the data fragment copies to a target fragment server and registering the identification of the data fragment copies to a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the target fragment server, so that when the target fragment server monitors registration information of a new data fragment copy added to the target second storage sub-node, a port is allocated for the data fragment copies and used for operating the data fragment copies, the second storage node is used for storing information of the fragment server, the second storage node comprises at least one second storage sub-node, and each second storage sub-node corresponds to one fragment server.
According to a fourth aspect of the embodiments of the present application, there is provided a data shard copy deployment apparatus, which is applied to a shard server in a database cluster including a configuration server and a shard server, where the database cluster further includes a zookeeper server, and the zookeeper server is connected to the configuration server and the shard server respectively, and the apparatus includes:
the monitoring unit is used for monitoring a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the slicing server;
and the port distribution unit is used for distributing a port for the data fragment copy when the fragment server monitors the registration information of the newly added data fragment copy of the target second storage sub node, and the port is used for operating the data fragment copy.
According to the method, the zookeeper server is added on the basis of the existing configuration server and the fragment server of the database cluster, so that the configuration server can know that the data fragments are newly added in the database cluster by monitoring the first storage sub-node on the zookeeper server, and a target fragment server for storing the data fragment copies of the data fragments is determined according to the storage space surplus of each fragment server; and further registering the identifier of the data fragment copy to a target second storage sub-node in the zookeeper server, so that when the target fragment server monitors registration information of a data fragment copy newly added to the target second storage sub-node, a port is allocated to the data fragment copy, automatic deployment of the data fragment copy is realized, compared with a manual mode of allocating a fragment server and a port to each copy, the deployment efficiency of the data fragment copy is improved, and errors are not easy to occur. In addition, various information related to the data fragments and the data fragment copies in the database cluster is stored to each node in the zookeeper server, so that a user can master the distribution condition of each data fragment and each data fragment copy in the database cluster by checking the information stored by each node.
Drawings
Fig. 1 is a flowchart illustrating a data shard copy deployment method according to an exemplary embodiment of the present application.
FIG. 2 is a schematic diagram of a database cluster shown in an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram illustrating a storage node in a zookeeper server according to an exemplary embodiment of the present application.
Fig. 4 is a flowchart illustrating an implementation of step 102 according to an exemplary embodiment of the present application.
Fig. 5 is a flowchart illustrating an implementation of step 1022 according to an exemplary embodiment of the present application.
Fig. 6 is a flowchart illustrating another data shard copy deployment method according to an exemplary embodiment of the present application.
FIG. 7 is a schematic diagram of another database cluster shown in an exemplary embodiment of the present application.
Fig. 8 is a schematic diagram of another storage node in a zookeeper server according to an exemplary embodiment of the present application.
Fig. 9 is a schematic structural diagram of a data shard copy deployment apparatus according to an exemplary embodiment of the present application.
Fig. 10 is a schematic structural diagram of another data fragmented copy deployment apparatus according to an exemplary embodiment of the present application.
Fig. 11 is a schematic structural diagram of another data fragmented copy deployment apparatus according to an exemplary embodiment of the present application.
Fig. 12 is a schematic hardware structure diagram of a data shard copy deployment apparatus according to an exemplary embodiment of the present application.
Fig. 13 is a schematic hardware structure diagram of another data slice copy deployment apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.
In order to clearly illustrate the data fragment copy deployment method of the present application, the following describes a data fragment copy deployment method executed by a configuration server side and a fragment server side from a single-side perspective:
referring to fig. 1, fig. 1 is a flowchart of a data shard copy deployment method provided in the present application, where the flowchart is applied to a configuration server in a database cluster including the configuration server and shard servers. Referring to fig. 2, fig. 2 is a schematic diagram of a database cluster in which a configuration service and a monitoring service and a server for data storage (referred to herein as a sharded server) are deployed, wherein:
the configuration service may be a program module, which may be deployed independently to a physical server or may be deployed in combination with other software modules, and is described herein by way of example as a configuration service deployed independently to a server (which may be referred to as a configuration server).
The monitoring service may be a program module that can be deployed independently on a physical server or in combination with other software modules; it is described herein by way of example as being deployed independently on a server (which may be referred to as a zookeeper server or monitoring server).
The fragment server is used for storing the data fragments and the data fragment copies.
For one embodiment, the database cluster may be a database cluster capable of setting data shards and data shard copies, and may be a mongodb database cluster, for example.
For one embodiment, a zookeeper server may include a first storage node and a second storage node. The first storage node is configured to store data fragment information, where the data fragment information may include, but is not limited to, fragment server information in which data fragments are stored, fragment server information in which copies of each data fragment are stored, and port (port for operating on a data fragment or a copy of a data fragment) information on a corresponding fragment server.
For example, the information stored by the first storage node may be keyed by a data shard identifier (used to uniquely identify one data shard in the database cluster), where the data shard identifier corresponds to multiple shard servers (including the shard server storing the data shard and the shard server storing a copy of the data shard).
The first storage node may include at least one first storage sub-node, each first storage sub-node corresponds to one data fragment, and each first storage sub-node stores an identifier of the corresponding data fragment (i.e., a data fragment identifier), an identifier of a fragment server (i.e., a fragment server identifier) where the data fragment and a copy of the data fragment are stored, and port information on the corresponding fragment server.
The second storage node is configured to store the fragmentation server information, where the fragmentation server information may include, but is not limited to, information of the data fragments and the data fragment copies stored in the fragmentation server, and port information corresponding to the data fragments and the data fragment copies.
For example, the information stored by the second storage node may be keyed by a shard server identifier (used to uniquely identify one shard server in the database cluster), where the shard server identifier corresponds to at least one data shard or data shard copy, and port information corresponding to each data shard or data shard copy on the shard server.
The second storage node may include at least one second storage sub-node, each second storage sub-node corresponds to one fragment server, and each second storage sub-node stores a corresponding fragment server identifier, an identifier of a data fragment and a data fragment copy stored on the fragment server, and corresponding port information.
Fig. 3 shows one way of setting the storage nodes included in the zookeeper server. Referring to fig. 3, as an embodiment, the zookeeper server includes a shardlist (data fragment directory) node as the first storage node and a host (fragment server) node as the second storage node. Here, the first storage sub-nodes may include the shard1 sub-node and the shard2 sub-node stored under the shardlist node, and the second storage sub-nodes may include the node1.com sub-node and the node2.com sub-node stored under the host node. In addition, fig. 3 also shows a shard node, which serves as the root node of the shardlist node and the host node and is configured to store the data fragment information of all data fragments and their copies in the cluster as well as the fragment server information of all fragment servers. For a data fragment, because data fragment copies exist and the data fragment and its copies are stored on different fragment servers, a first storage sub-node is created per data fragment to store the fragment server information and the corresponding port information of the fragment servers where the data fragment and each of its copies are located. For the fragment servers, each fragment server usually stores multiple data fragments or data fragment copies, so a second storage sub-node is created per fragment server to store the data fragment and data fragment copy information held by that fragment server.
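For reference, the node layout described above for fig. 3 can be summarized as the following path sketch; the concrete node names are simply those of the example and may differ in practice:

    /shard                     root node for all data fragment and fragment server information
        /shard/shardlist       first storage node; one sub-node per data fragment
            /shard/shardlist/shard1
            /shard/shardlist/shard2
        /shard/host            second storage node; one sub-node per fragment server
            /shard/host/node1.com
            /shard/host/node2.com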
As shown in fig. 1, the process may include the following steps:
step 101, acquiring the storage space surplus of each sharded server in a database cluster;
as an embodiment, in this step 101, the configuration server may obtain the remaining amount of storage space of the shard server in a plurality of ways, for example, the configuration server may obtain the remaining amount of storage space of the shard server by issuing a storage space remaining amount obtaining instruction, or may obtain the remaining amount of storage space of the shard server by receiving the remaining amount of storage space periodically reported by each shard server.
As an embodiment, the storage space remaining amount herein may include: the remaining amount of the disk, the predicted remaining amount of the disk, and/or the remaining amount of the memory, etc.
102, when monitoring that a first storage node in a zookeeper server newly adds a first storage sub-node, acquiring a data fragment identifier stored by the first storage sub-node and a fragment server identifier for storing the data fragment, and selecting a target fragment server for storing a data fragment copy from other fragment servers according to the residual amount of storage space of each fragment server, wherein the first storage node is used for storing data fragment information, the first storage node comprises at least one first storage sub-node, and each first storage sub-node corresponds to one data fragment;
as an embodiment, every time a data fragment is added to the database cluster, a first storage sub-node is correspondingly added to the first storage node. Still taking the shardlist node in fig. 3 as the first storage node as an example, when the data fragment shard2 is newly added to the database cluster, the shardlist node stores a shard2 sub-node that is newly added.
As an embodiment, when the configuration server needs to monitor the first storage node in the zookeeper server, a watcher needs to be registered on the corresponding node, so that when the data stored by the node changes, the watcher is triggered and an event notification is sent to the configuration server, letting the configuration server know that the data stored by the node has changed. For example, the configuration server may register the watcher on the shardlist node, so that when the data of the shardlist node changes (for example, when a shard2 sub-node is newly added), the shardlist node sends an event notification to the configuration server through the watcher. When the configuration server learns that the data of the first storage node has changed, it can check the information of each first storage sub-node of the first storage node and compare it with the data fragment identifiers it currently stores to determine whether there is a newly added first storage sub-node. When the configuration server determines that a first storage sub-node has been newly added, it can check the data fragment identifier stored by that first storage sub-node and the identifier of the fragment server storing the data fragment, so as to determine that a copy of the newly added data fragment is to be deployed and to avoid storing the data fragment and the data fragment copy on the same fragment server. Still taking the shardlist node as the first storage node and the shard2 sub-node as the newly added first storage sub-node as an example, after determining that the data of the shardlist node has changed, the configuration server may view the shard2 identifier and the node1.com information stored in the shard2 sub-node, where shard2 is the identifier of the newly added data fragment and node1.com is the identifier of the fragment server where the newly added data fragment is located.
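As an illustration only, this watcher-based detection of a newly added first storage sub-node might look roughly as follows with the standard Apache ZooKeeper Java client; the znode paths and the knownShards bookkeeping are assumptions made for the sketch, not part of the embodiment:

    // Sketch: configuration server watches the shardlist node for newly added data fragments.
    // Paths such as "/shard/shardlist" follow the example layout above and are assumptions.
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    import java.nio.charset.StandardCharsets;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class ShardListMonitor implements Watcher {
        private static final String SHARDLIST = "/shard/shardlist";
        private final ZooKeeper zk;
        private final Set<String> knownShards = new HashSet<>();

        public ShardListMonitor(ZooKeeper zk) throws Exception {
            this.zk = zk;
            // Reading the children registers the watcher; it fires once when the children
            // of the shardlist node change and must then be re-registered.
            knownShards.addAll(zk.getChildren(SHARDLIST, this));
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() != Event.EventType.NodeChildrenChanged) {
                return;
            }
            try {
                List<String> children = zk.getChildren(SHARDLIST, this); // re-arm the watcher
                for (String shardId : children) {
                    if (knownShards.add(shardId)) {
                        // Newly added first storage sub-node: read the identifier of the
                        // fragment server that already stores the data fragment (e.g. "node1.com").
                        byte[] data = zk.getData(SHARDLIST + "/" + shardId, false, null);
                        String ownerServer = new String(data, StandardCharsets.UTF_8);
                        // ... select a target fragment server for the copy (step 102) ...
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }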
As an embodiment, there are multiple implementation manners for selecting a target sharding server for storing a data sharding copy from other sharding servers according to the remaining amount of storage space of each sharding server, which will be specifically exemplified below, and details are not described herein for the time being.
Step 103, storing the data fragment copy to the target fragment server, and registering the identifier of the data fragment copy to a target second storage sub-node of a second storage node in the zookeeper server, where the target second storage sub-node is a pre-designated second storage sub-node that the target fragment server is responsible for monitoring, so that when the target fragment server monitors registration information of a newly added data fragment copy on the target second storage sub-node, it allocates a port for the data fragment copy, the port being used for operating the data fragment copy; the second storage node is used to store fragment server information, the second storage node includes at least one second storage sub-node, and each second storage sub-node corresponds to one fragment server.
As an embodiment, the target second storage sub-node is a pre-designated storage node that the target fragment server is responsible for monitoring, and it stores the identifiers and corresponding ports of the data fragments and data fragment copies currently held by the target fragment server. For example, the node2.com sub-node stored under the host node shown in fig. 3, which is the storage node that the fragment server identified as node2.com is responsible for monitoring, may be used as the target second storage sub-node.
As an embodiment, when the configuration server stores the data fragment copy to the target fragment server, because the configuration server cannot know which ports are currently idle on the target fragment server, it may either not allocate a port for the data fragment copy at first, or allocate an unusable port first and have it replaced once the target fragment server subsequently determines a usable port; the unusable port may be port 0. If an unusable port is allocated to the data fragment copy first, that unusable port may also be registered to the target second storage sub-node when the identifier of the data fragment copy is registered there. Still taking the node2.com sub-node stored under the host node shown in fig. 3 as the target second storage sub-node and shard2 as the identifier of the data fragment copy, if only the identifier of the data fragment copy is registered, the node2.com sub-node will store only shard2 after registration; if the unusable port 0 is also registered to the node2.com sub-node, the node2.com sub-node will store shard2 and 0 after registration.
As an embodiment, the port allocated by the target fragment server for each data fragment copy is one of the currently idle ports of the target fragment server, so that after the data fragment copy is started, a user can operate the data fragment copy through that port. As described above, an unusable port may be allocated to the data fragment copy first and replaced once the target fragment server determines a usable port. On this basis, after determining the port allocated to the data fragment copy, the target fragment server may re-register that port to the target second storage sub-node to overwrite the previously registered unusable port. Still taking the unusable port 0 as an example, as shown in fig. 3, if the target fragment server determines that the port allocated for the data fragment copy is 27019, the 0 stored by the node2.com sub-node will be modified to 27019 after re-registration.
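A minimal sketch of this two-step registration, again using the Apache ZooKeeper Java client and the example paths above (the per-copy child znode layout and helper names are assumptions of the sketch):

    // Sketch: configuration server registers the copy with a placeholder port 0 under the
    // target fragment server's host sub-node; the target fragment server later overwrites
    // the placeholder with the port it actually allocated.
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    import java.nio.charset.StandardCharsets;

    public class CopyRegistration {

        // Executed on the configuration server (step 103).
        public static void registerCopy(ZooKeeper zk, String targetServer, String shardId)
                throws Exception {
            String path = "/shard/host/" + targetServer + "/" + shardId;
            byte[] placeholder = "0".getBytes(StandardCharsets.UTF_8); // unusable port 0
            zk.create(path, placeholder, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Executed on the target fragment server once it has chosen a usable free port.
        public static void publishAllocatedPort(ZooKeeper zk, String selfId, String shardId, int port)
                throws Exception {
            String path = "/shard/host/" + selfId + "/" + shardId;
            byte[] data = String.valueOf(port).getBytes(StandardCharsets.UTF_8);
            zk.setData(path, data, -1); // -1: overwrite regardless of znode version
        }
    }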
The flow shown in fig. 1 is thus completed.
As can be seen from the process shown in fig. 1, in the application, a zookeeper server is added on the basis of an existing configuration server and a shard server of a database cluster, so that the configuration server can know that data shards are newly added in the database cluster by monitoring a first storage sub-node on the zookeeper server, and a target shard server for storing a data shard copy of the data shards is determined according to the residual amount of storage space of each shard server; and further registering the identifier of the data fragment copy to a target second storage sub-node in the zookeeper server, so that when the target fragment server monitors registration information of a data fragment copy newly added to the target second storage sub-node, a port is allocated to the data fragment copy, automatic deployment of the data fragment copy is realized, compared with a manual mode of allocating a fragment server and a port to each copy, the deployment efficiency of the data fragment copy is improved, and errors are not easy to occur. In addition, various information related to the data fragments and the data fragment copies in the database cluster is stored to each node in the zookeeper server, so that a user can master the distribution condition of each data fragment and data fragment copy in the database cluster by checking the information stored by each node.
Next, how to select a target sharding server for storing a copy of a data shard from other sharding servers according to the remaining amount of storage space of each sharding server in step 102 will be described.
Referring to fig. 4, fig. 4 is a flowchart illustrating an implementation of step 102 according to an exemplary embodiment of the present application. As shown in fig. 4, the process may include the following steps:
step 1021, selecting, from the fragment servers, at least one candidate fragment server that meets a specified rule;
for one embodiment, the specified rule may include a first sub-rule and a second sub-rule. The first sub-rule may be that the candidate fragment server needs to be different from the fragment server where the data fragment is located, the second sub-rule may be that the disk remaining amount of the candidate fragment server needs to be greater than a certain value, the memory remaining amount is greater than a certain value, or the storage space remaining amount is greater than a certain value, and the like, and the candidate server needs to satisfy both the first sub-rule and the second sub-rule. The purpose of screening the fragmentation servers is to remove the fragmentation servers in which the data fragments are stored so as to avoid the data fragments and the data fragment copies from being stored in the same fragmentation server, and to remove the fragmentation servers in which the storage space of the fragmentation servers is not suitable for storing the data fragment copies, where the candidate servers obtained after screening are the other fragmentation servers.
Step 1022, determining selected parameters of the candidate sharded servers according to the remaining amount of the storage space of the candidate sharded servers, wherein the selected parameters are the basis for selecting the candidate sharded servers as target sharded servers;
as an embodiment, the selected parameter refers to a probability that the candidate sharded server is selected as the target sharded server, and the selected parameter of each candidate sharded server can be determined by calculating a ratio of a storage space residual amount of each candidate sharded server to a sum of storage space residual amounts of all candidate sharded servers. The larger the selected parameter is, the larger the storage space residual of the sharding server is, and the larger the probability that the sharding server is selected is.
And step 1023, selecting the target fragment server from all candidate fragment servers according to the selected parameters of the candidate fragment servers.
As an embodiment, when only one data fragment copy exists in one data fragment, the maximum selected parameter may be determined from the selected parameters of the candidate fragment servers, and the candidate fragment server corresponding to the maximum selected parameter is determined as the target fragment server.
When a plurality of data fragment copies exist in one data fragment, the fragment servers for storing each data fragment copy may be sequentially selected according to the above-mentioned policy (the fragment servers storing different fragment copies are different), or based on the number of fragment copies, the fragment servers of the matching number are selected according to the above-mentioned policy, and each data fragment copy is stored respectively.
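Purely as an illustration of steps 1021 to 1023, the selection logic might be sketched as follows; the ServerStats record, the thresholds and the field names are assumptions, not part of the embodiment:

    // Sketch: filter candidate fragment servers and pick the one with the largest selected parameter.
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class TargetServerSelector {

        // Assumed per-server statistics gathered in step 101.
        public record ServerStats(String id, long predictedDiskRemain, long memoryRemain) {}

        public static String selectTarget(List<ServerStats> allServers,
                                          String serverHoldingShard,
                                          long minDiskRemain, long minMemoryRemain) {
            // Step 1021: candidates must differ from the server holding the data fragment
            // (first sub-rule) and have enough remaining space (second sub-rule).
            List<ServerStats> candidates = new ArrayList<>();
            for (ServerStats s : allServers) {
                if (!s.id().equals(serverHoldingShard)
                        && s.predictedDiskRemain() > minDiskRemain
                        && s.memoryRemain() > minMemoryRemain) {
                    candidates.add(s);
                }
            }
            if (candidates.isEmpty()) {
                return null; // no suitable fragment server
            }
            // Step 1022: selected parameter = share of disk remainder + share of memory remainder.
            long diskSum = candidates.stream().mapToLong(ServerStats::predictedDiskRemain).sum();
            long memSum = candidates.stream().mapToLong(ServerStats::memoryRemain).sum();
            // Step 1023: choose the candidate with the largest selected parameter.
            return candidates.stream()
                    .max(Comparator.comparingDouble(
                            s -> (double) s.predictedDiskRemain() / diskSum
                               + (double) s.memoryRemain() / memSum))
                    .map(ServerStats::id)
                    .orElse(null);
        }
    }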
The flow shown in fig. 4 is completed.
In the following, how to determine the selected parameter according to the remaining amount of storage space in step 1022 will be described with reference to fig. 5, in the case that the remaining amount of storage space includes the predicted remaining amount of disk and the remaining amount of memory. As shown in fig. 5, the process may include the following steps:
step 10221, for each candidate fragment server, calculating a first ratio of the candidate fragment server's predicted disk remainder to N, and a second ratio of the candidate fragment server's memory remainder to M, where N is the sum of the predicted disk remainders of all candidate fragment servers and M is the sum of the memory remainders of all candidate fragment servers;
as an embodiment, the remaining storage space amount includes a predicted remaining disk amount and a remaining memory amount, and therefore, a first ratio corresponding to the predicted remaining disk amount and a second ratio corresponding to the remaining memory amount of each candidate shard server need to be calculated respectively. The first ratio here can be calculated by the following formula:
$$\mathrm{remain}[n] = \frac{n_{\mathrm{remain}}}{\sum_{i=1}^{m} i_{\mathrm{remain}}}$$
Before applying the formula, each fragment server may be assigned a number for the subsequent calculations. Here, $n_{\mathrm{remain}}$ refers to the predicted disk remainder of the fragment server numbered n, m is the maximum value of the numbers of the candidate fragment servers, and $\mathrm{remain}[n]$ refers to the first ratio of the fragment server numbered n, where n may be the number of any candidate fragment server.
Similarly, the second ratio can be calculated by the following equation:
$$\mathrm{free}[n] = \frac{n_{\mathrm{free}}}{\sum_{i=1}^{m} i_{\mathrm{free}}}$$
Here, $n_{\mathrm{free}}$ refers to the memory remainder of the fragment server numbered n, m is the maximum value of the numbers of the candidate fragment servers, and $\mathrm{free}[n]$ refers to the second ratio of the fragment server numbered n, where n may be the number of any candidate fragment server.
As an embodiment, the predicted remaining disk amount may be calculated periodically by a fragmentation server, and the fragmentation server actively sends the predicted remaining disk amount to a configuration server after obtaining the predicted remaining disk amount, or the configuration server issues a disk predicted remaining amount calculation command, so that the fragmentation server calculates the predicted remaining disk amount after receiving the command and sends the predicted remaining disk amount to the configuration server.
As an embodiment, the obtaining manner of the predicted remaining amount of the disk of the candidate sharding server will be described in detail below, and will not be described herein again.
And step 10222, performing a preset calculation on the first ratio and the second ratio to obtain the selected parameter.
As one example, the first ratio may be added to the second ratio to obtain the selected parameter.
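In the notation of the two ratios defined above, this example combination can be written as (the symbol $\mathrm{selected}[n]$ is introduced here only for illustration and is not part of the embodiment):

$$\mathrm{selected}[n] = \mathrm{remain}[n] + \mathrm{free}[n]$$

where $\mathrm{selected}[n]$ is the selected parameter of the candidate fragment server numbered n.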
The flow shown in fig. 5 is completed.
The manner of obtaining the predicted remaining disk amount in step 10221 will be described as follows:
calculating the disk usage amount of each data fragment stored by the fragment server in unit time, and calculating the disk usage amount of each data fragment copy stored by the fragment server in unit time;
as an embodiment, each sharding server may determine the data shards and the data shard copies stored on the sharding server by looking at the data shards and the data shard copies stored by the corresponding second storage sub-node.
As an embodiment, the disk usage amount per data slice or data slice copy in a unit time may be calculated by the following formula:
$$\mathrm{use}[n_j] = \frac{n_{js}}{\mathrm{currentTime} - \mathrm{createTime}}$$
Before applying the formula, each data fragment and data fragment copy may be assigned a number for the subsequent calculations. Here, $n_j$ refers to the data fragment or data fragment copy numbered j on the fragment server numbered n, $n_{js}$ refers to the size of that data fragment or data fragment copy, createTime refers to the time at which the data fragment or data fragment copy was stored to the fragment server, currentTime refers to the time at which the per-unit-time disk usage is calculated, m is the maximum value of the numbers of the data fragments and data fragment copies on the fragment server, and $\mathrm{use}[n_j]$ refers to the per-unit-time disk usage of the data fragment or data fragment copy numbered j on the fragment server numbered n.
As an example, the unit of currentTime and createTime may be day, hour, minute, etc., and the application does not limit the unit.
And calculating the average usage amount of the disk in unit time according to the usage amounts of the disks of all the data fragments in unit time and the disk usage amounts of all the data fragment copies in unit time, and predicting the residual amount of the disk after the specified time is ended according to the average usage amount of the disk, the total disk capacity of the fragment server and the current disk occupancy amount of the fragment server.
As an embodiment, the calculation process of the predicted remaining amount of the disk includes:
step a 1: and calculating the average usage amount of the disk in the unit time of the fragment server, wherein the average usage amount of the disk in the unit time of the fragment server can be calculated based on the usage amount of the disk in the unit time of the data fragment or the data fragment copy. The average usage amount of the disk in unit time of the slicing server can be calculated by the following formula:
$$\mathrm{average}_n = \frac{1}{m}\sum_{j=1}^{m}\mathrm{use}[n_j]$$
Here, m is the maximum value of the numbers of the data fragments and data fragment copies on the fragment server numbered n, $n_j$ refers to the data fragment or data fragment copy numbered j on that fragment server, $\mathrm{use}[n_j]$ refers to its per-unit-time disk usage, and $\mathrm{average}_n$ refers to the average per-unit-time disk usage of the fragment server numbered n.
Step a2: based on the $\mathrm{average}_n$ calculated in step a1, the predicted disk remainder can be calculated by the following formula:
$$n_{\mathrm{remain}} = \mathrm{total}_n - \sum_{j=1}^{m} n_{js} - \mathrm{average}_n \times d$$
Here, $\mathrm{total}_n$ refers to the total disk capacity of the fragment server numbered n, $n_j$ refers to the data fragment or data fragment copy numbered j on the fragment server numbered n, $n_{js}$ refers to the size of that data fragment or data fragment copy, $\mathrm{average}_n$ refers to the average per-unit-time disk usage of the fragment server numbered n, and d is the specified duration.
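A small sketch of the calculations in steps a1 and a2, with assumed field names and millisecond timestamps (the application does not limit the unit of time):

    // Sketch: predict the remaining disk space of a fragment server after a specified duration d.
    import java.util.List;

    public class DiskPrediction {

        // Assumed description of one data fragment or data fragment copy stored on the server.
        public record StoredShard(long sizeBytes, long createTimeMillis) {}

        public static long predictedDiskRemain(long totalDiskBytes,
                                               List<StoredShard> shards,
                                               long currentTimeMillis,
                                               long dMillis) {
            // use[nj] = size / (currentTime - createTime) for each stored fragment or copy.
            double usageSum = 0;
            long occupied = 0;
            for (StoredShard s : shards) {
                long elapsed = Math.max(1, currentTimeMillis - s.createTimeMillis());
                usageSum += (double) s.sizeBytes() / elapsed;
                occupied += s.sizeBytes();
            }
            // average_n = sum of per-unit-time usages / number of fragments and copies.
            double average = shards.isEmpty() ? 0 : usageSum / shards.size();
            // n_remain = total_n - current occupancy - average_n * d.
            return (long) (totalDiskBytes - occupied - average * dMillis);
        }
    }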
The manner of obtaining the predicted remaining disk amount in step 10221 is described above.
The data shard copy deployment method of the present application will be described below from a shard server side.
Referring to fig. 6, fig. 6 is a flowchart of a data shard copy deployment method provided in the present application, where the flowchart is applied to a shard server. The database cluster comprises a configuration server deployed with a configuration service process and a fragment server used for storing data fragments and data fragment copies, and further comprises a zookeeper server which is respectively connected with the configuration server and the fragment server.
Regarding the architecture of the database cluster and the data stored by each node, reference may be made to the description of the configuration server side for related contents, which is not described herein again.
As shown in fig. 6, the process may include the following steps:
step 201, monitoring a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the fragment server;
regarding the structure of the database cluster and the setting of the storage nodes in the zookeeper server, reference may be made to the description of the configuration server side for the related contents, which is not described herein again.
As an embodiment, the sharding server may be any sharding server in the database cluster, and the target second storage node is a storage node corresponding to the sharding server.
Step 202, when the fragmentation server monitors the registration information of the newly added data fragmentation copy of the target second storage sub-node, a port is allocated to the data fragmentation copy, and the port is used for operating the data fragmentation copy.
As an embodiment, the monitoring, by the fragment server, the registration information of the newly added data fragment copy to the target second storage sub-node may include:
step b 1: similar to the manner in which the configuration server listens for the first storage sub-node as described above. The fragment server may register the dispatcher on the target second storage sub-node, so as to receive an event notification sent by the dispatcher when the data stored in the subsequent target second storage sub-node changes. Still taking the non 2.com sub-node stored by the host node as the target second storage sub-node as an example, the sharding server may register a router on the non 2.com sub-node, so that when data of the non 2.com sub-node changes (i.e. when registration information of the shard2 is newly added), the non 2.com sub-node may send an event notification to the sharding server through the router.
Step b 2: the fragmentation server monitoring the target second storage sub-node only knows that the data of the target second storage sub-node is changed, but does not know what kind of change is specifically generated. Therefore, the fragment server needs to actively check the identifiers of all the data fragments stored by the target second storage sub-node, and the data fragments and the data fragment copy identifiers of the started database process stored by the second storage node, so as to determine the identifier of the un-started data fragment copy, that is, the identifier of the data fragment copy of the data fragment corresponding to the newly added first storage sub-node. For example, still taking the second storage sub-node as the node2.com sub-node as an example, the node stores shard1 and shard2, which means that the shard server identified as node2.com currently stores data shards or copies of data shards identified as shard1 and shard 2. The node2.com sub-node store also stores started and its associated shard1, which represents the identification of shard1 for the data shard or data shard copy of the currently started database process on the shard server identified as node 2.com. Com sub-node may further determine that the un-started data fragment copy is the shard2 by comparing the identities of all data fragments or data fragment copies stored by the non 2.com sub-node with the identities of the data fragments or data fragment copies of the started database process.
As an embodiment, the port allocated for the data fragment copy here is a port currently idle on the target fragment server. A port may be selected from the idle ports and allocated to the data fragment copy at random, or in descending or ascending order of port number.
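As a rough illustration of steps b1 and b2 together with the port allocation, again with the Apache ZooKeeper Java client; the paths, the startedCopies bookkeeping and the way a free port is obtained are assumptions of the sketch:

    // Sketch: a fragment server watches its own host sub-node, detects a copy that has not
    // been started yet, and allocates a currently free port for it.
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public class HostNodeMonitor implements Watcher {
        private final ZooKeeper zk;
        private final String hostPath;                              // e.g. "/shard/host/node2.com"
        private final Set<String> startedCopies = new HashSet<>();  // copies whose process is started

        public HostNodeMonitor(ZooKeeper zk, String serverId) throws Exception {
            this.zk = zk;
            this.hostPath = "/shard/host/" + serverId;
            zk.getChildren(hostPath, this); // register the watcher (step b1)
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() != Event.EventType.NodeChildrenChanged) {
                return;
            }
            try {
                List<String> registered = zk.getChildren(hostPath, this); // re-arm the watcher
                for (String copyId : registered) {
                    if (!startedCopies.contains(copyId)) {        // step b2: not started yet
                        int port = pickFreePort();                // allocate a free port
                        // ... start the database process for copyId on this port, then publish
                        // the port back to zookeeper (see CopyRegistration.publishAllocatedPort) ...
                        startedCopies.add(copyId);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // One simple way to obtain a currently free port; ports could equally be scanned in order.
        private static int pickFreePort() throws IOException {
            try (ServerSocket socket = new ServerSocket(0)) {
                return socket.getLocalPort();
            }
        }
    }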
The flow shown in fig. 6 is completed.
In this application, as an embodiment, when the configuration server obtains the predicted remaining amount of the disk of the sharding server, the sharding server may perform the following steps:
step c 1: calculating the disk usage amount of each data fragment stored by the fragment server in unit time, and calculating the disk usage amount of each data fragment copy stored by the fragment server in unit time; calculating the average usage amount of the disk in unit time according to the usage amounts of the disk in unit time of all the data fragments and the disk usage amount of all the data fragment copies in unit time, and predicting the residual amount of the disk after the specified time length is ended according to the average usage amount of the disk, the total disk capacity of the fragment server and the current disk occupancy amount of the fragment server;
step c 2: and sending the predicted disk prediction residual quantity to a configuration server.
For the process of calculating the predicted remaining amount of the disk by the fragmentation server, reference may be made to the description of the configuration server side for the relevant content, which is not described herein again.
How the sharding server obtains the predicted remaining amount of the disk is described above.
In this application, as an embodiment, after a target sharding server allocates a port for a data sharding copy, the target sharding server may start a database process corresponding to the data sharding copy, which is described in detail below:
after the data fragment copy is stored in the target fragment server and the port is allocated, the database process corresponding to the data fragment copy needs to be started and initialized to be used by the database cluster, so the fragment server needs to start the database process corresponding to the data fragment copy.
How the target sharded server starts the database process corresponding to the data sharded copy is described above.
In this application, as an embodiment, after the target fragment server starts the database process corresponding to the data fragment copy, the target fragment server may register the identifier of the target fragment server and the port corresponding to the data fragment copy in the newly added first storage sub-node, so that the routing server initializes the process, which is described in detail below:
the target fragment server registers the identifier and the port of the target fragment server into the newly added first storage sub-node, so that the subsequent routing server can judge whether to execute initialization on the database process corresponding to the data fragment copy according to the registration information in the newly added first storage sub-node. Referring to fig. 7, fig. 7 is a schematic view of another database cluster provided by the present application, where the database cluster further includes a routing server connected to the zookeeper server, and the routing server is configured to perform an initialization operation on a database process corresponding to a data shard copy.
In order to perform redundancy protection on the data fragments, one or more data fragment copies may be set for each data fragment, and after the database process corresponding to each data fragment copy corresponding to the data fragment is started, the database process corresponding to the data fragment copy may be initialized. Based on this, when the routing server monitors the newly added first storage sub-node, it may be determined whether the number of identifiers or the number of ports of the target fragment server stored by the newly added first storage sub-node is equal to the specified number; if so, initializing the database processes of all data fragment copies of the data fragment corresponding to the newly-added first storage sub-node. The specified number here is the number of copies of the data slice that is set in advance and is required to be set for the data slice. That is, the determination here is to determine whether the database processes of all the data fragment copies of the data fragment corresponding to the newly added first storage sub-node have been started, and when the number of identifiers or the number of ports of the target fragment server stored by the newly added first storage sub-node is equal to the specified number, it indicates that the database processes of the data fragment copies of the data fragment corresponding to the newly added first storage sub-node have been started, and at this time, the database processes may execute initialization operations at the same time.
Fig. 8 shows another arrangement of storage nodes in the zookeeper server. Referring to fig. 8, as an embodiment, the shard2 sub-node shown in fig. 8 may be used as the first storage sub-node. The node1.com and 27019 stored in the shard2 sub-node indicate that the original shard2 data fragment is stored on the node1.com fragment server and has been allocated port 27019; the node2.com and 27015 stored in the shard2 sub-node indicate that a shard2 data fragment copy is stored on the node2.com fragment server and has been allocated port 27015; and the node3.com and 27016 stored in the shard2 sub-node indicate that a shard2 data fragment copy is stored on the node3.com fragment server and has been allocated port 27016. The routing server may determine whether the number of fragment server identifiers stored in the shard2 sub-node is equal to the specified number 3, and if so, it may initialize the database process corresponding to shard2 on the node2.com fragment server and the database process corresponding to shard2 on the node3.com fragment server.
It will be understood by those skilled in the art that the three shard server identifications and the three ports are only used for illustration, and the number of shard server identifications and ports stored by the first storage sub-node may be determined according to the number of preset data shard copies. In addition, which fragmentation server stores the data fragment and the data fragment copy to is also determined according to the actual situation, and the three fragmentation servers are only used for illustration and are not particularly limited.
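By way of illustration only, the routing server's check could be sketched as follows; the data layout under the shard2 sub-node, the helper names and the use of the MongoDB replSetInitiate admin command are assumptions of this sketch, not part of the embodiment:

    // Sketch: routing server checks whether all copies of a data fragment have registered a
    // fragment server and port, and if so triggers initialization of the replica set.
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import org.apache.zookeeper.ZooKeeper;
    import org.bson.Document;

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;

    public class ReplicaInitializer {

        // replicaCount is the pre-set number of members (original fragment plus its copies).
        public static void maybeInitiate(ZooKeeper zk, String shardId, int replicaCount)
                throws Exception {
            String shardPath = "/shard/shardlist/" + shardId;
            // Assumed layout: one child per registered member, each holding "host:port".
            List<String> members = zk.getChildren(shardPath, false);
            if (members.size() < replicaCount) {
                return; // not all database processes have been started yet
            }
            Document config = new Document("_id", shardId);
            List<Document> memberDocs = new ArrayList<>();
            for (int i = 0; i < members.size(); i++) {
                String hostAndPort = new String(
                        zk.getData(shardPath + "/" + members.get(i), false, null),
                        StandardCharsets.UTF_8);
                memberDocs.add(new Document("_id", i).append("host", hostAndPort));
            }
            config.append("members", memberDocs);
            // Issue the initialization through any one of the members.
            String first = new String(
                    zk.getData(shardPath + "/" + members.get(0), false, null),
                    StandardCharsets.UTF_8);
            try (MongoClient client = MongoClients.create("mongodb://" + first)) {
                client.getDatabase("admin")
                      .runCommand(new Document("replSetInitiate", config));
            }
        }
    }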
How the routing server initializes the database corresponding to the data fragment copy is described above.
As an embodiment, the determining, by the routing server, a data fragment copy corresponding to the newly added first storage sub-node may include:
step d 1: similar to the way the target sharding server listens for the target second storage sub-node. The routing server may register a router on the first storage node, so that when the data stored by the first storage node changes, the first storage node sends an event notification to the target sharding server through the router.
For example, taking the shardlist node shown in fig. 8 as the first storage node, and taking the shard2 sub-node stored in the shardlist node as the newly added first storage sub-node as an example, the routing server may register a router on the shardlist node, so that when data of the shardlist node changes (that is, when the shard2 sub-node is newly added), the shardlist node may send an event notification to the target fragment server through the router.
Step d 2: after learning that the data of the first storage node changes, the routing server cannot directly learn what kind of change the data of the first storage node specifically changes. Therefore, the routing server needs to actively check the data fragment identifiers stored by all the first storage sub-nodes stored by the first storage node, and a third storage node, where the third storage node stores the identifier information of the data fragment copy that has completed the database process initialization operation. And further, the identification of the data fragment copy which does not complete the initialization of the database process can be determined by comparing the identifications. For example, the initialized node of FIG. 8, which stores shard1 indicating that the database process corresponding to the fragmented copy of data identified as shard1 has completed initialization, may be used as a third storage node. According to the shrard 1 sub-node and the shrard 2 sub-node stored in the first storage node, it can be known that the shrard 2 sub-node is the newly added first storage sub-node of the first storage node.
The above describes how the routing server determines the newly added first storage sub-node of the first storage node.
By registering the shard server information and the port information of the data fragment copies to the corresponding nodes in the zookeeper server, a unified lookup of the data fragment copies whose corresponding database processes have not yet been initialized can be realized, the initialization of the database processes corresponding to multiple data fragment copies can then be carried out, and the efficiency of the initialization operation is improved.
Corresponding to the embodiment of the data fragment copy deployment method, the application also provides an embodiment of a data fragment copy deployment device. The data fragment copy deployment device is applied to a configuration server in a database cluster comprising the configuration server and a fragment server; the database cluster further comprises a zookeeper server, and the zookeeper server is respectively connected with the configuration server and the fragment server.
Referring to fig. 9, the data fragment copy deployment apparatus includes:
an obtaining unit 910, configured to obtain a remaining amount of storage space of each sharded server in the database cluster;
a selecting unit 920, configured to acquire a data fragment identifier stored in a first storage sub-node and a fragment server identifier storing a data fragment when the first storage sub-node is newly added to a first storage node in a zookeeper server, and select a target fragment server for storing a data fragment copy from other fragment servers according to a storage space surplus of each fragment server, where the first storage node is used to store data fragment information, the first storage node includes at least one first storage sub-node, and each first storage sub-node corresponds to one data fragment;
the storage unit 930 is configured to store the data fragment copy to a target fragment server, and register an identifier of the data fragment copy to a target second storage sub-node in a second storage node in the zookeeper server, where the target second storage sub-node is a second storage sub-node that is pre-designated and is responsible for monitoring by the target fragment server, so that when the target fragment server monitors registration information of a new data fragment copy added to the target second storage sub-node, a port is allocated for the data fragment copy, and the port is used to operate the data fragment copy, where the second storage node is used to store information of the fragment server, the second storage node includes at least one second storage sub-node, and each second storage sub-node corresponds to one fragment server.
As an embodiment, the selecting unit 920 is specifically configured to select at least one candidate shard server that meets a specified rule from the shard servers;
determining a selected parameter of the candidate fragment server according to the residual amount of the storage space of the candidate fragment server, wherein the selected parameter is the basis for selecting the candidate fragment server as a target fragment server;
and selecting the target fragment server from all candidate fragment servers according to the selected parameters of the candidate fragment servers.
As one embodiment, the storage space remaining amount includes: a predicted disk remaining amount and a memory remaining amount.
As an embodiment, in a case that the remaining storage space amount includes a predicted remaining disk amount and a remaining memory amount, the selecting unit 920 is specifically configured to calculate, for each candidate shard server, a first ratio of the predicted remaining disk amount of the candidate shard server to N, and a second ratio of the remaining memory amount of the candidate shard server to M, where N is a sum of the predicted remaining disk amounts of the candidate shard servers, and M is a sum of the remaining memory amounts of the shard servers;
and performing a preset calculation on the first ratio and the second ratio to obtain the selected parameter.
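One possible reading of this calculation is a weighted combination of the two ratios, sketched below; the equal weights are an assumption, not values given by the embodiment.

```python
def selected_parameter(disk_pred_remaining, mem_remaining,
                       disk_sum_n, mem_sum_m,
                       disk_weight=0.5, mem_weight=0.5):
    # First ratio: the candidate's predicted disk remaining amount over N.
    first_ratio = disk_pred_remaining / disk_sum_n
    # Second ratio: the candidate's memory remaining amount over M.
    second_ratio = mem_remaining / mem_sum_m
    # Assumed preset calculation: a weighted sum of the two ratios.
    return disk_weight * first_ratio + mem_weight * second_ratio
```

The candidate fragment server with the largest selected parameter would then be chosen as the target fragment server.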
As an embodiment, the predicted disk remaining amount is obtained by:
receiving, from each fragment server, the predicted disk remaining amount predicted by that fragment server;
the fragment server predicts its disk remaining amount in the following manner: calculating the disk usage amount per unit time of each data fragment stored by the fragment server, and calculating the disk usage amount per unit time of each data fragment copy stored by the fragment server; calculating the average disk usage amount per unit time according to the disk usage amounts per unit time of all the data fragments and of all the data fragment copies; and predicting the disk remaining amount after the specified duration ends according to the average disk usage amount, the total disk capacity of the fragment server, and the current disk occupancy amount of the fragment server.
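As a hedged arithmetic sketch of this prediction, the average per-unit-time usage over all fragments and copies can be projected linearly over the specified duration; the variable names and the linear projection are assumptions.

```python
def predict_disk_remaining(per_unit_usages, total_capacity,
                           current_occupancy, duration_units):
    # per_unit_usages: disk usage per unit time of every data fragment and
    # data fragment copy stored on this fragment server.
    average_usage = sum(per_unit_usages) / len(per_unit_usages)
    # Assumed linear projection of the remaining space once the duration ends.
    predicted = total_capacity - current_occupancy - average_usage * duration_units
    return max(predicted, 0)


# Example: three fragments/copies growing 2, 1 and 3 GB per day on a 500 GB
# disk with 120 GB already occupied, predicting 30 days ahead.
print(predict_disk_remaining([2, 1, 3], 500, 120, 30))  # -> 320.0
```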
Corresponding to the embodiment of the data fragment copy deployment method, the application also provides an embodiment of a data fragment copy deployment device. The data fragment copy deployment device is applied to a shard server in a database cluster comprising a configuration server and the shard server; the database cluster further comprises a zookeeper server, and the zookeeper server is respectively connected with the configuration server and the shard server.
Referring to fig. 10, the data fragment copy deployment apparatus includes:
a monitoring unit 1010, configured to monitor a target second storage sub-node of a second storage node in the zookeeper server, where the target second storage sub-node is a second storage sub-node that is pre-designated and is monitored by the sharding server;
a port allocating unit 1020, configured to allocate a port for a data fragment copy when the fragment server monitors registration information of a new data fragment copy added to a target second storage sub-node, where the port is used to operate the data fragment copy.
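A minimal sketch of the monitoring and port allocation on the fragment-server side is shown below; the /serverlist/<server> layout matches the earlier sketch, and the simple port counter starting at 27015 is an assumption rather than a prescribed allocation scheme.

```python
from itertools import count
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1.example.com:2181")  # assumed zookeeper address
zk.start()

next_port = count(27015)   # assumed starting port for illustration
allocated = {}             # data fragment copy identifier -> allocated port

# Watch this fragment server's own target second storage sub-node (assumed path).
@zk.ChildrenWatch("/serverlist/nodeb2.com")
def on_copy_registered(copy_ids):
    for copy_id in copy_ids:
        if copy_id not in allocated:
            port = next(next_port)
            allocated[copy_id] = port
            # The copy's database process would be run on this port, and the
            # (server, port) pair could then be registered back under the
            # first storage sub-node for that data fragment.
            print(f"allocated port {port} for data fragment copy {copy_id}")

# A real service would keep running here so the watch keeps receiving notifications.
```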
As an embodiment, referring to fig. 11, the data fragment copy deployment apparatus further includes:
a calculating unit 1030, configured to calculate the disk usage amount per unit time of each data fragment stored by the fragment server, and calculate the disk usage amount per unit time of each data fragment copy stored by the fragment server; calculate the average disk usage amount per unit time according to the disk usage amounts per unit time of all the data fragments and of all the data fragment copies; and predict the disk remaining amount after the specified duration ends according to the average disk usage amount, the total disk capacity of the fragment server, and the current disk occupancy amount of the fragment server;
a sending unit 1040, configured to send the predicted remaining amount of the disk to the configuration server.
Please refer to fig. 12, which is a schematic diagram of a hardware structure of a data fragment copy deployment device according to an embodiment of the present application. The data fragment copy deployment device is applied to a configuration server in a database cluster comprising the configuration server and a fragment server. The data fragment copy deployment apparatus may comprise a processor 1201 and a machine-readable storage medium 1202 storing machine-executable instructions. The processor 1201 and the machine-readable storage medium 1202 may communicate via a system bus 1203. Also, by reading and executing the machine-executable instructions in the machine-readable storage medium 1202 corresponding to the data processing logic, the processor 1201 may perform the above-described data fragment copy deployment method applied to the configuration server.
The machine-readable storage medium 1202 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
Please refer to fig. 13, which is a schematic diagram of a hardware structure of another data fragment copy deployment device according to an embodiment of the present application. The data fragment copy deployment device is applied to a fragment server in a database cluster comprising a configuration server and the fragment server. The data fragment copy deployment apparatus may comprise a processor 1301 and a machine-readable storage medium 1302 storing machine-executable instructions. The processor 1301 and the machine-readable storage medium 1302 may communicate via a system bus 1303. Also, by reading and executing the machine-executable instructions in the machine-readable storage medium 1302 corresponding to the data processing logic, the processor 1301 may perform the above-described data fragment copy deployment method applied to the fragment server.
The machine-readable storage medium 1302 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (12)

1. A data fragment copy deployment method is applied to a configuration server in a database cluster comprising the configuration server and a fragment server, and is characterized in that the database cluster further comprises a zookeeper server which is respectively connected with the configuration server and the fragment server, and the method comprises the following steps:
acquiring the storage space surplus of each sharded server in the database cluster;
when a first storage node in a zookeeper server is monitored to newly add a first storage sub-node, acquiring a data fragment identifier stored by the first storage sub-node and a fragment server identifier for storing the data fragment, and selecting a target fragment server for storing a data fragment copy from other fragment servers according to the storage space surplus of each fragment server, wherein the first storage node is used for storing data fragment information, the first storage node comprises at least one first storage sub-node, and each first storage sub-node corresponds to one data fragment;
storing the data fragment copies to a target fragment server, and registering the identifier of the data fragment copies to a target second storage sub-node of a second storage node in the zookeeper server, where the target second storage sub-node is a pre-designated second storage sub-node which is responsible for monitoring by the target fragment server, so that when the target fragment server monitors registration information of a new data fragment copy added to the target second storage sub-node, a port is allocated to the data fragment copies, and the port is used for operating the data fragment copies, where the second storage node is used for storing information of the fragment server, the second storage node includes at least one second storage sub-node, and each second storage sub-node corresponds to one fragment server.
2. The method according to claim 1, wherein the selecting a target sharding server for storing a data sharding copy from other sharding servers according to the remaining amount of storage space of each sharding server comprises:
selecting at least one candidate sharding server meeting a specified rule from all sharding servers;
determining a selected parameter of the candidate fragment server according to the storage space remaining amount of the candidate fragment server, wherein the selected parameter is a basis for selecting the candidate fragment server as a target fragment server;
and selecting the target fragment server from all candidate fragment servers according to the selected parameters of the candidate fragment servers.
3. The method of claim 2, wherein the storage space remaining amount comprises: a predicted disk remaining amount and a memory remaining amount;
wherein determining the selected parameter of the candidate fragment server according to the storage space remaining amount of the candidate fragment server comprises:
for each candidate fragment server, calculating a first ratio of the predicted disk remaining amount of the candidate fragment server to N and calculating a second ratio of the memory remaining amount of the candidate fragment server to M, wherein N is the sum of the predicted disk remaining amounts of the candidate fragment servers, and M is the sum of the memory remaining amounts of the fragment servers;
and performing a preset calculation on the first ratio and the second ratio to obtain the selected parameter.
4. The method of claim 3, wherein the predicted disk remaining amount is obtained by:
receiving, from each fragment server, the predicted disk remaining amount of that fragment server;
the fragment server predicts its disk remaining amount in the following manner: calculating the disk usage amount per unit time of each data fragment stored by the fragment server, and calculating the disk usage amount per unit time of each data fragment copy stored by the fragment server; calculating the average disk usage amount per unit time according to the disk usage amounts per unit time of all the data fragments and of all the data fragment copies; and predicting the disk remaining amount after the specified duration ends according to the average disk usage amount, the total disk capacity of the fragment server, and the current disk occupancy amount of the fragment server.
5. A data fragment copy deployment method is applied to a fragment server in a database cluster comprising a configuration server and the fragment server, and is characterized in that the database cluster further comprises a zookeeper server which is respectively connected with the configuration server and the fragment server, and the method comprises the following steps:
monitoring a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the fragment server;
when the fragmentation server monitors registration information of a data fragmentation copy newly added to a target second storage sub node, a port is allocated to the data fragmentation copy, and the port is used for operating the data fragmentation copy.
6. The method of claim 5, further comprising:
calculating the disk usage amount per unit time of each data fragment stored by the fragment server, and calculating the disk usage amount per unit time of each data fragment copy stored by the fragment server; calculating the average disk usage amount per unit time according to the disk usage amounts per unit time of all the data fragments and of all the data fragment copies; and predicting the disk remaining amount after the specified duration ends according to the average disk usage amount, the total disk capacity of the fragment server, and the current disk occupancy amount of the fragment server;
and sending the predicted disk remaining amount to the configuration server.
7. A data fragment copy deployment device is applied to a configuration server in a database cluster comprising the configuration server and a fragment server, and is characterized in that the database cluster further comprises a zookeeper server which is respectively connected with the configuration server and the fragment server, and the device comprises:
the acquisition unit is used for acquiring the storage space surplus of each sharded server in the database cluster;
the data fragment server comprises a selecting unit, a data fragment copy selecting unit and a fragment copy selecting unit, wherein the selecting unit is used for acquiring a data fragment identifier stored by a first storage node and a fragment server identifier for storing a data fragment when the first storage node in the zookeeper server is newly added with the first storage node, and selecting a target fragment server for storing a data fragment copy from other fragment servers according to the storage space surplus of each fragment server, wherein the first storage node is used for storing data fragment information, the first storage node comprises at least one first storage node, and each first storage node corresponds to one data fragment;
and the storage unit is used for storing the data fragment copies to a target fragment server and registering the identification of the data fragment copies to a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the target fragment server, so that when the target fragment server monitors registration information of a new data fragment copy added to the target second storage sub-node, a port is allocated for the data fragment copies and used for operating the data fragment copies, the second storage node is used for storing information of the fragment server, the second storage node comprises at least one second storage sub-node, and each second storage sub-node corresponds to one fragment server.
8. The apparatus of claim 7,
the selection unit is specifically configured to select at least one candidate sharded server that meets a specified rule from the sharded servers;
determining a selected parameter of the candidate fragment server according to the residual amount of the storage space of the candidate fragment server, wherein the selected parameter is the basis for selecting the candidate fragment server as a target fragment server;
and select the target fragment server from all candidate fragment servers according to the selected parameters of the candidate fragment servers.
9. The apparatus of claim 8, wherein the storage space remaining amount comprises: a predicted disk remaining amount and a memory remaining amount;
the selecting unit is specifically configured to calculate, for each candidate fragment server, a first ratio of the predicted disk remaining amount of the candidate fragment server to N, and a second ratio of the memory remaining amount of the candidate fragment server to M, wherein N is the sum of the predicted disk remaining amounts of the candidate fragment servers, and M is the sum of the memory remaining amounts of the fragment servers;
and perform a preset calculation on the first ratio and the second ratio to obtain the selected parameter.
10. The apparatus of claim 9, wherein the predicted disk remaining amount is obtained by:
receiving, from each fragment server, the predicted disk remaining amount of that fragment server;
the fragment server predicts its disk remaining amount in the following manner: calculating the disk usage amount per unit time of each data fragment stored by the fragment server, and calculating the disk usage amount per unit time of each data fragment copy stored by the fragment server; calculating the average disk usage amount per unit time according to the disk usage amounts per unit time of all the data fragments and of all the data fragment copies; and predicting the disk remaining amount after the specified duration ends according to the average disk usage amount, the total disk capacity of the fragment server, and the current disk occupancy amount of the fragment server.
11. A data fragment copy deployment device is applied to a fragment server in a database cluster comprising a configuration server and a fragment server, and is characterized in that the database cluster further comprises a zookeeper server, the zookeeper server is respectively connected with the configuration server and the fragment server, and the device comprises:
the monitoring unit is used for monitoring a target second storage sub-node of a second storage node in the zookeeper server, wherein the target second storage sub-node is a pre-designated second storage sub-node which is monitored by the fragment server;
and the port distribution unit is used for distributing a port for the data fragment copy when the fragment server monitors the registration information of the newly added data fragment copy of the target second storage sub node, and the port is used for operating the data fragment copy.
12. The apparatus of claim 11, further comprising:
the computing unit is used for computing the disk usage amount of each data fragment stored by the fragment server in unit time and computing the disk usage amount of each data fragment copy stored by the fragment server in unit time; calculating the average usage amount of the disk in unit time according to the usage amounts of the disk in unit time of all the data fragments and the disk usage amount of all the data fragment copies in unit time, and predicting the residual amount of the disk after the specified time length is ended according to the average usage amount of the disk, the total disk capacity of the fragment server and the current disk occupancy amount prediction of the fragment server;
and the sending unit is used for sending the predicted disk prediction residual quantity to the configuration server.
CN201910944239.7A 2019-09-30 2019-09-30 Data fragment copy deployment method and device Active CN110716698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910944239.7A CN110716698B (en) 2019-09-30 2019-09-30 Data fragment copy deployment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910944239.7A CN110716698B (en) 2019-09-30 2019-09-30 Data fragment copy deployment method and device

Publications (2)

Publication Number Publication Date
CN110716698A CN110716698A (en) 2020-01-21
CN110716698B true CN110716698B (en) 2022-08-26

Family

ID=69211289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910944239.7A Active CN110716698B (en) 2019-09-30 2019-09-30 Data fragment copy deployment method and device

Country Status (1)

Country Link
CN (1) CN110716698B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579039B (en) * 2020-12-02 2024-02-02 北京金山云网络技术有限公司 Form copy expansion method, system and device and electronic equipment
CN114398371B (en) * 2022-01-13 2024-06-04 深圳九有数据库有限公司 Multi-copy slicing method, device, equipment and storage medium for database cluster system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10291707B1 (en) * 2015-05-18 2019-05-14 Twitter, Inc. Systems and methods for balancing storage resources in a distributed database
CN107038059A (en) * 2016-02-03 2017-08-11 阿里巴巴集团控股有限公司 virtual machine deployment method and device
CN106527981B (en) * 2016-10-31 2020-04-28 华中科技大学 Data fragmentation method of self-adaptive distributed storage system based on configuration
CN108829805A (en) * 2018-06-06 2018-11-16 福建南威软件有限公司 A kind of fragment storage method based on MongoDB

Also Published As

Publication number Publication date
CN110716698A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
EP3418877B1 (en) Data writing and reading method and apparatus, and cloud storage system
US20200364608A1 (en) Communicating in a federated learning environment
CN106527981B (en) Data fragmentation method of self-adaptive distributed storage system based on configuration
US10033570B2 (en) Distributed map reduce network
US9811435B2 (en) System for virtual machine risk monitoring
US11422980B2 (en) Policy-based selection and configuration of target site resources for data replication
US9135040B2 (en) Selecting provisioning targets for new virtual machine instances
CN106293492B (en) Storage management method and distributed file system
CN110716698B (en) Data fragment copy deployment method and device
CN109254839B (en) Method for determining task trigger time, method and system for constructing task timer
CN109845192B (en) Computer system and method for dynamically adapting a network and computer readable medium
US9229762B2 (en) Host providing system and host providing method
US9690576B2 (en) Selective data collection using a management system
CN112765182A (en) Data synchronization method and device among cloud server clusters
CN113656168A (en) Method, system, medium and equipment for automatic disaster recovery and scheduling of traffic
CN111767128A (en) Method and device for executing timing task
CN115756955A (en) Data backup and data recovery method and device and computer equipment
CN106708865B (en) Method and device for accessing window data in stream processing system
CN110798492B (en) Data storage method and device and data processing system
CN111614701B (en) Distributed cluster and container state switching method and device
US10091052B1 (en) Assessment of network fault origin
CN109451090B (en) Domain name resolution method and device
CN110474787B (en) Node fault detection method and device
CN107294781B (en) Method and system for cluster configuration node failover
US20150120793A1 (en) Managing device of distributed file system, distributed computing system therewith, and operating method of distributed file system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant