CN110333986B - Method for guaranteeing availability of redis cluster

Info

Publication number: CN110333986B
Application number: CN201910530849.2A
Authority: CN
Inventors: 陈小杰
Original assignee: Shanghai 2345 Network Technology Co ltd
Current assignee: Shanghai 2345 Network Technology Co ltd
Priority date: 2019-06-19
Filing date: 2019-06-19
Publication date: 2023-12-29
Anticipated expiration: 2039-06-19
Also published as: CN110333986A

Abstract

The invention relates to the technical field of Redis clusters, and in particular to a method for guaranteeing the availability of a Redis cluster. The method comprises: saving the node information of the Redis cluster and deleting the downed master-slave nodes from the Redis cluster; re-adding the dump.rdb files and Redis instances of the downed master nodes to the Redis cluster, setting the weight of the re-added Redis instances to 0 while the other nodes keep their configured weights; and deleting the re-added Redis instances and cleaning up their rdb files. Compared with the prior art, the invention has the following advantages: it provides a cluster migration method so that, when a group of master and slave nodes in a Redis cluster goes down, the cluster as a whole can continue to provide storage service externally, which resolves the unavailable state of a Redis cluster when a fault occurs; preferably, it provides a cluster monitoring method that monitors the availability and the information of the Redis nodes; and it provides a cluster back-migration (returning) method so that, when the master and slave nodes recover from the down state to a normal state, they are restored to the Redis cluster.

Description

Method for guaranteeing availability of redis cluster

Technical Field

The invention relates to the technical field of Redis clusters, and in particular to a method for guaranteeing the availability of a Redis cluster.

Background

Redis introduced cluster functionality in version 3.0, and by version 3.2 the cluster functionality had stabilized. A Redis cluster provides a way to share data among multiple Redis nodes. Its main features are data partitioning and the master-slave mode. Data partitioning introduces hash slots: the slots are divided among different nodes and the data is assigned to different slots, so that the data is spread across different nodes, the data pressure on any single node is reduced, and the cluster scales better. With the master-slave mode, when a master node goes down, one of its slave nodes is elected as the new master node and continues to provide service, which improves the availability of the cluster.

However, although a Redis cluster contains many groups of master-slave nodes, as soon as one whole group of master-slave nodes goes down, the entire cluster goes down and cannot provide service externally, which is unacceptable for a cluster. The cause lies in data partitioning: when a group of master-slave nodes goes down, the slots held by that group are no longer served, so the data in those slots can be neither stored nor retrieved. Therefore, a method for guaranteeing the availability of Redis clusters is needed.
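
As an illustration of the data partitioning described above, the following minimal Python sketch maps a key to a hash slot. It assumes the mapping used by Redis Cluster (CRC16 of the key, XMODEM variant, modulo 16384 slots), which this document does not spell out; the function name key_slot is illustrative only, and hash tags are ignored for brevity.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a Redis cluster


def key_slot(key: bytes) -> int:
    """Map a key to a hash slot: CRC16 (XMODEM) of the key modulo 16384.

    Simplification in this sketch: hash tags ({...}) are not handled.
    """
    # binascii.crc_hqx with an initial value of 0 computes CRC-CCITT (XMODEM).
    return binascii.crc_hqx(key, 0) % SLOT_COUNT
```

Each master node serves a subset of the 16384 slots, so when a whole master-slave group is lost, every key whose slot belonged to that group becomes unreachable.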

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for guaranteeing the availability of a Redis cluster in which the service provided by the cluster is not interrupted during cluster migration.

To achieve the above object, a method for guaranteeing the availability of a Redis cluster is designed. The method comprises a cluster migration method, which specifically comprises the following steps: step a, saving the node information of the Redis cluster and deleting the downed master-slave nodes from the Redis cluster; step b, re-adding the dump.rdb files and Redis instances of the downed master nodes to the Redis cluster and setting the weight of the re-added Redis instances to 0, while the other nodes are still configured according to their set weights; step c, deleting the re-added Redis instances and cleaning up the rdb files of the deleted Redis instances.

The method uses SNMP and CTDB to form a cluster monitoring system that monitors the cluster, as follows: an SNMP client is deployed on each node of the Redis cluster to collect the important parameter information and the cluster node information (cluster nodes) of the Redis cluster and send them to the SNMP server; the SNMP server synchronizes this information into the Redis cluster; and the SNMP servers are deployed in the Redis cluster in a highly available manner through CTDB.

When the cluster state is unavailable, if more than half of the nodes of the Redis cluster are down, or fewer than 3 master nodes remain, or the Redis cluster has only three master nodes, the cluster state does not meet the start-up conditions of a Redis cluster; the cluster has failed and cannot provide service, so only alarm information is sent and steps a to c are not carried out. Otherwise, steps a to c are performed.

The method also comprises a cluster returning method, which specifically comprises the following steps: step d, adding the master-slave nodes to the Redis cluster and configuring the master-slave mode; step e, calling the redis-trib.rb script provided with the Redis cluster, executing the rebalance command with the --use-empty-masters parameter, and configuring weights to balance the slot allocation in the Redis cluster.
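
The precondition above can be sketched as a small Python predicate. This is a minimal sketch under stated assumptions: the function name should_migrate and its four parameters are hypothetical, and the counts are assumed to come from the monitoring data.

```python
def should_migrate(total_nodes: int, down_nodes: int,
                   total_masters: int, remaining_masters: int) -> bool:
    """Return True when steps a to c may run, False when only an alarm is sent."""
    if down_nodes * 2 > total_nodes:
        return False  # more than half of the nodes are down
    if remaining_masters < 3:
        return False  # fewer than 3 master nodes remain
    if total_masters == 3:
        return False  # the cluster has only three master nodes
    return True
```

For example, a cluster of 6 masters and 6 slaves that loses two master-slave groups (4 nodes) still has 4 masters and passes the check, whereas losing four groups does not.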

The method continuously sends test packets to the downed master node, and when the downed master node has restarted and at least one slave node of that master node has started, the cluster returning method is carried out.

Compared with the prior art, the invention has the following advantages: it provides a cluster migration method so that, when a group of master and slave nodes in a Redis cluster goes down, the cluster as a whole can continue to provide storage service externally, which resolves the unavailable state of a Redis cluster when a fault occurs; preferably, it provides a cluster monitoring method that monitors the availability and the information of the Redis nodes; and it provides a cluster returning method so that, when the master and slave nodes recover from the down state to a normal state, they are restored to the Redis cluster.
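
The restart condition above (the downed master answers again and at least one of its slaves is up) can be sketched in Python as follows. This is an illustrative sketch, not the implementation of this document: it assumes the test packet is a Redis inline PING over TCP on an instance that does not require authentication, and the names redis_ping and ready_to_return are hypothetical.

```python
import socket
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]  # (host, port)


def redis_ping(addr: Address, timeout: float = 1.0) -> bool:
    """Send an inline PING and report whether the node answers +PONG."""
    try:
        with socket.create_connection(addr, timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        return False  # refused, unreachable, or timed out


def ready_to_return(master: Address, slaves: Iterable[Address],
                    probe: Callable[[Address], bool] = redis_ping) -> bool:
    """True when the master responds and at least one of its slaves responds."""
    return probe(master) and any(probe(s) for s in slaves)
```

The probe is passed in as a parameter so that the decision logic can be exercised without a live cluster.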

Drawings

FIG. 1 is an overall schematic diagram of the method of the present invention in one embodiment.

FIG. 2 is a flow chart of a cluster migration method according to an embodiment of the present invention.

FIG. 3 is a flow chart of a cluster back-migration method according to an embodiment of the invention.

Detailed Description

The principles of this method will be apparent to those skilled in the art from the following description of the invention. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to FIG. 1, the present embodiment includes three major parts, namely cluster monitoring, cluster migration and cluster back-migration. First, the cluster monitoring method of the present embodiment is described. We provide a distributed monitoring system by combining SNMP (Simple Network Management Protocol) with CTDB (CTDB is the cluster database component of clustered Samba, which can provide a highly available, load-sharing CIFS server cluster). The specific steps of the monitoring method are as follows:

1. An SNMP client Agent is deployed on each node and collects information, obtaining the Redis cluster information with commands provided by the Redis cluster itself. In this embodiment the following 2 commands are run at a set time interval to obtain two types of Redis cluster information, which are kept as a backup for later restoring the data to its pre-downtime state. The first type is the important parameter information of the cluster, and the second type is the cluster node information. Both types are used for monitoring: the information of each node must be monitored in order to know whether the state of the whole cluster is normal. The cluster node information is also used for migration, so the information of every node is saved to prevent loss.

1> obtaining the important parameter information of the cluster through cluster info, including: the state of the current cluster (cluster_state: ok is normal, fail is abnormal), the number of master nodes of the current cluster, and the size of the cluster (cluster_size and cluster_known_nodes).

2> obtaining the cluster node information through cluster nodes, including: the id, ip, port, master-slave role, connection state, slots and other information of each node of the Redis cluster.

The SNMP Server program receives the information collected by each Agent, works out the actual state of the cluster from it, and then synchronizes the information into the Redis cluster.

2. Meanwhile, CTDB is used to deploy the SNMP servers in a highly available manner within the distributed Redis cluster. CTDB is a TDB database that spans multiple nodes with consistent data and consistent locks; when the server on one node goes down, service is automatically switched to another node managed by CTDB, which ensures the availability of the monitoring system. Preferably, the data can also be presented through a graphical interface.
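
The two kinds of information collected in step 1 can be turned into structured data with a short parser. The following Python sketch assumes the usual text layouts of the two commands (one field:value pair per line for cluster info; one space-separated line per node for cluster nodes, with the address written as ip:port or ip:port@cport depending on the Redis version). The function names and dictionary keys are hypothetical.

```python
def parse_cluster_info(text: str) -> dict:
    """Parse 'field:value' lines as printed by the cluster info command."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def parse_cluster_nodes(text: str) -> list:
    """Parse the per-node lines printed by the cluster nodes command."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        # Drop the optional '@cport' suffix, then split host and port.
        host, _, port = parts[1].split("@")[0].rpartition(":")
        flags = parts[2].split(",")
        nodes.append({
            "id": parts[0],
            "host": host,
            "port": int(port),
            "role": "master" if "master" in flags else "slave",
            "master_id": None if parts[3] == "-" else parts[3],
            "failed": "fail" in flags,
            "link_state": parts[7],
            "slots": parts[8:],  # slot ranges, present for masters only
        })
    return nodes
```

Saving the parsed node list at each polling interval gives the migration stage the node ids, addresses and slot ranges as they were before the failure.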

Next, referring to FIG. 2, the cluster migration method of the present embodiment is described. When the cluster monitoring system finds that the cluster state is fail (unavailable) and that a group of master-slave nodes is down, it enters the cluster migration stage for the fault state.

1> first, judging whether more than half of the cluster's master nodes are down, or the Redis cluster has only three master nodes, or fewer than three master nodes remain; if so, returning directly, sending only an alarm mail and carrying out no subsequent processing.

2> saving the information of the Redis cluster before migration, and calling the redis-trib.rb script to delete the downed master-slave nodes from the Redis cluster.

3> starting Redis instances on the nodes; the instances automatically restore the data to its pre-downtime state from the dump.rdb files (the Redis database files that the Redis cluster persists to disk); finally, the redis-trib.rb script is called to add these instances to the Redis cluster.

4> calling the redis-trib.rb script provided with the Redis cluster and executing the rebalance command, with the weight of the Redis instances started in 3> set to 0, while the other nodes balance the slot allocation in the Redis cluster according to the weights in the configuration file.

5> calling the redis-trib.rb script provided with the Redis cluster to delete the Redis instances started in 3>, then shutting these instances down and cleaning up the rdb files, which completes the cluster migration.
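
Steps 2> to 5> can be written down as an ordered list of redis-trib.rb command lines. The Python sketch below only composes the argument lists and does not execute anything. It assumes the del-node, add-node and rebalance subcommands of redis-trib.rb with options placed before the positional address; the exact option placement should be checked against the script version in use. The function name, the addresses and the node ids are hypothetical placeholders.

```python
from typing import List, Sequence, Tuple

TRIB = "redis-trib.rb"


def migration_commands(entry: str,
                       downed_ids: Sequence[str],
                       restored: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Compose the command lines for migration steps 2> to 5>.

    entry      -- 'host:port' of any healthy cluster node
    downed_ids -- node ids of the downed master-slave nodes
    restored   -- (address, node id) of each instance restarted from dump.rdb
    """
    cmds = []
    for node_id in downed_ids:  # step 2>: remove the downed nodes
        cmds.append([TRIB, "del-node", entry, node_id])
    for addr, _ in restored:  # step 3>: add the restarted instances
        cmds.append([TRIB, "add-node", addr, entry])
    rebalance = [TRIB, "rebalance"]  # step 4>: weight 0 for the restarted instances
    for _, node_id in restored:
        rebalance += ["--weight", f"{node_id}=0"]
    cmds.append(rebalance + [entry])
    for _, node_id in restored:  # step 5>: remove the now-empty instances
        cmds.append([TRIB, "del-node", entry, node_id])
    return cmds
```

Each returned list could be handed to a process runner one at a time, stopping at the first failure so that the cluster is never left half-migrated without an alarm.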

Finally, referring to FIG. 3, the cluster back-migration (returning) method of the present embodiment is described. The monitoring system continuously sends heartbeats (test packets) to the downed master node; when it finds that the Redis master node has restarted normally and that at least one slave node of that master node has started, it enters the cluster back-migration stage, whose specific steps are as follows.

1> first, adding the master-slave Redis nodes to the cluster and configuring the master-slave mode according to the saved configuration.

2> calling the redis-trib.rb script provided with the Redis cluster and executing the rebalance command with the --use-empty-masters parameter, configuring the weights, and balancing the slot allocation in the Redis cluster, which completes the cluster back-migration.
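
The two back-migration steps can likewise be composed as redis-trib.rb command lines. This Python sketch assumes the add-node subcommand with its --slave and --master-id options and the rebalance subcommand with --use-empty-masters and --weight, with options placed before the positional addresses; it only builds the argument lists. All addresses, node ids and weights are hypothetical.

```python
from typing import Dict, List, Sequence

TRIB = "redis-trib.rb"


def return_commands(entry: str, master_addr: str, master_id: str,
                    slave_addrs: Sequence[str],
                    weights: Dict[str, float]) -> List[List[str]]:
    """Compose the command lines for back-migration steps 1> and 2>."""
    # Step 1>: add the recovered master, then attach its slaves to it.
    cmds = [[TRIB, "add-node", master_addr, entry]]
    for addr in slave_addrs:
        cmds.append([TRIB, "add-node", "--slave",
                     "--master-id", master_id, addr, entry])
    # Step 2>: rebalance so that the empty master receives slots again.
    rebalance = [TRIB, "rebalance", "--use-empty-masters"]
    for node_id, weight in weights.items():
        rebalance += ["--weight", f"{node_id}={weight}"]
    cmds.append(rebalance + [entry])
    return cmds
```

Without --use-empty-masters a freshly added master holds no slots and would be skipped by the rebalance, which is why the parameter appears in this step.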

Example 1

The specific steps of this embodiment are as follows. The Redis cluster of this embodiment has 6 master nodes; the embodiment guarantees the availability of the Redis cluster and preserves the data the cluster held before the downtime.

1> judging whether more than three of the cluster's master nodes are down, or the cluster has only three master nodes, or fewer than three master nodes remain; if so, the cluster is unavailable and the procedure returns directly, sending only an alarm mail and carrying out no subsequent processing.

2> in this embodiment, two master nodes of the Redis cluster are down; the Redis cluster information before migration is saved, and the redis-trib.rb script is called to delete the downed master-slave nodes from the Redis cluster.

2.1> first step: the add-node instruction of the redis-trib.rb script is called; Redis instances are started from the dump.rdb files of the 2 downed master nodes and added to the Redis cluster again.

2.2> second step: the rebalance instruction of the redis-trib.rb script of the Redis cluster is called, with the weight of the 2 Redis instances started in 2.1> set to 0, while the other nodes balance the slot allocation in the Redis cluster according to the weights in the configuration file.

2.3> third step: the del-node instruction of the redis-trib.rb script of the Redis cluster is called to delete the Redis instances started in 2.1>; these instances are then shut down and the rdb files cleaned up, which completes the cluster migration.
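
The effect of the weights in step 2.2> can be illustrated with a small calculation. The Python sketch below approximates a weighted rebalance by giving each node a share of the 16384 slots proportional to its weight; it is not the exact algorithm of the redis-trib.rb script, and the node names and the weight of 1 for the surviving masters are assumptions.

```python
def target_slots(weights: dict, total_slots: int = 16384) -> dict:
    """Return the slot count each node should hold for the given weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one node needs a positive weight")
    return {node: int(total_slots * weight / total_weight)
            for node, weight in weights.items()}


# Example 1: 4 surviving masters keep weight 1, the 2 re-added instances get 0.
weights = {"m1": 1, "m2": 1, "m3": 1, "m4": 1, "tmp1": 0, "tmp2": 0}
plan = target_slots(weights)
```

With these weights every slot held by the two re-added instances is moved to the four surviving masters, so the instances are empty when del-node is called in step 2.3>.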

Claims

1. A method for guaranteeing availability of redis clusters is characterized by comprising a cluster migration method and a cluster returning method, wherein the cluster migration method comprises the following steps:

step a, saving the node information of the redis cluster, and deleting the downed master-slave nodes from the redis cluster;

step b, re-adding the dump.rdb files and redis instances of the downed master nodes to the redis cluster, and setting the weight of the re-added redis instances to 0, wherein the other nodes are still configured according to their set weights;

step c, deleting the re-added redis instances, and cleaning up the rdb files of the deleted redis instances;

the cluster returning method specifically comprises the following steps:

step d, adding the master-slave nodes to the redis cluster, and configuring the master-slave mode;

step e, calling the redis-trib.rb script provided with the redis cluster, executing the rebalance command with the --use-empty-masters parameter, and configuring weights to balance the slot allocation in the redis cluster.

2. The method for guaranteeing availability of redis clusters according to claim 1, wherein the method adopts SNMP and CTDB to form a cluster monitoring system to monitor clusters, and the specific method is as follows:

an SNMP client is deployed on each node of the redis cluster to collect the important parameter information and the cluster node information of the redis cluster and send them to the SNMP server; the SNMP server synchronizes the important parameter information and the cluster node information into the redis cluster; and the SNMP servers are deployed in the redis cluster in a highly available manner through CTDB.

3. The method for guaranteeing availability of redis clusters according to claim 1 or 2, wherein when the cluster state is unavailable, if more than half of the nodes of the redis cluster are down, or fewer than 3 master nodes of the redis cluster remain, or the redis cluster has only three master nodes, alarm information is sent and steps a to c are not performed; otherwise, steps a to c are performed.

4. The method for guaranteeing availability of redis clusters according to claim 1, wherein test packets are continuously sent to the downed master node, and the cluster returning method is carried out when the downed master node has restarted and at least one slave node of that master node has started.