CN110333986B - Method for guaranteeing availability of redis cluster - Google Patents

Info

Publication number
CN110333986B
Authority
CN
China
Prior art keywords
cluster
redis
node
master
redis cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910530849.2A
Other languages
Chinese (zh)
Other versions
CN110333986A (en)
Inventor
陈小杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai 2345 Network Technology Co ltd
Original Assignee
Shanghai 2345 Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-06-19
Filing date
2019-06-19
Publication date
2023-12-29
Application filed by Shanghai 2345 Network Technology Co ltd
Priority to CN201910530849.2A
Publication of CN110333986A
Application granted
Publication of CN110333986B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams

Abstract

The invention relates to the technical field of redis clusters, and in particular to a method for guaranteeing the availability of a redis cluster. The method comprises: saving the node information of the redis cluster and deleting the downed master-slave nodes from the redis cluster; re-adding the dump.rdb files and redis instances of the downed master nodes to the redis cluster, configuring the weight of the re-added redis instances to 0, and configuring the other nodes according to their set weights; and deleting the re-added redis instances and cleaning up their rdb files. Compared with the prior art, the invention has the following advantages. It provides a cluster migration method so that, when a group of master and slave nodes in a redis cluster goes down, the whole cluster can continue to provide storage service externally, which resolves the unavailable state of the redis cluster during faults. Preferably, it provides a cluster monitoring method that monitors the availability of the redis nodes and the information of the redis nodes. It also provides a cluster back-migration method so that, when the master and slave nodes return from the down state to a normal state, the redis cluster is restored.

Description

Method for guaranteeing availability of redis cluster
Technical Field
The invention relates to the technical field of redis clusters, in particular to a method for guaranteeing the availability of a redis cluster.
Background
Redis introduced cluster functionality in version 3.0, and the functionality had stabilized by version 3.2. A Redis cluster provides a way to share data across multiple Redis nodes. Its main characteristics are a data partitioning mode and a master-slave mode. The data partitioning mode introduces hash slots: the slots are divided among different nodes and the data is mapped to different slots, so that the data is spread across different nodes, which reduces the data pressure on any single node and improves the scalability of the cluster. Through the master-slave mode, when a master node goes down, one of its slave nodes is elected as the new master node and continues to provide service, which improves the availability of the cluster.
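The slot mapping described above can be illustrated with a short sketch. The patent text does not spell out the mapping; the rule used here (CRC16 of the key modulo 16384, with optional `{...}` hash tags) is the standard Redis Cluster key distribution rule, and the key names are purely illustrative.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a Redis cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed, honouring {...} hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: bytes) -> int:
    """Map a key to its hash slot: CRC16 (XMODEM variant) modulo 16384."""
    return binascii.crc_hqx(hash_tag(key), 0) % SLOT_COUNT


if __name__ == "__main__":
    # Illustrative keys; the two tagged keys land in the same slot as "user:1000".
    for k in (b"user:1000", b"{user:1000}.following", b"{user:1000}.followers"):
        print(k.decode(), "->", key_slot(k))
```

Because every key belongs to exactly one slot and every slot to exactly one master-slave group, losing a whole group makes the keys in its slots unreachable, which is the problem the next paragraph describes.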
However, although a redis cluster contains many groups of master-slave nodes, if any one group of master-slave nodes goes down, the whole cluster goes down and cannot provide service externally, which is unacceptable for a cluster. The cause lies in data partitioning: when a group of master-slave nodes goes down, the slots held by that group no longer exist in the cluster, so data in those slots can be neither stored nor retrieved. Therefore, a method for guaranteeing the availability of redis clusters is needed.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a method for guaranteeing the availability of redis clusters, in which the service provided by the cluster is not interrupted during cluster migration.
To achieve the above object, a method for guaranteeing the availability of redis clusters is designed. The method comprises a cluster migration method, which specifically comprises the following steps: step a, saving the node information of the redis cluster, and deleting the downed master-slave nodes from the redis cluster; step b, re-adding the dump.rdb files and redis instances of the downed master nodes to the redis cluster, and configuring the weight of the re-added redis instances to 0, while the other nodes remain configured according to their set weights; step c, deleting the re-added redis instances, and cleaning up the rdb files of the deleted redis instances.
The method uses SNMP and CTDB to form a cluster monitoring system that monitors the cluster, as follows: an SNMP client is deployed on each node of the redis cluster to collect the important parameter information and the cluster node information (cluster nodes) of the redis cluster and send them to the SNMP server; the SNMP server synchronizes this information into the redis cluster; and the SNMP servers are deployed in the redis cluster in a highly available manner through CTDB.
When the cluster state is unavailable, if more than half of the nodes of the redis cluster are down, or fewer than three master nodes of the redis cluster remain, or the redis cluster has only three master nodes, the cluster no longer meets the startup conditions of a redis cluster: the cluster has failed and cannot provide service, so only alarm information is sent and steps a to c are not performed. Otherwise, steps a to c are performed.
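A minimal sketch of this pre-check follows. The function name and its two parameters are hypothetical, and it assumes that the "more than half" test is counted over master nodes, as the detailed description does; neither choice is fixed by this paragraph.

```python
def migration_allowed(total_masters: int, masters_down: int) -> bool:
    """Return True if steps a to c may run, False if only an alarm should be sent.

    Assumption: the counts refer to master nodes of the cluster.
    """
    remaining = total_masters - masters_down
    if total_masters <= 3:
        # The cluster has only three master nodes: losing any one is fatal.
        return False
    if masters_down * 2 > total_masters:
        # More than half of the master nodes are down.
        return False
    if remaining < 3:
        # Fewer than three master nodes remain.
        return False
    return True


if __name__ == "__main__":
    for total, down in ((6, 2), (6, 4), (3, 1), (4, 2)):
        action = "migrate" if migration_allowed(total, down) else "alarm only"
        print(f"{total} masters, {down} down -> {action}")
```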
The method also comprises a cluster back-migration (returning) method, which specifically comprises the following steps: step d, adding the master-slave nodes back into the redis cluster, and configuring the master-slave mode; step e, calling the redis-trib.rb script provided with the redis cluster, executing the rebalance command, adding the use-empty-masters parameter and configuring weights to balance slot allocation in the redis cluster.
The method continuously sends test packets to the downed master node; when the downed master node has been restarted and at least one slave node of that master node has started, the cluster back-migration method is carried out.
Compared with the prior art, the invention has the following advantages. It provides a cluster migration method so that, when a group of master and slave nodes in a redis cluster goes down, the whole cluster can continue to provide storage service externally, which resolves the unavailable state of the redis cluster during faults. Preferably, it provides a cluster monitoring method that monitors the availability of the redis nodes and the information of the redis nodes. It also provides a cluster back-migration method so that, when the master and slave nodes return from the down state to a normal state, the redis cluster is restored.
Drawings
FIG. 1 is an overall schematic diagram of the method of the present invention in one embodiment.
FIG. 2 is a flow chart of the cluster migration method according to an embodiment of the present invention.
FIG. 3 is a flow chart of the cluster back-migration method according to an embodiment of the present invention.
Detailed Description
The principles of this method will be apparent to those skilled in the art from the following description of the invention. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to FIG. 1, the present embodiment comprises three major parts: cluster monitoring, cluster migration, and cluster back-migration. First, the cluster monitoring method of this embodiment is described. A distributed monitoring system is built from SNMP (Simple Network Management Protocol) and CTDB (the cluster database component of clustered Samba, which can provide a highly available, load-sharing CIFS server cluster). The specific steps of the monitoring method are as follows:
1. An SNMP client (Agent) is deployed on each node to collect information. It obtains the redis cluster information using commands provided by the redis cluster itself. In this embodiment, the following two commands are run at a set time interval to obtain two types of redis cluster information, which is kept as a backup so that the data can later be restored to its state before the downtime. The first type is the important parameter information of the cluster and the second type is the cluster node information. Both types are used for monitoring: the information of every node must be monitored to know whether the state of the whole cluster is normal. The cluster node information is also used for migration, so the information of each node is saved to prevent loss.
1> Obtain the important parameter information of the cluster through cluster info, including: the state of the current cluster (cluster_state: ok is normal, fail is abnormal), the number of master nodes of the current cluster, and the size of the cluster (cluster_size and cluster_known_nodes).
2> Obtain the cluster node information through cluster nodes, including: the id, ip, port, master-slave information, connection state, slots, and other information of each node of the redis cluster.
The SNMP Server program receives the information collected by each Agent and, after analyzing it to determine the actual state of the cluster, synchronizes the information into the redis cluster.
2. Meanwhile, CTDB is used to deploy the SNMP servers in a highly available manner within the distributed redis cluster. CTDB is a TDB database that spans multiple nodes with consistent data and consistent locks; when the server on one node goes down, it automatically switches over to another node managed by CTDB, which ensures the availability of the monitoring system. Preferably, the data can also be presented through a graphical interface.
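The two commands used in monitoring step 1 return plain text, so an Agent could turn their replies into structured records along the following lines. This is only a sketch: the sample replies, node IDs, and addresses are made up, and the class and function names are hypothetical. The field order assumed for each cluster nodes line is id, address, flags, master id, ping-sent, pong-recv, config-epoch, link-state, then slots.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical sample replies; node IDs and addresses are made up.
SAMPLE_INFO = "cluster_state:fail\r\ncluster_known_nodes:4\r\ncluster_size:3\r\n"
SAMPLE_NODES = (
    "a1 10.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460\n"
    "b2 10.0.0.2:7000@17000 master,fail - 1 2 2 disconnected 5461-10922\n"
    "c3 10.0.0.3:7000@17000 master - 0 3 3 connected 10923-16383\n"
    "d4 10.0.0.4:7000@17000 slave,fail b2 1 2 2 disconnected\n"
)


def parse_cluster_info(text: str) -> Dict[str, str]:
    """Parse the 'name:value' lines of a cluster info reply."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" in line:
            name, value = line.split(":", 1)
            info[name] = value
    return info


@dataclass
class ClusterNode:
    node_id: str
    host: str
    port: int
    flags: List[str]
    master_id: str      # "-" for a master node
    link_state: str
    slots: List[str]

    @property
    def is_master(self) -> bool:
        return "master" in self.flags

    @property
    def is_failed(self) -> bool:
        return "fail" in self.flags


def parse_cluster_nodes(text: str) -> List[ClusterNode]:
    """Parse a cluster nodes reply into one record per node."""
    nodes: List[ClusterNode] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or truncated lines
        addr = parts[1].split("@", 1)[0]   # drop the optional cluster bus port
        host, port = addr.rsplit(":", 1)
        nodes.append(ClusterNode(parts[0], host, int(port), parts[2].split(","),
                                 parts[3], parts[7], parts[8:]))
    return nodes


if __name__ == "__main__":
    info = parse_cluster_info(SAMPLE_INFO)
    nodes = parse_cluster_nodes(SAMPLE_NODES)
    failed = [n.node_id for n in nodes if n.is_master and n.is_failed]
    print(info["cluster_state"], failed)
```

Keeping the parsed node records is what makes the later steps possible: the IDs, roles, and slot ranges of the downed nodes are needed both to remove them and to restore the master-slave layout afterwards.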
Next, referring to FIG. 2, the cluster migration method in this embodiment is described as follows: when the cluster monitoring system finds that the cluster state is fail (unavailable) and a group of master-slave nodes is down, it enters the cluster migration stage for the fault state.
1> First, determine whether more than half of the master nodes of the cluster are down, or the cluster has only three master nodes, or fewer than three master nodes remain. If so, return directly: only an alarm email is sent and no subsequent processing is performed.
2> Save the information of the redis cluster before migration, and call the redis-trib.rb script to delete the downed master-slave nodes from the redis cluster.
3> Start the redis instances on each node; each instance automatically restores its data to the state before the downtime from its dump.rdb file (the redis database file that the redis cluster persists to disk). Finally, call the redis-trib.rb script to add these instances to the redis cluster.
4> Call the redis-trib.rb script provided with the redis cluster and execute the rebalance command, configuring the weight of the redis instances started in 3> to 0; the other nodes balance the slot allocation in the redis cluster according to the weights in the configuration file.
5> Call the redis-trib.rb script provided with the redis cluster to delete the redis instances started in 3>, then shut down these instances and clean up their rdb files, which completes the cluster migration.
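Steps 2> to 5> above can be sketched as a plan of redis-trib.rb invocations. The helper below only builds argument lists and runs nothing; the function name, the addresses, and the node IDs are hypothetical. The subcommands used (del-node, add-node, rebalance with --weight) are those of redis-trib.rb; whether the script accepts each call while the cluster is degraded is not checked here.

```python
from typing import List, Sequence, Tuple

TRIB = "redis-trib.rb"


def migration_commands(entry: str,
                       downed_ids: Sequence[str],
                       temp_instances: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Build the command plan for cluster migration.

    entry          -- host:port of any healthy cluster node (assumed input)
    downed_ids     -- node IDs of the downed master-slave nodes
    temp_instances -- (host:port, node_id) of each instance restarted from dump.rdb
    """
    cmds: List[List[str]] = []
    # Step 2>: remove the downed master-slave nodes from the cluster.
    for node_id in downed_ids:
        cmds.append([TRIB, "del-node", entry, node_id])
    # Step 3>: add the instances restored from dump.rdb to the cluster.
    for addr, _ in temp_instances:
        cmds.append([TRIB, "add-node", addr, entry])
    # Step 4>: rebalance with weight 0 for the restored instances,
    # so that their slots drain to the other masters.
    rebalance = [TRIB, "rebalance"]
    for _, node_id in temp_instances:
        rebalance += ["--weight", f"{node_id}=0"]
    rebalance.append(entry)
    cmds.append(rebalance)
    # Step 5>: remove the now-empty restored instances.
    for _, node_id in temp_instances:
        cmds.append([TRIB, "del-node", entry, node_id])
    return cmds


if __name__ == "__main__":
    plan = migration_commands("10.0.0.1:7000", ["b2", "d4"], [("10.0.0.9:7100", "t1")])
    for cmd in plan:
        print(" ".join(cmd))
```

Setting the weight to 0 rather than deleting the restored instances immediately is what lets the data move off them first: the rebalance drains their slots to the healthy masters, after which the instances can be removed safely.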
Finally, referring to FIG. 3, the cluster back-migration method in this embodiment is described as follows: the monitoring system continuously sends heartbeats (test packets) to the downed master node. When it finds that the redis master node has restarted normally and at least one slave node of that master node has started, the cluster back-migration stage begins. The specific steps are as follows.
1> First, add the master and slave redis nodes to the cluster, and configure the master-slave mode according to the saved configuration.
2> Call the redis-trib.rb script provided with the redis cluster and execute the rebalance command, adding the use-empty-masters parameter (a script option that lets the newly added, empty master nodes receive slots) and configuring weights to balance the slot allocation in the redis cluster, which completes the cluster back-migration.
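The back-migration trigger and its two steps can be sketched in the same style. Again the helper only builds argument lists; the function names, addresses, node IDs, and weights are hypothetical, while add-node with --slave and --master-id and rebalance with --use-empty-masters and --weight are options of redis-trib.rb.

```python
from typing import Dict, List, Sequence

TRIB = "redis-trib.rb"


def back_migration_ready(master_up: bool, slaves_up: int) -> bool:
    """The master has restarted and at least one of its slaves has started."""
    return master_up and slaves_up >= 1


def back_migration_commands(entry: str,
                            master_addr: str,
                            master_id: str,
                            slave_addrs: Sequence[str],
                            weights: Dict[str, int]) -> List[List[str]]:
    """Build the command plan for cluster back-migration.

    entry   -- host:port of any node already in the cluster (assumed input)
    weights -- node_id -> weight, taken from the saved configuration
    """
    # Step 1>: add the recovered master, then attach its slaves to it.
    cmds: List[List[str]] = [[TRIB, "add-node", master_addr, entry]]
    for addr in slave_addrs:
        cmds.append([TRIB, "add-node", "--slave", "--master-id", master_id, addr, entry])
    # Step 2>: rebalance, letting the empty recovered master receive slots.
    rebalance = [TRIB, "rebalance", "--use-empty-masters"]
    for node_id, weight in weights.items():
        rebalance += ["--weight", f"{node_id}={weight}"]
    rebalance.append(entry)
    cmds.append(rebalance)
    return cmds


if __name__ == "__main__":
    if back_migration_ready(master_up=True, slaves_up=1):
        for cmd in back_migration_commands("10.0.0.1:7000", "10.0.0.2:7000", "b2",
                                           ["10.0.0.4:7000"], {"b2": 1}):
            print(" ".join(cmd))
```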
Example 1
The specific steps of this embodiment are as follows. The redis cluster of this embodiment has 6 master nodes; the embodiment guarantees the availability of the redis cluster and preserves the data the cluster held before the downtime.
1> Determine whether more than three master nodes of the cluster are down, or the cluster has only three master nodes, or fewer than three master nodes remain. If so, the cluster is unavailable: return directly, send only an alarm email, and perform no subsequent processing.
2> In this embodiment, two master nodes of the redis cluster are down. The redis cluster information before migration is saved, and the redis-trib.rb script is called to delete the downed master-slave nodes from the redis cluster.
2.1> First step: redis instances are started from the dump.rdb files of the 2 downed master nodes, and the add-node instruction of the redis-trib.rb script is called to add these instances to the redis cluster again.
2.2> Second step: the rebalance instruction of the redis-trib.rb script is called, the weight of the 2 redis instances started in 2.1> is configured to 0, and the other nodes balance the slot allocation in the redis cluster according to the weights in the configuration file.
2.3> Third step: the del-node instruction of the redis-trib.rb script is called to delete the redis instances started in 2.1>; these instances are then shut down and their rdb files cleaned up, which completes the cluster migration.
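As a worked illustration of the weights in Example 1, the sketch below computes weight-proportional slot targets for 4 healthy masters (weight 1 each) and the 2 re-added instances (weight 0). It is a simplified model of proportional allocation, not the exact algorithm of redis-trib.rb; the node names and the equal weights are assumptions, and 16384 is the standard Redis Cluster slot count.

```python
from typing import Dict

TOTAL_SLOTS = 16384  # standard number of hash slots in a Redis cluster


def target_slots(weights: Dict[str, float], total_slots: int = TOTAL_SLOTS) -> Dict[str, int]:
    """Give each node a share of the slots proportional to its weight (rounded down)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one node needs a positive weight")
    return {node: int(total_slots * w / total_weight) for node, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical node names: m1..m4 are healthy masters, t1 and t2 are
    # the instances restarted from the dump.rdb files of the downed masters.
    weights = {"m1": 1, "m2": 1, "m3": 1, "m4": 1, "t1": 0, "t2": 0}
    for node, slots in target_slots(weights).items():
        print(node, slots)
```

With these weights each healthy master is targeted at 4096 slots and the two re-added instances at 0, so after the rebalance they hold no data and can be deleted in step 2.3>.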

Claims (4)

1. A method for guaranteeing the availability of a redis cluster, characterized by comprising a cluster migration method and a cluster back-migration method, wherein the cluster migration method comprises the following steps:
step a, saving the node information of the redis cluster, and deleting the downed master-slave nodes from the redis cluster;
step b, re-adding the dump.rdb files and redis instances of the downed master nodes to the redis cluster, and configuring the weight of the re-added redis instances to 0, while the other nodes remain configured according to their set weights;
step c, deleting the re-added redis instances, and cleaning up the rdb files of the deleted redis instances;
the cluster back-migration method specifically comprises the following steps:
step d, adding the master-slave nodes into the redis cluster, and configuring the master-slave mode;
step e, calling the redis-trib.rb script provided with the redis cluster, executing the rebalance command, adding the use-empty-masters parameter and configuring weights to balance slot allocation in the redis cluster.
2. The method for guaranteeing the availability of a redis cluster according to claim 1, wherein the method uses SNMP and CTDB to form a cluster monitoring system that monitors the cluster, as follows:
an SNMP client is deployed on each node of the redis cluster to collect the important parameter information and the cluster node information of the redis cluster and send them to the SNMP server; the SNMP server synchronizes the important parameter information and the cluster node information into the redis cluster; and the SNMP servers are deployed in the redis cluster in a highly available manner through CTDB.
3. The method for guaranteeing the availability of a redis cluster according to claim 1 or 2, wherein, when the cluster state is unavailable, if more than half of the nodes of the redis cluster are down, or fewer than three master nodes of the redis cluster remain, or the redis cluster has only three master nodes, alarm information is sent and steps a to c are not performed; otherwise, steps a to c are performed.
4. The method for guaranteeing the availability of a redis cluster according to claim 1, wherein test packets are continuously sent to the downed master node, and the cluster back-migration method is carried out when the downed master node has been restarted and at least one slave node of that master node has started.
CN201910530849.2A 2019-06-19 2019-06-19 Method for guaranteeing availability of redis cluster Active CN110333986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910530849.2A CN110333986B (en) 2019-06-19 2019-06-19 Method for guaranteeing availability of redis cluster

Publications (2)

Publication Number Publication Date
CN110333986A (en) 2019-10-15
CN110333986B (en) 2023-12-29

Family

ID=68142501

Country Status (1)

Country Link
CN (1) CN110333986B (en)

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant