WO2024051577A1 - Distributed system deployment method, configuration method, system, device and medium - Google Patents

Distributed system deployment method, configuration method, system, device and medium

Info

Publication number
WO2024051577A1
WO2024051577A1 PCT/CN2023/116224 CN2023116224W WO2024051577A1 WO 2024051577 A1 WO2024051577 A1 WO 2024051577A1 CN 2023116224 W CN2023116224 W CN 2023116224W WO 2024051577 A1 WO2024051577 A1 WO 2024051577A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
regulator
load balancer
server
distributed system
Prior art date
Application number
PCT/CN2023/116224
Other languages
English (en)
French (fr)
Inventor
赖相旭
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation
Publication of WO2024051577A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L47/70 Admission control; Resource allocation
    • H04L47/78 Architectures of resource allocation
    • H04L47/783 Distributed allocation of resources, e.g. bandwidth brokers
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Hardware Redundancy (AREA)

Abstract

Embodiments of the present application provide a distributed system deployment method, a configuration method, a system, a device and a medium. The method includes: deploying a first node and a first load balancer in a first server (S1000); and deploying a second node and a second load balancer in a second server (S2000); wherein the first node and the second node are communicatively connected, the first node and the second node confirm node identities through the distributed consensus protocol Raft, the first node is communicatively connected to the first load balancer and the second load balancer respectively, and the second node is communicatively connected to the first load balancer and the second load balancer respectively; and the first load balancer and the second load balancer are each configured to distribute service requests according to load conditions.

Description

Distributed system deployment method, configuration method, system, device and medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese patent application No. 202211084128.1, filed on September 6, 2022, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the technical field of distributed systems, and in particular to a distributed system deployment method, a configuration method, a system, a device and a medium.
BACKGROUND
In a dual-machine hot-standby system, one of the servers is usually set as the master server to provide services, while the other server is set as the slave server and is only responsible for data backup.
In the related art, a dual-machine hot-standby deployment may switch the master and slave servers frequently when the volume of service requests is large, which affects system performance. In addition, data is prone to loss during backup, so the consistency of the data on the master and slave servers cannot be guaranteed. How to guarantee system performance and data consistency in the face of a large number of service requests is therefore an urgent problem.
SUMMARY
Embodiments of the present application provide a distributed system deployment method, a configuration method, a system, a device and a medium.
In a first aspect, an embodiment of the present application provides a distributed system deployment method. The distributed system includes a first server and a second server. The method includes: deploying a first node and a first load balancer in the first server; and deploying a second node and a second load balancer in the second server; wherein the first node and the second node are communicatively connected, the first node and the second node confirm node identities through the distributed consensus protocol Raft, the first node is communicatively connected to the first load balancer and the second load balancer respectively, and the second node is communicatively connected to the first load balancer and the second load balancer respectively; the first load balancer and the second load balancer are each configured to distribute service requests according to load conditions.
In a second aspect, an embodiment of the present application provides a distributed system configuration method, applied to a distributed system obtained by the distributed system deployment method of the first aspect. The method includes: setting the initial states of the third node and the fourth node to a non-started state.
In a third aspect, an embodiment of the present application provides a distributed system, including: a first server provided with a first load balancer and a first node; and a second server provided with a second load balancer and a second node; wherein the first node and the second node are communicatively connected, the first node and the second node confirm node identities through the distributed consensus protocol Raft, the first node is communicatively connected to the first load balancer and the second load balancer respectively, and the second node is communicatively connected to the first load balancer and the second load balancer respectively; the first load balancer and the second load balancer are each configured to distribute service requests according to load conditions.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the distributed system deployment method of the first aspect or the distributed system configuration method of the second aspect.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a computer, implement the distributed system deployment method of the first aspect or the distributed system configuration method of the second aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is an architecture diagram of a distributed system provided by an embodiment of the present application;
Figure 2 is an architecture diagram of a distributed system provided by another embodiment of the present application;
Figure 3 is a working flow chart of the master regulator provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the running states of Raft cluster nodes provided by an embodiment of the present application;
Figure 5 is a flow chart of a distributed system deployment method provided by an embodiment of the present application;
Figure 6 is a system architecture diagram of a unified network management platform provided by an embodiment of the present application;
Figure 7 is a system architecture diagram of a distributed database provided by an embodiment of the present application;
Figure 8 is a system architecture diagram of a business support system provided by an embodiment of the present application;
Figure 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical methods and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here are only intended to explain the present application and are not intended to limit it.
It should be noted that although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from that in the flow charts. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
In the description of the embodiments of the present application, unless otherwise explicitly defined, words such as "set", "install" and "connect" should be understood in a broad sense, and those skilled in the art can reasonably determine the meanings of these words in the embodiments of the present application in light of the content of the technical solution. In the embodiments of the present application, words such as "further", "exemplarily" and "optionally" are used to present examples, illustrations or explanations, and should not be construed as being more preferable or advantageous than other embodiments or designs; the use of these words is intended to present the relevant concepts.
The embodiments of the present application can be applied to devices such as servers, and are not specifically limited in this regard.
In a dual-server architecture scenario, the common practice for improving software availability is dual-machine hot-standby deployment, in which one of the machines provides services to the outside, i.e. acts as the master, while the other machine is used only for data backup, i.e. acts as the slave. With this deployment, performance suffers greatly when the volume of service requests is large, the availability of the software is severely limited, and in serious cases the two machines switch between master and slave frequently. In addition, data may be lost during backup; especially when master-slave switching happens frequently, data consistency cannot be effectively guaranteed.
The embodiments of the present application provide a distributed system deployment method, a configuration method, a system, a device and a medium. Nodes of the distributed consensus protocol Raft are deployed in a dual-machine system to form a Raft cluster, and master-slave node switching, service provision and data backup are performed at a finer granularity through the Raft protocol, thereby guaranteeing the consistency of the data on the two machines. At the same time, a load balancer distributes service requests to different nodes for processing according to their load, improving service efficiency.
The embodiments of the present application are further described below with reference to the accompanying drawings.
Figure 1 is an architecture diagram of a distributed system provided by an embodiment of the present application. As shown in Figure 1, the distributed system architecture may include, but is not limited to: a first server 100, a first load balancer 110, a first node 120, a second server 200, a second load balancer 210 and a second node 220.
The first server 100 is provided with the first load balancer 110 and the first node 120, where the first node 120 is a Raft node; the second server 200 is provided with the second load balancer 210 and the second node 220, where the second node 220 is a Raft node.
The first node 120 and the second node 220 form a Raft cluster, in which the first node 120 and the second node 220 are communicatively connected and confirm node identities with each other through the Raft protocol. The first node 120 is communicatively connected to the first load balancer 110 and the second load balancer 210 respectively; the second node 220 is communicatively connected to the first load balancer 110 and the second load balancer 210 respectively; the first load balancer 110 and the second load balancer 210 are each configured to distribute service requests according to load conditions. The first node 120 is bound to the IP1 address, the second node 220 is bound to the IP2 address, and the first load balancer 110 and the second load balancer 210 receive service requests through the same service request interface.
When the distributed system is working, the first server 100 and the second server 200 run at the same time. Based on the Raft protocol, a leader node is elected from the first node 120 and the second node 220, and the other node acts as a follower node. When a service request is received, under normal circumstances the leader node processes the request and the follower node backs it up synchronously. When the volume of service requests is large, write requests among the service requests are processed by the leader node and backed up synchronously by the follower node; since the nodes of a Raft cluster based on the Raft protocol have good data consistency, read requests among the service requests can be allocated by the first load balancer 110 and/or the second load balancer 210 to the node with the lower load according to the load conditions of each node, thereby improving server load utilization and service request processing efficiency.
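As a concrete illustration of this routing policy, the following Go sketch routes write requests to the current Raft leader and read requests to whichever node carries the lower load. It is a minimal sketch under assumed types (Node, Request and a numeric Load field); the embodiment does not prescribe any particular data structures or load metric.

```go
// Illustrative sketch only: routes write requests to the Raft leader and
// read requests to the least-loaded node. The Node type and its Load
// field are assumptions of this sketch, not part of the embodiment.
package main

import "fmt"

type Node struct {
	Name     string
	IsLeader bool
	Load     int // e.g. outstanding requests; the real metric is not specified
}

type Request struct {
	IsWrite bool
	Payload string
}

// route implements the policy described above: writes must go to the
// leader (followers back them up through Raft replication), while reads
// may be served by whichever node currently carries the lower load.
func route(nodes []*Node, req Request) *Node {
	var leader, lightest *Node
	for _, n := range nodes {
		if n.IsLeader {
			leader = n
		}
		if lightest == nil || n.Load < lightest.Load {
			lightest = n
		}
	}
	if req.IsWrite {
		return leader
	}
	return lightest
}

func main() {
	first := &Node{Name: "first node (IP1)", IsLeader: true, Load: 8}
	second := &Node{Name: "second node (IP2)", Load: 3}
	nodes := []*Node{first, second}

	fmt.Println(route(nodes, Request{IsWrite: true}).Name)  // first node (IP1)
	fmt.Println(route(nodes, Request{IsWrite: false}).Name) // second node (IP2)
}
```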
In some embodiments, if the first node 120 is the leader node and the second node 220 is a follower node, the first load balancer 110 allocates service requests to the first node 120 for processing and the second node 220 backs them up synchronously. When the volume of service requests is large and the load of the first node 120 exceeds a set threshold, the second load balancer 210 allocates the read requests among the service requests to the second node 220 for processing, while the first node 120 mainly processes the write requests. By combining the Raft protocol with load balancers, different service requests can be processed on different nodes at the same time, which improves the load utilization of each node and the efficiency of service request processing.
In some embodiments, when the system receives a service request, it may first be handled by the first load balancer 110, which allocates the request according to the load conditions of the first node 120 and the second node 220; when the service request throughput of the first load balancer 110 reaches a threshold, service requests may be forwarded to the second load balancer 210, which likewise allocates them according to the load conditions of the first node 120 and the second node 220. It is conceivable that the second load balancer 210 may instead take priority in distributing service requests.
In some embodiments, when the system receives service requests, they may be distributed evenly to the first load balancer 110 and the second load balancer 210 for allocation.
In some embodiments, the first load balancer 110 and the second load balancer 210 are the same load balancer deployed in the first server 100 and the second server 200 respectively.
In the solution of this embodiment, on the basis of a dual-server system, Raft nodes and load balancers are deployed in the two servers to form a Raft cluster, so that operations such as service request processing, master-slave switching and data backup continue at the finer granularity of nodes while both servers keep working. This avoids the situation in which a traditional dual-machine hot-standby system performs master-slave switching at the granularity of a whole server, and improves system stability and the data consistency of the two machines. At the same time, the load balancers allocate service requests to different nodes according to the load of each node, which improves node load utilization and service request processing efficiency.
Figure 2 is an architecture diagram of a distributed system provided by another embodiment of the present application. As shown in Figure 2, the distributed system architecture may include, but is not limited to: a first server 300, a first load balancer 310, a first node 320, a third node 330, a first regulator 340, a second server 400, a second load balancer 410, a second node 420, a fourth node 430 and a second regulator 440.
The first server 300 is deployed with the first load balancer 310, the first node 320, the third node 330 and the first regulator 340, where the first node 320 and the third node 330 are both Raft nodes and the first regulator 340 is configured to control the third node 330.
The second server 400 is deployed with the second load balancer 410, the second node 420, the fourth node 430 and the second regulator 440, where the second node 420 and the fourth node 430 are both Raft nodes and the second regulator 440 is configured to control the fourth node 430.
The first node 320, the third node 330, the second node 420 and the fourth node 430 form a Raft cluster, in which the four nodes are communicatively connected. The first node 320 confirms node identities with the third node 330, the second node 420 and the fourth node 430 respectively through the Raft protocol, and is communicatively connected to the first load balancer 310 and the second load balancer 410 respectively. The second node 420 is communicatively connected to the third node 330 and the fourth node 430 respectively, and to the first load balancer 310 and the second load balancer 410 respectively. The third node 330 is communicatively connected to the first node 320 and the second node 420 respectively, and to the first regulator 340. The fourth node 430 is communicatively connected to the first node 320 and the second node 420 respectively, and to the second regulator 440.
The first node 320 is bound to the IP1 address, the second node 420 is bound to the IP2 address, the first load balancer 310 and the second load balancer 410 receive service requests through the same service request interface, and the third node 330 and the fourth node 430 are bound to the same IP3 address. It can be understood that the third node 330 and the fourth node 430 are the same node.
In some embodiments, when the distributed system is working, the first server 300 and the second server 400 run at the same time, and the third node 330 and the fourth node 430 are set as silent nodes, i.e. the third node 330 and the fourth node 430 are turned off by default. Based on the Raft protocol, a leader node is elected from the first node 320 and the second node 420, and the other node acts as a follower node. When a service request is received, under normal circumstances the leader node processes the request and the follower node backs it up synchronously. When the volume of service requests is large, write requests among the service requests are processed by the leader node and backed up synchronously by the follower node; read requests among the service requests are allocated by the first load balancer 310 and/or the second load balancer 410 to the node with the lower load according to the load conditions of each node.
From the first regulator 340 and the second regulator 440, one regulator is determined as the master regulator and the other as the slave regulator. At any given moment, only the master regulator operates, and the slave regulator only listens for the heartbeat messages sent by the master regulator. In some embodiments, the first regulator 340 is the master regulator and the second regulator 440 is the slave regulator; at any given moment, only the first regulator 340 operates, and the second regulator 440 only listens for the heartbeat messages sent by the first regulator 340. If the slave regulator does not receive a heartbeat message from the master regulator within a preset time, it automatically performs a master-slave switch. In some embodiments, when the second regulator 440 does not receive a heartbeat message from the first regulator 340 within the preset time, the second regulator 440 switches to being the master regulator and the first regulator 340 switches to being the slave regulator. In some embodiments, the master regulator is responsible for binding or activating the IP3 address; therefore, when the regulators perform a master-slave switch, the IP3 address also switches with the master regulator, i.e. the original master regulator switches to being the new slave regulator, the original slave regulator switches to being the new master regulator, and the IP3 address is re-bound to the new master regulator.
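The heartbeat-and-timeout behaviour described here can be illustrated with a short Go sketch. It is illustrative only: the channel-based heartbeat transport and the 3-second preset time are assumptions of the sketch, not details of the embodiment.

```go
// Illustrative sketch only: a slave regulator that listens for heartbeats
// from the master and promotes itself when none arrives within a preset
// time. A Go channel stands in for the real heartbeat transport.
package main

import (
	"fmt"
	"time"
)

const heartbeatTimeout = 3 * time.Second // the "preset time" of the description

// runSlave blocks until the master's heartbeats stop, then returns so the
// caller can take over as the new master (re-binding IP3, per the text).
func runSlave(heartbeats <-chan struct{}) {
	timer := time.NewTimer(heartbeatTimeout)
	defer timer.Stop()
	for {
		select {
		case <-heartbeats:
			// Heartbeat received in time: stay slave, reset the timer.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(heartbeatTimeout)
		case <-timer.C:
			// No heartbeat within the preset time: master-slave switch.
			fmt.Println("heartbeat lost: switching to master regulator")
			return
		}
	}
}

func main() {
	heartbeats := make(chan struct{})
	go func() {
		for i := 0; i < 3; i++ { // master sends a few heartbeats, then fails
			heartbeats <- struct{}{}
			time.Sleep(time.Second)
		}
	}()
	runSlave(heartbeats)
	fmt.Println("now acting as master: bind IP3 and monitor the cluster")
}
```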
As shown in Figure 3, while the distributed system processes service requests, the master regulator is responsible for monitoring the health state of the Raft cluster. When the master regulator detects that a node has failed, it tries to repair the failed node; if the node cannot be repaired, the master regulator activates the silent node on the server where the non-failed node is located, so that the non-failed node, the activated silent node and the failed node form a 2/3 Raft cluster, ensuring that more than half of the Raft nodes in the Raft cluster are alive and keeping the system available.
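A minimal Go sketch of this monitoring loop is given below, assuming simple in-memory node records and a pluggable repair function; the actual repair and activation mechanisms are not prescribed by the embodiment. The hasQuorum helper makes explicit the "more than half of the Raft nodes" condition that the 2/3 cluster preserves.

```go
// Illustrative sketch only: one pass of the Figure 3 monitoring loop.
// Node records and the repair function are assumptions of this sketch.
package main

import "fmt"

type ClusterNode struct {
	Name    string
	Healthy bool
	Silent  bool // silent (standby) nodes are off by default
}

// hasQuorum reports whether more than half of the Raft members are alive,
// which is the condition the master regulator must preserve.
func hasQuorum(alive, total int) bool { return alive*2 > total }

// superviseOnce tries to repair a failed node; if repair fails, it
// activates a silent node (in the embodiment, the one on the server of a
// non-failed node) so that a 2/3 cluster is restored.
func superviseOnce(nodes []*ClusterNode, repair func(*ClusterNode) bool) {
	for _, n := range nodes {
		if n.Healthy || n.Silent {
			continue
		}
		if repair(n) {
			n.Healthy = true
			continue
		}
		for _, s := range nodes {
			if s.Silent { // activate the silent node as a follower
				s.Silent, s.Healthy = false, true
				fmt.Printf("activated %s to cover for %s\n", s.Name, n.Name)
				break
			}
		}
	}
}

func main() {
	nodes := []*ClusterNode{
		{Name: "first node", Healthy: true},
		{Name: "second node", Healthy: false},       // failed
		{Name: "third node (silent)", Silent: true}, // standby on server A
	}
	superviseOnce(nodes, func(*ClusterNode) bool { return false }) // repair fails
	alive := 0
	for _, n := range nodes {
		if n.Healthy {
			alive++
		}
	}
	fmt.Println("quorum kept:", hasQuorum(alive, len(nodes))) // 2 of 3 -> true
}
```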
In some embodiments, the first regulator 340 is the master regulator and the second regulator 440 is the slave regulator. When the first regulator 340 detects that the second node 420 has failed and cannot be repaired, the first regulator 340 activates the third node 330, which acts as a follower node, so that the first node 320 (the non-failed node), the second node 420 (the failed node) and the third node 330 (the activated silent node) form a 2/3 Raft cluster, as shown in Figure 4, ensuring that more than half of the Raft nodes in the Raft cluster are alive and keeping the system available.
In some embodiments, the first regulator 340 is the master regulator and the second regulator 440 is the slave regulator. When the first regulator 340 detects that the first node 320 has failed and cannot be repaired, the first regulator 340 switches to being the slave regulator and the second regulator 440 switches to being the master regulator; the second regulator 440 then activates the fourth node 430, so that the first node 320 (the failed node), the second node 420 (the non-failed node) and the fourth node 430 (the activated silent node) form a 2/3 Raft cluster, as shown in Figure 4, ensuring that more than half of the Raft nodes in the Raft cluster are alive and keeping the system available. It can be understood that the purpose of the heartbeat-based master-slave switching between the first regulator 340 and the second regulator 440 is to guarantee that a regulator can still monitor the Raft cluster normally after the server or regulator on one side fails. Therefore, if the second regulator 440 does not receive a heartbeat message from the first regulator 340 within the preset time before the first regulator 340 detects that the first node 320 has failed and cannot be repaired, the first regulator 340 and the second regulator 440 first perform a master-slave switch, after which the second regulator 440, as the master regulator, detects that the first node 320 has failed and cannot be repaired, and activates the fourth node 430, which acts as a follower node.
Through the arrangement of the third node 330, the fourth node 430, the first regulator 340 and the second regulator 440, the Raft cluster is monitored by the first regulator 340 or the second regulator 440. When the first node 320 or the second node 420 fails, the corresponding third node 330 or fourth node 430 is activated (when the first node 320 fails, the fourth node 430 is activated; when the second node 420 fails, the third node 330 is activated), thereby guaranteeing the normal operation of the Raft cluster and quickly restoring cluster services.
In some embodiments, the initial state of one of the third node 330 and the fourth node 430 is set to the silent state and the initial state of the other is set to the active state. In some embodiments, the initial state of the third node 330 is set to the silent state and the initial state of the fourth node 430 is set to the active state. In this case the Raft cluster is a 3/3 cluster (three Raft nodes: the first node 320, the second node 420 and the fourth node 430), as shown in Figure 4; correspondingly, the second regulator 440 is the master regulator and the first regulator 340 is the slave regulator. When the first node 320 fails, the states of the third node 330 and the fourth node 430 do not need to be changed, the master-slave relationship of the regulators does not switch, and the system keeps working normally. When the second node 420 fails, the first regulator 340 switches to being the master regulator and activates the third node 330, and the second regulator 440 turns off the fourth node 430 and switches to being the slave regulator, so as to keep the system operating normally.
In some embodiments, the third node 330 and the fourth node 430 support only basic functions such as voting and heartbeats and do not store data, thereby saving server resources.
In some embodiments, the third node 330 and the fourth node 430 support data storage. In this case, whichever of the third node 330 and the fourth node 430 is activated is communicatively connected to the first load balancer 310 and the second load balancer 410, and the activated third node 330 or fourth node 430 also takes on synchronous data backup and the processing of read requests among the service requests.
In some embodiments, multiple third nodes 330 and fourth nodes 430 may be deployed, but the third nodes 330 and the fourth nodes 430 must correspond one to one. It can be understood that a single first regulator 340 may be deployed, with one first regulator 340 controlling multiple third nodes 330; alternatively, multiple first regulators 340 may be deployed in one-to-one correspondence with the third nodes 330, with one first regulator 340 controlling one third node 330. The same applies to the second regulator 440.
In some embodiments, the master regulator monitors the Raft cluster in the following ways:
The master regulator periodically checks the Raft processes to detect whether any node is abnormal.
The master regulator periodically sends service requests to the Raft cluster and writes detection data to it, where the detection data may be a simple identification code and is not specifically limited here; the master regulator then sends further service requests to Raft and reads the previously written detection data from each node in the Raft cluster. If the detection data cannot be read from a node, that node is considered to have failed.
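The write-then-read probe can be sketched as follows in Go. The KV interface and the in-memory nodes are assumptions of this sketch, standing in for a real store's client; the detection key and identification code are likewise arbitrary.

```go
// Illustrative sketch only: the probe writes a simple identification code
// through the cluster entry point, then reads it back from every node; a
// node that cannot return the code is reported as failed.
package main

import (
	"errors"
	"fmt"
)

// KV is an assumed per-node client interface; real deployments would use
// the storage system's own client.
type KV interface {
	Put(key, value string) error
	Get(key string) (string, error)
}

type memNode struct {
	name string
	data map[string]string
	down bool
}

func (m *memNode) Put(k, v string) error {
	if m.down {
		return errors.New(m.name + ": unreachable")
	}
	m.data[k] = v
	return nil
}

func (m *memNode) Get(k string) (string, error) {
	if m.down {
		return "", errors.New(m.name + ": unreachable")
	}
	return m.data[k], nil
}

// probe returns the names of nodes from which the detection data cannot
// be read back after it has been written.
func probe(entry KV, nodes map[string]KV, code string) (failed []string) {
	if err := entry.Put("detect", code); err != nil {
		return nil // cannot even write: handled elsewhere in the real flow
	}
	for name, n := range nodes {
		if got, err := n.Get("detect"); err != nil || got != code {
			failed = append(failed, name)
		}
	}
	return failed
}

func main() {
	a := &memNode{name: "first node", data: map[string]string{}}
	b := &memNode{name: "second node", data: map[string]string{}, down: true}
	// In a real Raft cluster the write would replicate to all members;
	// here the single healthy node stands in for replication.
	fmt.Println(probe(a, map[string]KV{"first node": a, "second node": b}, "0xBEEF"))
}
```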
Figure 5 is a flow chart of a distributed system deployment method provided by an embodiment of the present application. As shown in Figure 5, the distributed system deployment method can be used for servers, dual-machine architecture systems and the like. In the embodiment of Figure 5, the distributed system deployment method is applied to a distributed system that includes a first server and a second server, and may include, but is not limited to, step S1000 and step S2000.
Step S1000: deploy a first node and a first load balancer in the first server.
Step S2000: deploy a second node and a second load balancer in the second server.
The first node and the second node are communicatively connected, the first node and the second node confirm node identities through the distributed consensus protocol Raft, and the first node is communicatively connected to the first load balancer and the second load balancer respectively; the second node is communicatively connected to the first load balancer and the second load balancer respectively; the first load balancer and the second load balancer are each configured to distribute service requests according to load conditions.
In some embodiments, the distributed system deployment method further includes: deploying at least one third node in the first server; and deploying, in the second server, at least one fourth node corresponding to the third node; wherein the third node is communicatively connected to the first node and the second node respectively, and the third node confirms node identities with the first node and the second node through the Raft protocol; the fourth node is communicatively connected to the first node and the second node respectively, and the fourth node confirms node identities with the first node and the second node through the Raft protocol; and the third node and the fourth node are configured with the same IP address.
In some embodiments, multiple third nodes and fourth nodes may be deployed. When multiple third nodes and fourth nodes are deployed, the third nodes and fourth nodes must correspond one to one, and the number deployed must be increased or decreased in units of "one pair of a third node and a fourth node". Each pair of a third node and a fourth node is configured with one shared IP address; the IP addresses of different third nodes differ from one another, and the IP addresses of different fourth nodes differ from one another.
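These pairing constraints can be checked mechanically; the following Go sketch validates an assumed NodePair configuration (one shared IP per pair, no IP reused across pairs). It is illustrative only, and the NodePair type is not part of the embodiment.

```go
// Illustrative sketch only: validates the pairing rule described above.
package main

import (
	"errors"
	"fmt"
)

// NodePair describes one "third node + fourth node" pair and the single
// IP address the pair shares across the two servers.
type NodePair struct {
	ThirdNode, FourthNode, SharedIP string
}

func validatePairs(pairs []NodePair) error {
	seen := map[string]bool{}
	for _, p := range pairs {
		if p.ThirdNode == "" || p.FourthNode == "" {
			return errors.New("third and fourth nodes must correspond one to one")
		}
		if seen[p.SharedIP] {
			return errors.New("IP " + p.SharedIP + " is used by more than one pair")
		}
		seen[p.SharedIP] = true
	}
	return nil
}

func main() {
	pairs := []NodePair{
		{ThirdNode: "third-1", FourthNode: "fourth-1", SharedIP: "IP3"},
		{ThirdNode: "third-2", FourthNode: "fourth-2", SharedIP: "IP4"},
	}
	fmt.Println(validatePairs(pairs)) // <nil>
}
```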
In some embodiments, where the third node and the fourth node are configured with a storage function, the third node is communicatively connected to the first load balancer and the second load balancer respectively, so that the third node can obtain the service requests distributed by the first load balancer and/or the second load balancer; and the fourth node is communicatively connected to the first load balancer and the second load balancer respectively, so that the fourth node can obtain the service requests distributed by the first load balancer and/or the second load balancer.
In some embodiments, the distributed system deployment method further includes: deploying, in the first server, at least one first regulator corresponding to the third node; and deploying, in the second server, at least one second regulator corresponding to the fourth node; wherein the first regulator and the second regulator are communicatively connected, the first regulator is configured to monitor the health state of the distributed system and to manage the third node, and the second regulator is configured to monitor the health state of the distributed system and to manage the fourth node.
In some embodiments, a single first regulator may be deployed to control multiple third nodes; multiple first regulators may be deployed in one-to-one correspondence with the third nodes; or different numbers of first regulators and third nodes may be deployed to establish one-to-one and/or one-to-many control relationships, which is not specifically limited here. The same applies to the deployment of the second regulator.
In some embodiments, the initial states of the third node and the fourth node default to a non-started state.
An embodiment of the present application provides a distributed system configuration method, which is applied to a distributed system obtained by the distributed system deployment method described in any of the above embodiments, or to the distributed system described in any of the above embodiments. The distributed system configuration method includes at least, but is not limited to, the following step:
Set the initial states of the third node and the fourth node to a non-started state.
In some embodiments, the first regulator is the default master regulator and the second regulator is the default slave regulator. The distributed system configuration method includes: sending heartbeat messages from the first regulator to the second regulator at regular intervals; and, where the second regulator does not receive a heartbeat message within a preset time, confirming the second regulator as the master regulator and the first regulator as the slave regulator.
In some embodiments, the distributed system configuration method further includes: binding the master regulator to the IP address of the third node and/or the fourth node; and, where the master regulator and the slave regulator perform a master-slave switch, binding that IP address to the new master regulator. It can be understood that when the master regulator and the slave regulator perform a master-slave switch, the master regulator switches to being the new slave regulator, the slave regulator switches to being the new master regulator, and the above IP address is then bound to the new master regulator. In some embodiments, the first regulator is the master regulator and the second regulator is the slave regulator; after a master-slave switch occurs, the first regulator switches to being the slave regulator and the second regulator switches to being the master regulator. It can be understood that when the master regulator is the first regulator, the master regulator is bound to the IP address of the third node; when the master regulator is the second regulator, the master regulator is bound to the IP address of the fourth node.
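One plausible way to re-bind the shared IP address on a master-slave switch is through the Linux ip command, as in the Go sketch below. The address, interface name and the choice of ip addr add/del are assumptions of this sketch; the embodiment does not prescribe any particular binding mechanism.

```go
// Illustrative sketch only: re-binding the shared address (e.g. IP3) when
// a regulator is promoted, using the Linux `ip` command via os/exec.
package main

import (
	"log"
	"os/exec"
)

// bindAddress adds the shared address to a local interface, which is what
// the new master regulator must do after a switchover.
func bindAddress(addr, device string) error {
	// Equivalent to: ip addr add <addr> dev <device>
	return exec.Command("ip", "addr", "add", addr, "dev", device).Run()
}

// unbindAddress removes the address on the demoted side so that the
// shared IP is held by exactly one regulator at a time.
func unbindAddress(addr, device string) error {
	// Equivalent to: ip addr del <addr> dev <device>
	return exec.Command("ip", "addr", "del", addr, "dev", device).Run()
}

func main() {
	// Hypothetical values; running this requires root and a real interface.
	if err := bindAddress("192.0.2.3/24", "eth0"); err != nil {
		log.Fatal(err)
	}
}
```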
In some embodiments, the distributed system configuration method further includes: monitoring the first node and the second node through the master regulator; where the master regulator detects that the first node is abnormal, repairing the first node through the master regulator to obtain a repair result; and, where the repair result is a repair failure, activating the fourth node through the master regulator so that the initial state of the fourth node changes from the non-started state to a started state.
In some embodiments, the distributed system configuration method further includes: monitoring the first node and the second node through the master regulator; where the master regulator finds that the second node is abnormal, repairing the second node through the master regulator to obtain a repair result; and, where the repair result is a repair failure, activating the third node through the master regulator so that the initial state of the third node changes from the non-started state to a started state.
The implementation flow of this distributed system configuration method is the same as the related flows in the distributed system described in the above embodiments, and is not described further here.
The distributed system deployment of the present application is further described below with specific application scenarios. The following examples are only intended to describe the solution of the present application more clearly and do not constitute specific limitations.
Figure 6 is a system architecture diagram of a unified network management platform provided by an embodiment of the present application.
An operator in city A has a unified network management platform deployed in dual-machine hot-standby mode. As the business expands, more and more network elements are connected, and frequent performance and alarm reporting during peak business periods leads to frequent active-standby switchovers, so the network management platform is often unavailable.
The unified network management platform is now transformed using the distributed system deployment of the present application, as shown in Figure 6.
The implementation steps are as follows:
Transform the unified network management service platform according to the raft protocol to obtain unified network management service platform A and unified network management service platform B.
Deploy the transformed unified network management service platform A in the first server, deploy the transformed unified network management service platform B in the second server, and combine platform A and platform B into a cluster of two raft nodes.
Set up a load balancer, deploy it on both servers, and set up a service request interface; when a service request is received, the load balancer dynamically distributes the request according to the load conditions of the two servers.
Figure 7 is a system architecture diagram of a distributed database provided by an embodiment of the present application.
An operator wishes to deploy a lightweight kubernetes platform on two servers at location A (kubernetes is a new distributed architecture solution based on container technology and an open-source container cluster management system), for deploying a network management microservice system. Implementation environment: the operator provides two servers, each with dual network cards; the environment deploys kubernetes and supports the application container engine (Docker) service, as shown in Figure 7.
The implementation steps are as follows:
Step 1: configure the network cards of the two servers A and B. Bind network card 1 of server A to IP1, bind network card 1 of server B to IP2, and bind network card 2 of server A and network card 2 of server B both to IP3, where IP3 and the corresponding network cards are in the off state.
Step 2: deploy two sets of ETCD microservices on the two servers. In server A, deploy ETCD microservice A mapped to IP1 of the host and deploy a backup ETCD service mapped to IP3 of the host; in server B, deploy ETCD microservice B mapped to IP2 of the host and deploy a backup ETCD service mapped to IP3 of the host. ETCD is a highly available key-value storage system, mainly used as a shared key-value store and for service discovery.
Step 3: deploy regulators on server A and server B respectively. After startup, the regulator of a random server is set as the master regulator (suppose the regulator of server A is the master regulator); the master regulator sets the network card corresponding to IP3 on its own machine to the started state.
Step 4: start IP1 with the corresponding ETCD microservice A and IP2 with the corresponding ETCD microservice B, forming a 2/3 raft cluster.
Step 5: deploy the kubelet service. It should be noted that the load balancer kubelet uses to access the ETCD cluster is a small internal component of kubelet, so no additional load balancer needs to be deployed. kubelet is the primary node agent; it watches the pods that have been assigned to its node. A Pod is the smallest unit that can be created and managed in the kubernetes system, the smallest resource object created or deployed by the user in the resource object model, and the resource object for running containerized applications on kubernetes.
Step 6: when server B fails, the master regulator starts the backup ETCD microservice bound to IP3; at this time a 2/3 raft cluster is maintained and cloud platform services continue to be provided normally.
Step 7: when server A fails, the master regulator migrates to server B, i.e. the regulator in server B switches to being the master regulator. At this time network card 2 of server A is set to the off state, network card 2 of server B is set to the started state, and the backup ETCD microservice of server B bound to IP3 is started, so that a normal 2/3 raft cluster service is provided.
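For illustration, the switchover of this step might be driven by the new master regulator roughly as in the following Go sketch. The interface name eth1, the unit name etcd-backup.service and the use of ip link and systemctl are assumptions of the sketch rather than details of the embodiment.

```go
// Illustrative sketch only: on server B, bring up the NIC that carries
// IP3 and start the backup ETCD microservice, restoring a 2/3 raft cluster.
package main

import (
	"log"
	"os/exec"
)

func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = log.Writer(), log.Writer()
	return cmd.Run()
}

// takeOver performs the two actions of step 7 on the surviving server:
// enable the local interface bound to IP3, then start the backup service.
func takeOver(device, backupUnit string) error {
	if err := run("ip", "link", "set", device, "up"); err != nil {
		return err
	}
	return run("systemctl", "start", backupUnit)
}

func main() {
	// Hypothetical names; requires root and an existing service unit.
	if err := takeOver("eth1", "etcd-backup.service"); err != nil {
		log.Fatal(err)
	}
}
```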
Figure 8 is a system architecture diagram of a business support system provided by an embodiment of the present application.
A telecom operator deploys a lightweight Business Support System (BSS) on two servers at a certain location. During peak periods of business handling, the system is required to provide load balancing to improve service efficiency, and once a single server fails, the business must be restored quickly with data consistency guaranteed. Implementation environment: both servers have a single network card and only the business support platform is deployed, providing a single service. The implementation is shown in Figure 8.
The implementation steps are as follows:
Step 1: implement the business support system BSS based on the raft protocol.
Step 2: configure the network cards of the two servers A and B. Bind network card 1 of server A to IP1 and network card 1 of server B to IP2; both server A and server B reserve an unused IP3.
Step 3: deploy two sets of BSS on the two servers: in server A, deploy the first BSS bound to IP1 and a backup BSS bound to IP3; in server B, deploy the second BSS bound to IP2 and a backup BSS bound to IP3.
Step 4: deploy regulators on the two servers. After startup, the regulator of a random server becomes the master regulator (suppose the regulator of server A is the master regulator); the master regulator binds IP3 to network card 1 of its own server (server A).
Step 5: deploy a load balancer on the two servers, connect it to the BSS cluster, and set up a service request interface; when a service request is received, the load balancer dynamically distributes it according to the load conditions of the two servers.
Step 6: start IP1 with the corresponding first BSS service and IP2 with the corresponding second BSS service, forming a 2/3 raft cluster.
Step 7: when a problem occurs on server B, the master regulator pulls up the backup BSS service on IP3 of its own server (server A); at this time a 2/3 raft cluster is still available and normal services can be provided.
Step 8: when a problem occurs on server A, the master regulator migrates to server B, i.e. the regulator of server B switches to being the master regulator. At this time IP3 is unbound from network card 1 of server A and bound to network card 1 of server B, and the backup BSS service of server B bound to IP3 is started, so that a normal 2/3 raft cluster is provided.
图9是本申请一实施例提供的一种电子设备结构示意图。如图9所示,电子设备1000包括存储器1100、处理器1200。存储器1100、处理器1200的数量可以是一个或多个,图9中以一个存储器1100和一个处理器1200为例;设备中的存储器1100和处理器1200可以通过总线或其他方式连接,图9中以通过总线连接为例。
存储器1100作为一种计算机可读存储介质,可用于存储软件程序、计算机可执行程序以及模块,如本申请任一实施例提供的分布式***部署方法或分布式***配置方法对应的程序指令/模块。处理器1200通过运行存储在存储器1110中的软件程序、指令以及模块实现上述分布式***部署方法或分布式***配置方法。
存储器1100可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需的应用程序。此外,存储器1100可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件或其他非易失性固态存储器件。在一些实例中,存储器1100还可包括相对于处理器1200远程设置的存储器,这些远程存储器可以通过网络连接至设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
本申请一实施例还提供了一种计算机可读存储介质,存储有计算机可执行指令,该计算机可执行指令用于执行如本申请任一实施例提供的分布式***部署方法或分布式***配置方法。
In the solutions of the embodiments of this application, a Raft node and a load balancer are deployed in each of the two servers to form a Raft cluster, so that operations such as service request processing, master-slave switchover, and data backup continue at a finer node granularity while both servers keep working. This avoids the situation in a traditional dual-machine hot standby system where master-slave switchover takes the whole server as its unit, improving system stability and the data consistency of the two machines. Meanwhile, the load balancers distribute service requests to different nodes according to the load of each node, raising node load utilization and improving the efficiency of service request processing.
The system architectures and application scenarios described in the embodiments of this application are intended to explain the technical solutions of these embodiments more clearly and do not constitute a limitation on them. A person skilled in the art will appreciate that, as system architectures evolve and new application scenarios emerge, the technical solutions provided by the embodiments of this application remain equally applicable to similar technical problems.
A person of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, and appropriate combinations thereof.
In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to a person of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to a person of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The terms "component", "module", "system", and the like used in this specification denote computer-related entities, hardware, firmware, combinations of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process or thread of execution, and a component may be located on one computer or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon. The components may communicate by way of local or remote processes, for example according to a signal having one or more data packets (such as data from two components interacting with another component in a local system or a distributed system, or interacting with other systems by way of the signal across a network such as the Internet).
Some embodiments of this application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of rights of this application. Any modifications, equivalent substitutions, and improvements made by a person skilled in the art without departing from the scope and essence of this application shall fall within the scope of rights of this application.

Claims (13)

  1. A distributed system deployment method, wherein the distributed system comprises a first server and a second server, the method comprising:
    deploying a first node and a first load balancer in the first server; and
    deploying a second node and a second load balancer in the second server;
    wherein the first node and the second node are in communication connection with each other, the first node and the second node confirm node identities through the distributed consensus Raft protocol, the first node is in communication connection with each of the first load balancer and the second load balancer, and the second node is in communication connection with each of the first load balancer and the second load balancer; and the first load balancer and the second load balancer are each configured to distribute service requests according to load conditions.
  2. The method according to claim 1, wherein the method further comprises:
    deploying at least one third node in the first server; and
    deploying, in the second server, at least one fourth node corresponding to the third node;
    wherein the third node is in communication connection with each of the first node and the second node, and the third node confirms node identities with the first node and the second node through the Raft protocol; the fourth node is in communication connection with each of the first node and the second node, and the fourth node confirms node identities with the first node and the second node through the Raft protocol; and the third node and the fourth node are configured with the same IP address.
  3. The method according to claim 2, wherein, in a case where the third node and the fourth node are configured with a storage function, the third node is in communication connection with each of the first load balancer and the second load balancer, so that the third node is able to obtain the service requests distributed by the first load balancer and/or the second load balancer; and the fourth node is in communication connection with each of the first load balancer and the second load balancer, so that the fourth node is able to obtain the service requests distributed by the first load balancer and/or the second load balancer.
  4. The method according to claim 3, wherein the method further comprises:
    deploying, in the first server, at least one first regulator corresponding to the third node; and
    deploying, in the second server, at least one second regulator corresponding to the fourth node;
    wherein the first regulator and the second regulator are in communication connection, the first regulator is configured to monitor a health state of the distributed system and manage the third node, and the second regulator is configured to monitor the health state of the distributed system and manage the fourth node.
  5. A distributed system configuration method, applied to a distributed system obtained by the distributed system deployment method according to claim 4, the method comprising:
    setting initial states of the third node and the fourth node to a not-started state.
  6. The method according to claim 5, wherein the first regulator is a default master regulator and the second regulator is a default slave regulator, the method comprising:
    sending, by the first regulator, a heartbeat message to the second regulator at regular intervals; and
    in a case where the second regulator does not receive the heartbeat message within a preset time, confirming the second regulator as the master regulator and the first regulator as the slave regulator.
  7. The method according to claim 6, wherein the method comprises:
    binding the master regulator to the IP address of the third node and/or the fourth node; and
    in a case where a master-slave switchover is performed between the master regulator and the slave regulator, binding the IP address to the new master regulator.
  8. The method according to claim 7, wherein the method further comprises:
    monitoring the first node and the second node through the master regulator;
    in a case where the master regulator detects that the first node is abnormal, repairing the first node by the master regulator to obtain a repair result; and
    in a case where the repair result is a repair failure, activating the fourth node through the master regulator, so that the initial state of the fourth node changes from the not-started state to a started state.
  9. The method according to claim 7, wherein the method further comprises:
    monitoring the first node and the second node through the master regulator;
    in a case where the master regulator finds that the second node is abnormal, repairing the second node by the master regulator to obtain a repair result; and
    in a case where the repair result is a repair failure, activating the third node through the master regulator, so that the initial state of the third node changes from the not-started state to a started state.
  10. A distributed system, comprising:
    a first server provided with a first load balancer and a first node; and
    a second server provided with a second load balancer and a second node;
    wherein the first node and the second node are in communication connection with each other, the first node and the second node confirm node identities through the distributed consensus Raft protocol, the first node is in communication connection with each of the first load balancer and the second load balancer, and the second node is in communication connection with each of the first load balancer and the second load balancer; and the first load balancer and the second load balancer are each configured to distribute service requests according to load conditions.
  11. The system according to claim 10, wherein the first server is provided with at least one third node, and the second server is provided with at least one fourth node corresponding to the third node, wherein the third node is in communication connection with each of the first node and the second node, and the third node confirms node identities with the first node and the second node through the Raft protocol; the fourth node is in communication connection with each of the first node and the second node, and the fourth node confirms node identities with the first node and the second node through the Raft protocol; and the third node and the fourth node are configured with the same IP address;
    the first server is provided with at least one first regulator corresponding to the third node, the second server is provided with at least one second regulator corresponding to the fourth node, and the first regulator and the second regulator are in communication connection with each other; wherein the first regulator is configured to monitor a health state of the distributed system and manage the third node, and the second regulator is configured to monitor the health state of the distributed system and manage the fourth node.
  12. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the distributed system deployment method according to any one of claims 1 to 4 or the distributed system configuration method according to any one of claims 5 to 9.
  13. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions, when executed, implement the distributed system deployment method according to any one of claims 1 to 4 or the distributed system configuration method according to any one of claims 5 to 9.
PCT/CN2023/116224 2022-09-06 2023-08-31 Distributed system deployment method, configuration method, system, device and medium WO2024051577A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211084128.1 2022-09-06
CN202211084128.1A CN117714386A (zh) 2022-09-06 2022-09-06 Distributed system deployment method, configuration method, system, device and medium

Publications (1)

Publication Number Publication Date
WO2024051577A1 (zh)

Family

ID=90143006

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116224 WO2024051577A1 (zh) 2022-09-06 2023-08-31 Distributed system deployment method, configuration method, system, device and medium

Country Status (2)

Country Link
CN (1) CN117714386A (zh)
WO (1) WO2024051577A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108494835A (zh) * 2018-03-08 2018-09-04 Zhengzhou Yunhai Information Technology Co., Ltd. Method and system for implementing distributed dynamic routing based on the Raft algorithm
WO2020062131A1 (zh) * 2018-09-29 2020-04-02 Beijing Lianyunjue Technology Co., Ltd. Container cloud management system based on blockchain technology
CN112910937A (zh) * 2019-11-19 2021-06-04 Beijing Kingsoft Cloud Network Technology Co., Ltd. Object scheduling method and apparatus in a container cluster, server, and container cluster
US20210200814A1 (en) * 2019-12-30 2021-07-01 Servicenow, Inc. Discovery of containerized platform and orchestration services

Also Published As

Publication number Publication date
CN117714386A (zh) 2024-03-15

Similar Documents

Publication Publication Date Title
CN110224871B High-availability method and apparatus for a Redis cluster
CN102355369B Virtualized cluster system and processing method and device thereof
CA3168286A1 (en) Data flow processing method and system
US8032786B2 (en) Information-processing equipment and system therefor with switching control for switchover operation
US9992058B2 (en) Redundant storage solution
CN105159798A Dual-machine hot standby method for virtual machines, dual-machine hot standby management server, and system
CN109213571B Memory sharing method, container management platform, and computer-readable storage medium
CN110069365B Method for managing a database, corresponding apparatus, and computer-readable storage medium
US11349706B2 (en) Two-channel-based high-availability
CN111385296B Service process restart method, apparatus, storage medium, and system
EP3648405B1 (en) System and method to create a highly available quorum for clustered solutions
CN113515408A Data disaster recovery method, apparatus, device, and medium
CN116881053B Data processing method, switch board, data processing system, and data processing apparatus
WO2018171728A1 Server, storage system, and related method
WO2020252724A1 Log processing method, device, and computer-readable storage medium
CN111580753B Storage volume cascading system, batch job processing system, and electronic device
WO2024051577A1 Distributed system deployment method, configuration method, system, device and medium
CN116467120A Database deployment method with master-master architecture, database access method, and apparatus
CN114124803B Device management method and apparatus, electronic device, and storage medium
CN113596195B Public IP address management method and apparatus, master node, and storage medium
JP6954693B2 Fault-tolerant system, server, operation method thereof, and program
CN114398203A Cloud disaster recovery system and method, electronic device, and storage medium
WO2020241032A1 Fault-tolerant system, server, operation method of the fault-tolerant system, operation method of the server, and program for the server operation method
CN112882771A Server switching method and apparatus for an application system, storage medium, and electronic device
CN112988335A Highly available virtualization management system and method, and related device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23862271

Country of ref document: EP

Kind code of ref document: A1