WO2023092772A1 - Method and device for implementing high availability of a virtualized cluster - Google Patents

Method and device for implementing high availability of a virtualized cluster Download PDF

Info

Publication number
WO2023092772A1
WO2023092772A1 · PCT/CN2021/139934 · CN2021139934W
Authority
WO
WIPO (PCT)
Prior art keywords
storage
virtual machine
host
controller
dvs
Prior art date
Application number
PCT/CN2021/139934
Other languages
English (en)
French (fr)
Inventor
边瑞锋
胡林
杨经纬
Original Assignee
***数智科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ***数智科技有限公司 filed Critical ***数智科技有限公司
Publication of WO2023092772A1 publication Critical patent/WO2023092772A1/zh

Classifications

    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F11/301 Monitoring arrangements where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • H04L49/557 Error correction, e.g. fault recovery or fault tolerance
    • H04L49/70 Virtual switches
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data, e.g. network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45575 Starting, stopping, suspending or resuming virtual machine instances
    • G06F2009/45591 Monitoring or debugging support
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Definitions

  • the invention belongs to the technical field of virtualization, and in particular relates to a method and equipment for realizing high availability of a virtualized cluster.
  • the application of virtualization technology can realize server integration, provide an independent, efficient and flexible operating environment for application systems, save resources and facilitate management.
  • Server virtualization must provide high availability (High Availability, HA) to form a stable and continuous base platform: when a server or a virtual machine running on it fails, the application system continues service with no or only a brief interruption.
  • High-availability clusters minimize the impact of software, hardware, or human-induced failures on the business by protecting the uninterrupted services that the user's business programs provide to the outside world. If a node fails, a standby node takes over within seconds; for the user, therefore, the cluster never goes down.
  • the main function of high-availability cluster software is to realize the automation of fault detection and business switching.
  • Virtualization software such as VMware and OpenStack usually ensures virtual machine high availability through automatic live migration: when a physical server's service is interrupted by a failure or for maintenance, its virtual machines are automatically switched to other physical servers with lower resource consumption, thereby maintaining business continuity. When the fault lies in the virtual machine itself, such as a network abnormality or insufficient virtual machine system resources, the virtual machine cannot be migrated automatically and high availability cannot be achieved.
  • vSphere HA and OpenStack are often used to achieve high availability in a virtualized environment.
  • vSphere HA uses multiple ESXi hosts in the cluster to provide applications running in virtual machines with high availability and fast recovery from interruptions.
  • vSphere HA protects against server failures by restarting virtual machines on other hosts within the cluster; it continuously monitors virtual machines and resets them when failures are detected, preventing application failures.
  • vSphere HA provides high availability for virtual machines by centralizing virtual machines and the hosts on which they reside within a cluster. The hosts in the cluster are monitored, and in the event of a failure, the virtual machines on the failed host are restarted on alternate hosts.
  • when a vSphere HA cluster is created, a host is automatically selected as the preferred host.
  • the preferred host communicates with vCenter Server and monitors the status of all protected virtual machines and slave hosts. Different types of host failures can occur, and the preferred host must detect and handle failures accordingly. The preferred host must be able to distinguish a failed host from a host that is in a network partition or has been isolated from the network. The preferred host uses network and datastore heartbeats to determine the type of failure.
  • However, vSphere HA relies on a server cluster, and the cluster has requirements on the number of hosts: at least three are needed. Inter-cluster communication places high demands on the network and requires a highly reliable cluster network, and the larger the cluster's multicast mechanism grows, the lower its efficiency.
  • Monitoring of virtual machines depends on VMware Tools, and installing tools inside virtual machines is unacceptable in some cases. Moreover, vSphere is commercial, closed-source software, so upgrading and modifying it is difficult.
  • Host high availability means that when a physical computing node suffers a hardware failure (such as disk damage, downtime caused by CPU or memory failure, physical network failure, or power failure), the node is automatically shut down and the virtual machines on it are restarted on healthy computing nodes in the cluster.
  • High availability of virtual machines means that when a virtual machine fails and shuts down, the monitoring software can automatically restart the virtual machine.
  • Openstack high availability is implemented in three steps: Monitoring, Fencing, and Recovery. Compute nodes are tracked and monitored by detecting whether the services on a node have failed, and faulty nodes are then fenced; Pacemaker provides the fencing function for cluster nodes.
  • the technical problem to be solved by the present invention is, in view of the deficiencies of the above prior art, to provide a method and device for implementing high availability of a virtualized cluster, which can ensure continuous service of physical and virtual machines through fast fault recovery and guarantee data security through shared storage.
  • a device for realizing high availability of a virtualized cluster including: an HA controller, a storage node, a DVS controller, and several computing nodes;
  • the HA controller is used for listening to host heartbeats, and for making decisions about and further controlling timed-out hosts;
  • the computing nodes are used for reporting and storing host heartbeat information and for monitoring virtual machines;
  • the storage node is configured to receive a storage heartbeat through a storage network;
  • the DVS controller is used to control the virtual switch of each computing node and to manage and configure network policies.
  • the above-mentioned HA controller listens on a UDP port, timing heartbeat messages and raising alarms for erroneous messages, to ensure service and transmission efficiency.
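The heartbeat bookkeeping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name and method names are assumptions; the patent only specifies UDP listening, per-heartbeat timing, and the default of 3 cycles of 5 seconds each (a 15-second timeout).

```python
import time

HEARTBEAT_PERIOD = 5   # seconds per heartbeat cycle (patent default)
MISSED_CYCLES = 3      # cycles without a heartbeat => timed out (15 s total)

class HeartbeatTracker:
    """Illustrative per-host timing of heartbeat datagrams."""
    def __init__(self, period=HEARTBEAT_PERIOD, misses=MISSED_CYCLES):
        self.period = period
        self.misses = misses
        self.last_seen = {}  # host id -> timestamp of last heartbeat

    def record(self, host, now=None):
        # Called for each well-formed heartbeat datagram received over UDP.
        self.last_seen[host] = time.time() if now is None else now

    def timed_out(self, now=None):
        # Hosts whose last heartbeat is older than misses * period seconds.
        now = time.time() if now is None else now
        limit = self.period * self.misses
        return [h for h, t in self.last_seen.items() if now - t > limit]

tracker = HeartbeatTracker()
tracker.record("node-1", now=0)
tracker.record("node-2", now=12)
# At t = 16 s, node-1 has exceeded the 15 s window and is flagged.
print(tracker.timed_out(now=16))   # ['node-1']
```

A timed-out host would then enter the controller's decision strategy rather than be fenced immediately, which is what the anti-false-alarm checks later in the document are for.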
  • HA-monitor, Storage Agent and DVS Agent are deployed on each computing node mentioned above;
  • the HA-monitor regularly reports heartbeat information to the HA controller
  • the HA-monitor also monitors the state of the virtual machine
  • the DVS Agent communicates with the DVS controller through the DVS network.
  • the above-mentioned HA-monitor monitors the state of the virtual machines and their various events, and restarts or alarms a virtual machine according to the HA policy; the virtual machine is restarted when its process exits abnormally or a virtual machine kernel panic event occurs.
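The HA-monitor's event policy above amounts to a small dispatch table. The sketch below is illustrative only; the event names are assumptions, and the patent specifies just the two restart-triggering events (abnormal qemu process exit and guest kernel panic) with alarms for other policy-covered events.

```python
# Events that, per the HA policy, require restarting the virtual machine.
RESTART_EVENTS = {"qemu_process_exit", "kernel_panic"}

def ha_policy(event):
    """Map a virtual-machine event to the HA-monitor's action."""
    if event in RESTART_EVENTS:
        return "restart_vm"   # abnormal qemu exit or guest kernel panic
    return "alarm"            # all other monitored events are alarmed

print(ha_policy("kernel_panic"))   # restart_vm
print(ha_policy("network_error"))  # alarm
```

Detecting the kernel-panic event relies on the pvpanic driver inside the guest, as the embodiment section notes.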
  • the above-mentioned HA controller checks the state of the computing node DVS Agent through the interface of the DVS controller.
  • the above-mentioned storage nodes provide APIs for the HA controller to query the stored heartbeat information.
  • while sending storage events, the Storage-agent also periodically sends heartbeats to the Storage-monitor;
  • the HA controller obtains host status information through Storage-monitor.
  • a method for realizing high availability of a virtualized cluster comprising:
  • Step 1 The computing node reports and stores the heartbeat information of the host and monitors the virtual machine
  • Step 2 The storage node receives the storage heartbeat through the storage network
  • Step 3 The HA controller monitors the heartbeat of the host, and makes decisions and further controls the timeout host;
  • Step 4 The DVS controller controls the virtual machine switches of each computing node to manage and configure network policies.
  • if no host heartbeat is received for 3 consecutive cycles, the HA controller executes the following processing strategy:
  • Step 3-1: Actively connect to libvirt to query the virtual machine state. If the connection succeeds and the virtual machine state is correct, an alarm is raised indicating that the HA-monitor is abnormal; otherwise the host is abnormal, and the process goes to step 3-2;
  • Step 3-2: Query the storage heartbeat through the storage network: if the storage heartbeat is normal, the management network is abnormal and an alarm is raised; otherwise the process goes to step 3-3;
  • Step 3-3: Query the power state through the BMC interface: if the power state is normal, the host is shut down and the virtual machine migration procedure starts; otherwise an alarm is raised, the host is shut down, and the migration procedure is executed;
  • in step 3-2, for storage types that cannot carry a storage heartbeat, the host state is queried through the DVS controller; if the DVS Agent is normal, the problem is judged to be with the management network and an alarm is raised;
  • where neither the storage network nor the DVS controller is available, the management network of the entire cluster is checked; if the management networks of more hosts than a certain threshold in the cluster have failed, the problem is judged to be with the management network and an alarm is raised.
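The processing strategy of steps 3-1 to 3-3 can be sketched as a probe cascade. This is a hedged reading of the text, not the patent's code: the probe callables stand in for the libvirt connection, the storage-heartbeat query, the DVS-controller query, and the BMC power query, and all names are illustrative. `None` from the storage probe models a storage type that cannot carry a heartbeat.

```python
def handle_timeout(libvirt_ok, storage_hb, dvs_ok, power_ok):
    """Decide what to do about a host whose heartbeat has timed out."""
    # Step 3-1: try to reach libvirt and verify the VM state directly.
    if libvirt_ok():
        return "alarm: HA-monitor abnormal"
    # Step 3-2: query the storage heartbeat over the storage network;
    # fall back to the DVS controller for heartbeat-less storage types.
    hb = storage_hb()
    if hb is None and dvs_ok():
        return "alarm: management network problem"
    if hb:
        return "alarm: management network problem"
    # Step 3-3: query the power state through the BMC interface.
    if power_ok():
        return "power off host, migrate VMs"
    return "alarm, power off host, migrate VMs"

print(handle_timeout(lambda: False, lambda: True,
                     lambda: True, lambda: True))
# alarm: management network problem
```

Only the final branch fences the host, and (as the BMC check below explains) migration is started only once the host is confirmed powered off.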
  • the server virtualization high availability function of the present invention mainly includes virtual machine HA and host HA.
  • When a virtual machine shuts down abnormally, it can be restarted automatically by the monitoring software. When a host becomes abnormal and unresponsive, it can be isolated through the IPMI interface, and the virtual machines running on it are migrated automatically. This effectively prevents the split-brain phenomenon in which multiple virtual machines in the cluster access the same storage. When a host or virtual machine fails, the system responds quickly.
  • the virtual machine failure detection time is within 1 second.
  • the host fault detection time can be adjusted as needed; the default is 3 heartbeat cycles of 5 seconds each, i.e., a fault detection time of 15 seconds.
  • the fault recovery detection process adopts multiple mechanisms to prevent false alarms, avoid single strategy failure, greatly prevent errors, and effectively prevent split-brain phenomenon.
  • the present invention does not rely on third-party software, and is completely autonomous and controllable as a part of the virtualization management software.
  • Cluster management and deployment are flexible, supporting dynamic cluster management and clusters of any number of nodes.
  • Fig. 1 is a diagram of the composition of the device of the present invention.
  • Fig. 2 is a flowchart of the main components in the device of the present invention.
  • Fig. 3 is a schematic diagram of the DVS implementation of the present invention.
  • Fig. 4 is the overall workflow of the device of the present invention.
  • a device for realizing high availability of a virtualized cluster includes: HA controller, storage node, DVS controller and several computing nodes;
  • the HA controller is used for listening to host heartbeats, and for making decisions about and further controlling timed-out hosts;
  • the computing node is used for reporting and storing the heartbeat information of the host and monitoring the virtual machine
  • the storage node is configured to receive a storage heartbeat through a storage network
  • the DVS controller is used to control the virtual machine switch of each computing node to manage and configure network policies.
  • the HA controller is a centralized controller, responsible for collecting host heartbeats and for making decisions about and further controlling timed-out hosts; its own high availability is guaranteed by the server side and is not discussed here.
  • the HA controller listens on a UDP port, timing heartbeat messages and raising alarms for erroneous messages, to ensure service and transmission efficiency.
  • HA-monitor, storage Agent and DVS Agent are deployed on each computing node;
  • the HA-monitor regularly reports heartbeat information to the HA controller
  • the HA-monitor also monitors the state of the virtual machine
  • the DVS Agent communicates with the DVS controller through the DVS network.
  • the HA-monitor monitors the state of the virtual machines and their various events, and restarts or alarms a virtual machine according to the HA strategy; the virtual machine is restarted when the following two events occur:
  • (1) the virtual machine process exits abnormally, i.e., the qemu process exits abnormally for various reasons, at which point the virtual machine is also in an abnormal shutdown state; (2) the virtual machine kernel panics, which relies on the pvpanic driver inside the virtual machine and is implemented by most systems today.
  • the HA controller checks the status of the computing node DVS Agent through the interface of the DVS controller.
  • the storage node provides an api (Application Programming Interface, application programming interface) for the HA controller to query the stored heartbeat information.
  • different storage types have different storage monitors; a storage monitor is set up for the ocfs2 cluster file system and for ceph, and a Storage-agent (corresponding to the storage Agent) is set up on each computing node;
  • while sending storage events, the Storage-agent also periodically sends heartbeats to the Storage-monitor (storage node);
  • the HA controller obtains host status information through Storage-monitor.
  • a method for realizing high availability of a virtualized cluster comprising:
  • Step 1 The computing node reports and stores the heartbeat information of the host and monitors the virtual machine
  • Step 2 The storage node receives the storage heartbeat (HeartBeat) through the storage network
  • Step 3 The HA controller monitors the heartbeat of the host, and makes decisions and further controls the timeout host;
  • Step 4 The DVS controller controls the virtual machine switches of each computing node to manage and configure network policies.
  • the workflow of the main components is shown in FIG. 2 .
  • HA-monitor is also responsible for monitoring the state of the virtual machines. It can monitor various virtual machine events and, according to the HA policy, restart or alarm a virtual machine. The following two events require a virtual machine restart:
  • the virtual machine process exits abnormally, i.e., the qemu process exits abnormally for various reasons, at which point the virtual machine is also in an abnormal shutdown state; and the virtual machine kernel panics, which relies on the pvpanic driver inside the virtual machine and is implemented by most systems today.
  • a storage monitor is implemented for the ocfs2 cluster file system and ceph, and a Storage-agent is implemented on each computing node.
  • while sending storage events, the Storage-agent also periodically sends heartbeats to the Storage-monitor.
  • the HA controller can obtain host status information through Storage-monitor. If the storage type does not support Storage-monitor, it can be handled by other methods such as DVS.
  • DVS is an implementation of a distributed virtual switch, mainly including DVS controller, DVS Agent, OVS and other components.
  • while sending network events, the DVS Agent also periodically sends heartbeats to the DVS controller.
  • the HA controller can obtain the status information of the host through the DVS controller.
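The Storage-agent and DVS Agent share the same agent-side pattern: forward events as they occur and piggyback a periodic heartbeat. The sketch below is an assumption-laden illustration (class name, message shape, and transport are all invented for clarity; the patent only states that agents send events and periodic heartbeats).

```python
class Agent:
    """Illustrative event-plus-heartbeat sender for a compute-node agent."""
    def __init__(self, send, period=5):
        self.send = send          # callable delivering a message dict
        self.period = period      # heartbeat period in seconds
        self.next_beat = 0.0      # time the next heartbeat is due

    def tick(self, now, events=()):
        for ev in events:         # forward events as they occur
            self.send({"type": "event", "data": ev, "ts": now})
        if now >= self.next_beat:  # piggy-backed periodic heartbeat
            self.send({"type": "heartbeat", "ts": now})
            self.next_beat = now + self.period

sent = []
agent = Agent(sent.append, period=5)
agent.tick(0, events=["volume_error"])  # event plus first heartbeat
agent.tick(3)                           # within the period: nothing due
agent.tick(5)                           # heartbeat due again
print([m["type"] for m in sent])        # ['event', 'heartbeat', 'heartbeat']
```

On the controller side, these heartbeats are what let the HA controller query host liveness via the Storage-monitor or DVS controller when the management-network heartbeat is silent.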
  • the HA controller executes the following processing strategy:
  • Step 3-1: Actively connect to libvirt to query the virtual machine state. If the connection succeeds and the virtual machine state is correct, an alarm is raised indicating that the HA-monitor is abnormal; otherwise the host is abnormal, and the process goes to step 3-2;
  • Step 3-2: Query the storage heartbeat through the storage network: if the storage heartbeat is normal, the management network is abnormal and an alarm is raised; otherwise the process goes to step 3-3;
  • in step 3-2, for storage types that cannot carry a storage heartbeat, the host state is queried through the DVS controller; if the DVS Agent is normal, the problem is judged to be with the management network and an alarm is raised.
  • where neither the storage network nor the DVS controller is available, the management network of the entire cluster can be checked; if the management networks of more hosts than a certain threshold in the cluster have failed, the management network can be judged to be at fault.
  • Step 3-3: Query the power state through the BMC interface: if the power state is normal, the host is shut down and the virtual machine migration procedure starts; otherwise an alarm is raised, the host is shut down, and the migration procedure is executed.
  • Steps 3-1 to 3-3 form the following anti-false-alarm strategy:
  • libvirt check: first, the host's libvirt is connected as an initial check, to make a preliminary judgment on whether the problem lies with the management network. When the HA controller finds that a host's heartbeat has timed out and it cannot actively connect to the host, either the management network has failed while the host and its virtual machines are working normally, or the host is down; the following mechanisms, used singly or in combination, further judge whether the host is really abnormal.
  • Storage heartbeat network check: when network storage is used, a heartbeat check mechanism is deployed on the storage network to judge whether the abnormal host still accesses the network storage. If this heartbeat network is normal, the host can be judged to be working normally.
  • DVS controller check: for storage types that do not support a storage heartbeat, the host state is checked through the DVS controller.
  • the DVS network is independent of the other networks and is used to control the DVS Agent on the host; if the DVS Agent on the host is normal, the host can likewise be determined to be working normally.
  • Cluster network check: network problems generally affect many hosts, so if most hosts in the cluster have problems, it can be judged to be a network problem, in which case only an alarm is needed. A host failure threshold is set for the cluster, and the high-availability shutdown-and-migration operation is performed only when the failure threshold is not exceeded.
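The cluster-wide check reduces to comparing the share of failed hosts against a threshold. A minimal sketch, assuming a fractional threshold (the 0.5 default below is an illustration; the patent only says "a certain threshold"):

```python
def cluster_decision(failed_hosts, total_hosts, threshold=0.5):
    """Alarm only when failures look cluster-wide; otherwise allow fencing."""
    if total_hosts and len(failed_hosts) / total_hosts > threshold:
        # Most hosts unreachable: almost certainly a network problem,
        # so shutting hosts down and migrating would be a false move.
        return "alarm only: cluster network problem"
    return "migrate failed hosts"

print(cluster_decision(["h1", "h2", "h3"], 4))  # alarm only: cluster network problem
print(cluster_decision(["h1"], 4))              # migrate failed hosts
```

This is the guard that keeps a management-network outage from triggering a mass shutdown-and-migration.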
  • BMC network check: the type of host failure, including whether a hardware failure has occurred, can be further judged through the BMC network. The host's power is then shut off by communicating with the BMC via IPMI; the virtual machine migration operation can be performed only after the host is powered off, which prevents the split-brain phenomenon of multiple virtual machines using the same storage.
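In practice, the IPMI power-off step is commonly issued with the standard `ipmitool` CLI. The sketch below only builds the command line rather than executing it; the BMC address and credentials are placeholders, and using `ipmitool` at all is an assumption, since the patent specifies only "communicating with the BMC via IPMI".

```python
def ipmi_power_cmd(bmc_host, user, password, action="off"):
    """Build an ipmitool chassis-power command for fencing a host."""
    assert action in ("off", "on", "status", "cycle")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host,
            "-U", user, "-P", password, "chassis", "power", action]

cmd = ipmi_power_cmd("10.0.0.42", "admin", "secret")
print(" ".join(cmd))
# ipmitool -I lanplus -H 10.0.0.42 -U admin -P secret chassis power off
# In the device, migration starts only after a subsequent
# "chassis power status" query confirms the host is off (split-brain guard).
```

Executing the returned argv (e.g. with `subprocess.run`) is left out deliberately: fencing must be followed by a power-status confirmation before any virtual machine is restarted elsewhere.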
  • HA (High Availability): eliminates single points of failure and recovers from failures automatically (services migrate automatically to a healthy node), providing continuous service.
  • BMC: a small operating system independent of the server system, used to facilitate remote server management, monitoring, installation, restart, and other operations.
  • the BMC starts running as soon as power is applied; being independent of the business programs, it is unaffected by them, avoiding trips to the machine room after a crash or to reinstall the system.
  • IPMI (Intelligent Platform Management Interface): an industry standard for managing the peripheral devices used in Intel-architecture enterprise systems, established by Intel, HP, NEC, Dell, SuperMicro, and other companies.
  • Users can use the IPMI protocol to connect to a server's BMC and monitor the server's physical health characteristics, such as temperature, voltage, fan status, and power-supply status.
  • Fencing: a mechanism for excluding faulty nodes, which can control the power supply to shut down unavailable nodes.
  • Libvirt is an open source API, daemon, and management tool for managing virtualization platforms.
  • QEMU is open-source software for hardware virtualization and virtual machine hosting.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and device for implementing high availability of a virtualized cluster. The device comprises an HA controller, a storage node, a DVS controller, and several computing nodes. The HA controller is used for listening to host heartbeats and for making decisions about and further controlling timed-out hosts; the computing nodes are used for reporting and storing host heartbeat information and for monitoring virtual machines; the storage node is used for receiving storage heartbeats through the storage network; the DVS controller is used for controlling the virtual switch of each computing node and for managing and configuring network policies. The invention ensures continuous service of physical and virtual machines through fast fault recovery, and guarantees data security through shared storage.

Description

Method and device for implementing high availability of a virtualized cluster — Technical field
The present invention belongs to the technical field of virtualization, and in particular relates to a method and device for implementing high availability of a virtualized cluster.
Background art
The application of virtualization technology enables server consolidation, providing an independent, efficient, and flexible operating environment for application systems while saving resources and simplifying management. Server virtualization must provide high availability (High Availability, HA) to form a stable and continuous base platform: when a server or a virtual machine running on it fails, the application system continues service with no or only a brief interruption.
The most common high-availability solution is server clustering. A high-availability cluster minimizes the impact of software, hardware, or human-induced failures on the business by protecting the uninterrupted services that the user's business programs provide to the outside world. If a node fails, a standby node takes over its duties within seconds, so from the user's point of view the cluster never goes down. The main function of high-availability cluster software is to automate failure detection and service switchover.
In a non-virtualized system, making a computer application highly available requires installing the same application on every node server and then combining all the nodes into a cluster server. Application systems are of many kinds, and different applications differ greatly in their server configuration requirements: if every application occupies two or more servers, server resources are wasted; if only key applications are made highly available, non-key applications always carry a single-point-of-failure risk.
Virtualization software such as VMware and OpenStack usually ensures virtual machine high availability through automatic live migration: when a physical server's service is interrupted by a failure or for maintenance, its virtual machines are automatically switched to other physical servers with lower computing-resource consumption, thereby maintaining business continuity. When the fault lies in the virtual machine itself, such as a network abnormality or insufficient virtual machine system resources, the virtual machine cannot be migrated automatically and high availability cannot be achieved.
At present, vSphere HA and OpenStack are often used to achieve high availability in virtualized environments. vSphere HA uses multiple ESXi hosts in a cluster to provide applications running in virtual machines with high availability and fast recovery from interruptions. vSphere HA protects against server failures by restarting virtual machines on other hosts in the cluster; it continuously monitors virtual machines and resets them when failures are detected, preventing application failures. vSphere HA provides high availability for virtual machines by grouping the virtual machines and the hosts on which they reside into a cluster. The hosts in the cluster are monitored, and in the event of a failure the virtual machines on the failed host are restarted on standby hosts. When a vSphere HA cluster is created, one host is automatically selected as the preferred host. The preferred host communicates with vCenter Server and monitors the status of all protected virtual machines and of the subordinate hosts. Different types of host failure may occur, and the preferred host must detect and handle failures accordingly; it must be able to distinguish a failed host from one that is in a network partition or has been isolated from the network, and it uses network and datastore heartbeats to determine the type of failure. However, vSphere HA relies on a server cluster, and the cluster has requirements on the number of hosts: at least three are needed. Inter-cluster communication places high demands on the network and requires a highly reliable cluster network, and the larger the cluster's multicast mechanism grows, the lower its efficiency. Monitoring of virtual machines depends on VMware Tools, and installing tools inside virtual machines is unacceptable in some cases. Moreover, vSphere is commercial, closed-source software, so upgrading and modifying it is difficult.
In OpenStack, high-availability schemes are divided into host high availability and virtual machine high availability. Host high availability means that when a physical computing node suffers a hardware failure (such as disk damage, downtime caused by CPU or memory failure, physical network failure, or power failure), the node is automatically shut down and the virtual machines on it are restarted on other healthy computing nodes in the cluster. Virtual machine high availability means that when a virtual machine fails and stops, monitoring software can restart it automatically. OpenStack high availability is implemented in three steps: Monitoring, Fencing, and Recovery. Compute nodes are tracked and monitored by detecting whether the services on a node have failed, and faulty nodes are then fenced; Pacemaker provides the fencing function for cluster nodes and requires an Evacuate resource agent implemented on the compute nodes so that Pacemaker can trigger the Evacuate recovery operation on a node. Pacemaker and Corosync are the most widely used service high-availability monitoring tools, but Corosync supports only a limited number of compute nodes; Pacemaker_remote removes this limitation. However, OpenStack depends on many components, and components such as Pacemaker and Corosync are complex to configure, which hinders maintenance. Pacemaker has many problems and instability factors. Deployment is complex, and the cluster requires at least 3 nodes. The cluster uses a multicast mechanism, whose efficiency drops as the scale grows. Furthermore, OpenStack currently has no complete monitoring, fencing, and recovery solution, so users must implement service monitoring and node fencing themselves while triggering the Evacuate operation on the failed compute node; if the Pacemaker cluster resource manager is used, an Evacuate resource agent must be implemented on the compute nodes so that Pacemaker can trigger the Evacuate operation on a node.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the deficiencies of the above prior art, to provide a method and device for implementing high availability of a virtualized cluster, which can ensure continuous service of physical and virtual machines through fast fault recovery and guarantee data security through shared storage.
To achieve the above technical objective, the technical solution adopted by the present invention is as follows.
A device for implementing high availability of a virtualized cluster, comprising: an HA controller, a storage node, a DVS controller, and several computing nodes;
the HA controller is used for listening to host heartbeats, and for making decisions about and further controlling timed-out hosts;
the computing nodes are used for reporting and storing host heartbeat information and for monitoring virtual machines;
the storage node is used for receiving storage heartbeats through the storage network;
the DVS controller is used for controlling the virtual switch of each computing node and for managing and configuring network policies.
To optimize the above technical solution, the specific measures adopted further include the following.
The above HA controller listens on a UDP port, timing heartbeat messages and raising alarms for erroneous messages, to ensure service and transmission efficiency.
An HA-monitor, a storage Agent, and a DVS Agent are deployed on each of the above computing nodes;
the HA-monitor periodically reports heartbeat information to the HA controller;
the HA-monitor also monitors the state of the virtual machines;
the DVS Agent communicates with the DVS controller through the DVS network.
The above HA-monitor monitors the state of the virtual machines and their various events, and restarts or alarms a virtual machine according to the HA policy; the virtual machine is restarted when its process exits abnormally or a virtual machine kernel panic event occurs.
The above HA controller checks the state of a computing node's DVS Agent through the interface of the DVS controller.
The above storage node provides an api for the HA controller to query the stored heartbeat information.
In the above device, different storage types have different storage monitors; a storage monitor is set up for the ocfs2 cluster file system and for ceph, and a Storage-agent is set up on each computing node;
while sending storage events, the Storage-agent also periodically sends heartbeats to the Storage-monitor;
the HA controller obtains host status information through the Storage-monitor.
A method for implementing high availability of a virtualized cluster, comprising:
Step 1: the computing nodes report and store host heartbeat information and monitor the virtual machines;
Step 2: the storage node receives storage heartbeats through the storage network;
Step 3: the HA controller listens to host heartbeats, and makes decisions about and further controls timed-out hosts;
Step 4: the DVS controller controls the virtual switch of each computing node, managing and configuring network policies.
In the above step 3, if no host heartbeat is received for 3 consecutive cycles, the HA controller executes the following processing strategy:
Step 3-1: actively connect to libvirt to query the virtual machine state; if the connection succeeds and the virtual machine state is correct, raise an alarm indicating that the HA-monitor is abnormal; otherwise the host is abnormal, and go to step 3-2;
Step 3-2: query the storage heartbeat through the storage network:
if the storage heartbeat is normal, the management network is abnormal and an alarm is raised; otherwise go to step 3-3;
Step 3-3: query the power state through the BMC interface:
if the power state is normal, shut down the host and start the virtual machine migration procedure; otherwise raise an alarm, shut down the host, and execute the virtual machine migration procedure.
In the above step 3-2, for storage types that cannot carry a storage heartbeat, the host state is queried through the DVS controller; if the DVS Agent is normal, the problem is judged to be with the management network and an alarm is raised;
where neither the storage network nor the DVS controller is supported, the management network of the entire cluster is checked; if the management networks of more hosts than a certain threshold in the cluster have failed, the problem is judged to be with the management network and an alarm is raised.
The present invention has the following beneficial effects.
The server virtualization high-availability function of the present invention mainly includes virtual machine HA and host HA. When a virtual machine shuts down abnormally, it can be restarted automatically by the monitoring software. When a host becomes abnormal and unresponsive, it can be isolated through the IPMI interface, and the virtual machines running on it are migrated automatically. This effectively prevents the split-brain phenomenon in which multiple virtual machines in the cluster access the same storage. When a host or virtual machine fails, the system responds quickly: the virtual machine fault detection time is within 1 second, and the host fault detection time can be adjusted as needed, defaulting to 3 heartbeat cycles of 5 seconds each, i.e., a fault detection time of 15 seconds.
1. Centralized heartbeat detection. Unlike the distributed heartbeat mechanisms used by clusters, this approach is simple, uses a single policy, and is easy to maintain and manage centrally.
2. The fault recovery detection process uses multiple mechanisms to prevent false alarms, avoids the failure of any single policy, greatly reduces errors, and effectively prevents the split-brain phenomenon.
3. The present invention does not rely on third-party software; as part of the virtualization management software, it is fully autonomous and controllable.
4. Centralized control is adopted, so the system overhead on each node is very small and the system can be extended arbitrarily. It does not rely on a multicast mechanism and places no limit on cluster size; for small clusters it has advantages over the prior art, and for large clusters it achieves the effect of commercial virtualization software.
5. Cluster management and deployment are flexible, supporting dynamic cluster management and clusters of any number of nodes.
Brief description of the drawings
Fig. 1 is a diagram of the composition of the device of the present invention;
Fig. 2 is a flowchart of the main components in the device of the present invention;
Fig. 3 is a schematic diagram of the DVS implementation of the present invention;
Fig. 4 is the overall workflow of the device of the present invention.
具体实施方式
以下结合附图对本发明的实施例作进一步详细描述。
参见图1,一种虚拟化集群高可用性的实现设备,包括:HA控制器、存储节点、DVS控制器和若干计算节点;
所述HA控制器,用于主机心跳的监听,并对超时主机进行决策及进一步控制;
所述计算节点,用于主机心跳信息的上报、存储以及虚拟机的监控;
所述存储节点,用于通过存储网络接收存储心跳;
所述DVS控制器,用于控制每个计算节点的虚拟机交换机,进行网络策略的管理配置。
实施例中,所述HA控制器是一个集中式的控制器,负责收集主机心跳,并对超时主机进行决策及进一步控制;其高可用性由服务端来保证,此文不做讨论。
所述HA控制器使用UDP端口监听,对于心跳报文进行计时及对错误报文进行告警处理,以保证服务及传输的效率。
实施例中,每个计算节点上部署HA-monitor、存储Agent和DVS Agent;
所述HA-monitor定时向HA控制器上报心跳信息;
所述HA-monitor还监控虚拟机的状态;
所述DVS Agent通过DVS网络与DVS控制和通信。
实施例中,所述HA-monitor监控虚拟机的状态,监控虚拟机的各种事件,并根据HA策略对虚拟机进行重启或者告警操作,出现以下两种事件时进行虚拟机重启:
(1)虚拟机进程异常退出,即qemu进程由于各种愿意异常退出,此时虚拟机也处于异常关闭状态。
(2)虚拟机内核panic。依赖于虚拟机内部的pvpanic驱动。目前大多数***已经实现。
实施例中,所述HA控制器通过DVS控制器的接口检查计算节点DVS Agent的状态。
所述存储节点提供api(Application Programming Interface,应用程序接口)供HA控制器查询存储的心跳信息。
实施例中,所述设备中,不同的存储类型设有不同的存储监控器,且针对ocfs2集群文件***和ceph设置存储监控器,并在每个计算节点设置Storage-agent(对应于存储Agent);
Storage-agent发送存储事件的同时也定时发送心跳给Storage-monitor(存储节点);
HA控制器通过Storage-monitor获取主机状态信息。
一种虚拟化集群高可用性的实现方法,包括:
步骤1:计算节点进行主机心跳信息的上报、存储以及虚拟机的监控;
步骤2:存储节点通过存储网络接收存储心跳HeartBeat;
步骤3:HA控制器进行主机心跳的监听,并对超时主机进行决策及进一步控制;
步骤4:DVS控制器控制每个计算节点的虚拟机交换机,进行网络策略的管理配置。
实施例中,主要组件工作流程如图2所示。
HA-monitor:
HA-monitor也负责监控虚拟机的状态。可以监控虚拟机的各种事件。根据HA策略可以对虚拟机进行重启或者告警操作。以下两种事件需要进行虚拟机重启:
1.虚拟机进程异常退出。即qemu进程由于各种愿意异常退出,此时虚拟机也处于异常 关闭状态。
2.虚拟机内核panic。依赖于虚拟机内部的pvpanic驱动。目前大多数***已经实现。
Storage-monitor:
不同的存储类型有不同的存储监控器,本发明中针对ocfs2集群文件***和ceph实现了存储监控器,并在每个计算节点实现Storage-agent。Storage-agent发送存储事件的同时也定时发送心跳给Storage-monitor。HA控制器就可以通过Storage-monitor获取主机状态信息。对于存储类型不支持Storage-monitor可以通过DVS等其他方式来处理。
DVS:
DVS是分布式虚拟交换机的一种实现,主要包括DVS控制器,DVS Agent,OVS等组件。
DVS Agent发送网络事件的同时也定时发送心跳给DVS控制器。HA控制器就可以通过DVS控制器获取主机的状态信息。
DVS主要实现如图3所示。
实施例中,所述步骤3中,如果连续3个周期没有收到主机心跳,HA控制器(HA-controller)执行如下处理策略:
步骤3-1:主动连接libvirt进行查询虚机状态,如果连接成功并且虚机状态正确,则告警显示HA-monitor异常,否则说明主机异常,则进入步骤3-2;
步骤3-2:通过存储网络查询存储心跳:
如果存储心跳正常,说明管理网络异常,则进行告警处理,否则进入步骤3-3;
实施例中,所述步骤3-2中,对于无法存储心跳的存储类型,通过DVS控制器查询主机状态,DVS Agent如果正常,则判定为管理网络的问题,进行告警处理。
对于不支持存储网络及DVS控制器的情况,可对整个集群的管理网络进行检查。如果集群中超过一定阈值的主机管理网都出现故障,则可判定是管理网的问题。
步骤3-3:通过BMC接口查询电源状态:
若电源状态正常,则关闭主机并启动虚拟机迁移流程,否则告警并关闭主机执行虚拟机迁移流程。
步骤3-1至步骤3-3形成如下防误报策略:
1.首先连接主机libvirt进行第一步检测,初步判断是不是管理网络的问题。
当HA-controller发现主机心跳超时并且不能主动连接主机,这时有两种可能:
一种是管理网络出现故障,主机及虚拟机工作正常;
一种是主机宕机。
以下几种机制使用一种或者结合起来进一步判断是否主机真的出现异常。
2.存储心跳网络检查。在采用网络存储的时候,存储网络部署心跳检查机制,判断异常主机是否有对网络存储的访问。如果此心跳网络正常,可以判断主机工作正常。
DVS控制器检查。对于不支持存储心跳的存储类型,通过DVS控制器来检查主机状态。DVS网络独立于其他网络,用于控制主机上的DVS Agent。如果主机上的DVS Agent正常,也可以确定主机工作正常。
3.集群网络检查。网络问题一般会影响很多主机,如果集群里的大多数主机都出现问题,则可判断是网络问题,这种情况只需要告警处理。集群中主机故障设置阈值,只有在没有超过故障阈值时,才进高可用关机迁移操作。
4.BMC网络检查。通过BMC网络可以进一步判断主机故障类型。是否有硬件发生故障。并通过IPMI和BMC通信进行关闭主机电源,只有在主机断电后才能进行虚拟机迁移操作。这样就防止了多个虚拟机使用相同存储的脑裂现象。
***整体工作流程如图4所示。
Abbreviations and key term definitions
HA, High Availability: eliminating single points of failure and recovering from failures automatically (services are automatically migrated to a healthy node), so that services remain continuously available.
BMC: a small operating system independent of the server's own system, used to facilitate remote management, monitoring, installation, reboot, and similar operations. The BMC starts running as soon as power is applied and, being independent of the business software, is unaffected by it, avoiding trips to the machine room when the system hangs or the system must be reinstalled.
IPMI, Intelligent Platform Management Interface: an industry standard for managing peripheral devices in Intel-based enterprise systems, defined by Intel, HP, NEC, Dell, SuperMicro, and other companies. Users can connect to a server's BMC via the IPMI protocol to monitor the server's physical health characteristics, such as temperature, voltage, fan state, and power state.
Fencing: a mechanism for excluding a faulty node, which can power off an unavailable node.
Libvirt: an open-source API, daemon, and set of management tools for managing virtualization platforms.
QEMU: an open-source piece of software for hardware virtualization and virtual machine hosting.
DVS, Distributed Virtual Switch.
The above are only preferred embodiments of the present invention; the scope of protection of the present invention is not limited to the above embodiments, and any technical solution that falls within the idea of the present invention belongs to its scope of protection. It should be noted that, for those of ordinary skill in the art, several improvements and refinements made without departing from the principles of the present invention shall be regarded as falling within the scope of protection of the present invention.

Claims (10)

  1. A device for implementing high availability of a virtualization cluster, characterized by comprising: an HA controller, storage nodes, a DVS controller, and a number of compute nodes;
    the HA controller is configured to listen for host heartbeats, and to make decisions about and further control hosts whose heartbeats time out;
    the compute nodes are configured to report and store host heartbeat information and to monitor virtual machines;
    the storage nodes are configured to receive storage heartbeats over the storage network;
    the DVS controller is configured to control the virtual switch on each compute node and to manage and configure network policies.
  2. The device for implementing high availability of a virtualization cluster according to claim 1, characterized in that the HA controller listens on a UDP port, times heartbeat packets, and raises alarms for erroneous packets, so as to guarantee the efficiency of the service and of transmission.
  3. The device for implementing high availability of a virtualization cluster according to claim 1, characterized in that an HA-monitor, a storage Agent, and a DVS Agent are deployed on each compute node;
    the HA-monitor periodically reports heartbeat information to the HA controller;
    the HA-monitor also monitors the state of the virtual machines;
    the DVS Agent communicates with the DVS controller via the DVS network.
  4. The device for implementing high availability of a virtualization cluster according to claim 1, characterized in that the HA-monitor monitors the state of the virtual machines and their various events, and according to the HA policy restarts a virtual machine or raises an alarm, restarting the virtual machine when the virtual machine process exits abnormally or a virtual machine kernel panic event occurs.
  5. The device for implementing high availability of a virtualization cluster according to claim 1, characterized in that the HA controller checks the state of a compute node's DVS Agent through the interface of the DVS controller.
  6. The device for implementing high availability of a virtualization cluster according to claim 1, characterized in that the storage nodes provide an API for the HA controller to query storage heartbeat information.
  7. The device for implementing high availability of a virtualization cluster according to claim 1, characterized in that in the device, different storage types are provided with different storage monitors; storage monitors are provided for the ocfs2 cluster file system and for ceph, and a Storage-agent is deployed on each compute node;
    the Storage-agent sends storage events and also periodically sends heartbeats to the Storage-monitor;
    the HA controller obtains host state information through the Storage-monitor.
  8. A method for implementing high availability of a virtualization cluster using the device for implementing high availability of a virtualization cluster according to any one of claims 1-7, characterized by comprising:
    Step 1: the compute nodes report host heartbeat information, store it, and monitor virtual machines;
    Step 2: the storage nodes receive storage heartbeats over the storage network;
    Step 3: the HA controller listens for host heartbeats, and makes decisions about and further controls hosts whose heartbeats time out;
    Step 4: the DVS controller controls the virtual switch on each compute node and manages and configures network policies.
  9. The method for implementing high availability of a virtualization cluster according to claim 8, characterized in that in Step 3, if no host heartbeat is received for 3 consecutive periods, the HA controller executes the following handling strategy:
    Step 3-1: actively connect to libvirt to query the virtual machine state; if the connection succeeds and the virtual machine state is correct, raise an alarm indicating that the HA-monitor is abnormal; otherwise the host is abnormal, so proceed to Step 3-2;
    Step 3-2: query the storage heartbeat over the storage network:
    if the storage heartbeat is normal, the management network is abnormal, so raise an alarm; otherwise proceed to Step 3-3;
    Step 3-3: query the power state through the BMC interface:
    if the power state is normal, shut down the host and start the virtual machine migration process; otherwise raise an alarm, shut down the host, and execute the virtual machine migration process.
  10. The method for implementing high availability of a virtualization cluster according to claim 9, characterized in that in Step 3-2, for storage types that cannot provide a storage heartbeat, the host state is queried through the DVS controller; if the DVS Agent is normal, the fault is judged to be a management network problem and an alarm is raised;
    when neither the storage network nor the DVS controller is supported, the management network of the entire cluster is checked, and if the management networks of more hosts than a given threshold in the cluster have failed, the fault is judged to be a management network problem and an alarm is raised.
PCT/CN2021/139934 2021-11-26 2021-12-21 Method and device for implementing high availability of a virtualization cluster WO2023092772A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111418707.0 2021-11-26
CN202111418707.0A CN114090184B (zh) 2021-11-26 2021-11-26 Method and device for implementing high availability of a virtualization cluster

Publications (1)

Publication Number Publication Date
WO2023092772A1 (zh)

Family

ID=80304829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139934 WO2023092772A1 (zh) 2021-11-26 2021-12-21 Method and device for implementing high availability of a virtualization cluster

Country Status (2)

Country Link
CN (1) CN114090184B (zh)
WO (1) WO2023092772A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115190040B (zh) * 2022-05-23 2023-09-29 浪潮通信技术有限公司 Virtual machine high availability implementation method and apparatus
CN114880080B (zh) * 2022-07-11 2022-09-20 国网信息通信产业集团有限公司 Virtual machine high availability method and computing cluster
CN116382850B (zh) * 2023-04-10 2023-11-07 北京志凌海纳科技有限公司 Virtual machine high availability management apparatus and system using multi-storage heartbeat detection
CN118138588A (zh) * 2024-05-08 2024-06-04 北京城建智控科技股份有限公司 Cloud host high availability system and cloud platform

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170293537A1 (en) * 2014-10-06 2017-10-12 Nec Corporation Management system for virtual machine failure detection and recovery
CN107544839A (zh) * 2016-06-27 2018-01-05 腾讯科技(深圳)有限公司 Virtual machine migration system, method, and apparatus
CN109614201A (zh) * 2018-12-04 2019-04-12 武汉烽火信息集成技术有限公司 Split-brain-preventing OpenStack virtual machine high availability system
CN112069032A (zh) * 2020-09-11 2020-12-11 杭州安恒信息技术股份有限公司 Virtual machine availability detection method, system, and related apparatus
CN112994977A (zh) * 2021-02-24 2021-06-18 紫光云技术有限公司 Server host high availability method
CN113778607A (zh) * 2020-06-10 2021-12-10 中兴通讯股份有限公司 Method and apparatus for implementing virtual machine high availability, cloud management platform, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8412672B1 (en) * 2010-09-08 2013-04-02 Netapp, Inc. High availability network storage system incorporating non-shared storage suitable for use with virtual storage servers
CN105095001B (zh) * 2014-05-08 2018-01-30 ***股份有限公司 Virtual machine abnormality recovery method in a distributed environment
CN104394011A (zh) * 2014-11-11 2015-03-04 浪潮电子信息产业股份有限公司 Method for supporting server virtualization operation and maintenance through alarm information
CN107491344B (zh) * 2017-09-26 2020-09-01 北京思特奇信息技术股份有限公司 Method and apparatus for implementing virtual machine high availability
CN109634716B (zh) * 2018-12-04 2021-02-09 武汉烽火信息集成技术有限公司 Split-brain-preventing OpenStack virtual machine high availability management-side apparatus and management method
CN111953566B (zh) * 2020-08-13 2022-03-11 北京中电兴发科技有限公司 Distributed-fault-monitoring-based method and virtual machine high availability system
CN113608836A (zh) * 2021-08-06 2021-11-05 上海英方软件股份有限公司 Cluster-based virtual machine high availability method and system


Also Published As

Publication number Publication date
CN114090184B (zh) 2022-11-29
CN114090184A (zh) 2022-02-25

Similar Documents

Publication Publication Date Title
WO2023092772A1 (zh) Method and device for implementing high availability of a virtualization cluster
TWI746512B (zh) Physical machine fault classification handling method and apparatus, and virtual machine recovery method and system
JP4345334B2 (ja) Fault-tolerant computer system, program parallel execution method, and program
US8117495B2 (en) Systems and methods of high availability cluster environment failover protection
US7036035B2 (en) System and method for power management in a computer system having multiple power grids
CN107147540A (zh) Fault handling method and fault handling cluster in a high-availability system
WO2015169199A1 (zh) Virtual machine abnormality recovery method in a distributed environment
CN109656742B (zh) Node abnormality handling method, apparatus, and storage medium
WO2016058307A1 (zh) Resource fault handling method and apparatus
CN105302661A (zh) System and method for implementing high availability of a virtualization management platform
CN103152414A (zh) Cloud-computing-based high-availability system and implementation method therefor
US10317985B2 (en) Shutdown of computing devices
US20150019671A1 (en) Information processing system, trouble detecting method, and information processing apparatus
JP5285045B2 (ja) Failure recovery method, server, and program in a virtual environment
JP6124644B2 (ja) Information processing apparatus and information processing system
US7437445B1 (en) System and methods for host naming in a managed information environment
JP2004355446A (ja) Cluster system and control method therefor
JP5285044B2 (ja) Cluster system recovery method, server, and program
CN110677288A (zh) Edge computing system and method for general multi-scenario deployment
CN106528276A (zh) Task-scheduling-based fault handling method
JP6828558B2 (ja) Management device, management method, and management program
WO2016068973A1 (en) Analysis for multi-node computing systems
CN113760459A (zh) Virtual machine fault detection method, storage medium, and virtualization cluster
Lee et al. NCU-HA: A lightweight HA system for kernel-based virtual machine
CN113626147A (zh) Offshore platform computer control method and system based on virtualization technology

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21965491

Country of ref document: EP

Kind code of ref document: A1