WO2021228103A1

WO2021228103A1 - Load balancing method and apparatus for cloud host cluster, and server

Info

Publication number: WO2021228103A1
Application number: PCT/CN2021/093112
Authority: WO
Inventors: 程相群
Original assignee: 北京金山云网络技术有限公司
Priority date: 2020-05-15
Filing date: 2021-05-11
Publication date: 2021-11-18
Also published as: CN111614746B; CN111614746A

Abstract

Provided are a load balancing method and apparatus for a cloud host cluster, and a server. The method comprises: according to a resource usage priority, determining a target cloud host cluster from cloud host clusters; on the basis of first configuration information and second configuration information, determining, from physical hosts, a target node which corresponds to each cloud host in the target cloud host cluster, so as to provide a target resource amount of hardware resources for the target cloud host cluster by means of target nodes; monitoring the actual usage amount of each hardware resource in the target cloud host cluster; and adjusting the target nodes of the target cloud host cluster according to the actual usage amount, so as to perform load balancing on the target cloud host cluster. By means of the present application, the load of a cloud host cluster can be balanced more rationally, so that the problems of resource pre-emption and resource wastage can be effectively alleviated.

Description

Load balancing method, apparatus, and server for a cloud host cluster

This application claims priority to Chinese patent application No. 202010417658.8, filed with the Chinese Patent Office on May 15, 2020 and entitled "Load balancing method, apparatus, and server for a cloud host cluster", the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to the field of Internet technology, and in particular to a load balancing method, apparatus, and server for a cloud host cluster.

Background

Cloud hosts are an important part of cloud computing infrastructure. A physical server can provide cloud hosts with hardware resources such as CPU (Central Processing Unit) or memory, and multiple identical or different cloud hosts can be formed by configuring the hardware resources that the physical server provides. To improve the utilization of a physical server's CPU or memory resources, the configuration of hardware resources needs to be optimized. At present, the numad service program provided by the open source community is commonly used to load balance cloud hosts in order to improve the resource usage efficiency of physical servers. However, the inventor found that load balancing performed in this way is often unreasonable and easily leads to resource preemption and resource waste.

Summary

In view of this, the purpose of this application is to provide a load balancing method, apparatus, and server for a cloud host cluster that balance the load of the cloud host cluster more reasonably and effectively alleviate resource preemption and resource waste.

In a first aspect, an embodiment of this application provides a load balancing method for a cloud host cluster, applied to a control server. The control server stores first configuration information of multiple physical hosts, second configuration information of multiple cloud host clusters, and a resource usage priority of each cloud host cluster. Each physical host includes multiple hardware resources, and each cloud host cluster includes multiple cloud hosts built on the hardware resources provided by the physical hosts. The first configuration information includes the current remaining resource amount of each hardware resource in each physical host, and the second configuration information includes the target resource proportion and the target resource amount of the hardware resources required by each cloud host in the target cloud host cluster. The method includes: determining a target cloud host cluster from the cloud host clusters according to the resource usage priority; based on the first configuration information and the second configuration information, determining, from the physical hosts, the physical host corresponding to each cloud host in the target cloud host cluster as a target node, so that the target nodes provide the target cloud host cluster with the target resource amount of hardware resources; monitoring the actual usage of each hardware resource in the target cloud host cluster; and adjusting the target nodes of the target cloud host cluster according to the actual usage, so as to load balance the target cloud host cluster.

In one implementation, the step of determining, from the physical hosts and based on the first configuration information and the second configuration information, the physical host corresponding to each cloud host in the target cloud host cluster as a target node includes: determining the primary resource of the target cloud host cluster according to the size of the target resource proportion of each hardware resource in the second configuration information; selecting, according to the second configuration information, the physical hosts that contain the primary resource; and determining the target node corresponding to each cloud host in the target cloud host cluster according to the current remaining resource amount of the primary resource in the selected physical hosts.
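The first two of these steps can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the function names, the dictionary layout, and the reading of "contains the primary resource" as "has a non-zero remaining amount of it" are all assumptions. The proportions and CPU figures are borrowed from the worked example later in the description.

```python
from typing import Dict, List


def primary_resource(target_proportions: Dict[str, float]) -> str:
    """Return the hardware resource with the largest target resource proportion."""
    return max(target_proportions, key=target_proportions.get)


def hosts_with_resource(remaining: Dict[str, Dict[str, float]], resource: str) -> List[str]:
    """Keep the physical hosts that still have some of `resource` left.

    Assumption: "contains the primary resource" means remaining amount > 0.
    """
    return [host for host, amounts in remaining.items() if amounts.get(resource, 0) > 0]


# Target resource proportions of a CPU-oriented cluster (second configuration information).
proportions = {"CPU": 0.5, "MEM": 0.3, "GPU": 0.0, "NET": 0.2}
# Remaining CPU per physical host, counted in 2 GHz CPUs (first configuration information).
remaining = {"node0": {"CPU": 14}, "node1": {"CPU": 28}, "node2": {"CPU": 0}}

main = primary_resource(proportions)
candidates = hosts_with_resource(remaining, main)
print(main, candidates)
```

With these inputs the primary resource is CPU, and node2 is filtered out because it has no CPU left.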

In one implementation, the step of determining the target node corresponding to each cloud host in the target cloud host cluster according to the current remaining resource amount of the primary resource in the selected physical hosts includes: randomly selecting a cloud host from the target cloud host cluster; determining, among the selected physical hosts, the physical host with the largest current remaining resource amount of the primary resource as the target node corresponding to the selected cloud host; calculating the difference between the current remaining resource amount and the target resource amount of the primary resource of the selected cloud host, where the current remaining resource amount is the amount of the primary resource currently remaining in the target node corresponding to the selected cloud host; updating, based on the difference, the current remaining resource amount of the primary resource in that target node; and randomly selecting the next cloud host from the remaining cloud hosts in the target cloud host cluster and determining, among the selected physical hosts, the physical host with the largest current remaining resource amount of the primary resource as the target node corresponding to that next cloud host, until the target node corresponding to each cloud host in the target cloud host cluster has been determined.

In one implementation, the step of adjusting the target nodes of the target cloud host cluster according to the actual usage, so as to load balance the target cloud host cluster, includes: for each target node, calculating the resource usage rate of each hardware resource in the target node according to the actual usage of each hardware resource by the multiple cloud hosts corresponding to the target node and the current remaining resource amount of each hardware resource; if the resource usage rate of a hardware resource in the target node is greater than or equal to the preset threshold corresponding to that hardware resource, determining, according to the current remaining resource amount of the hardware resource in each physical host, the target node to which each cloud host corresponding to this target node is to be migrated; and migrating each cloud host corresponding to this target node to its migration target node, so as to load balance the target cloud host cluster.
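The threshold check and the choice of a migration destination can be sketched as follows. This is a hedged illustration only: the function names, the usage rates, the thresholds, and the remaining amounts are hypothetical, and picking the host with the largest remaining amount as the destination is one plausible reading of "according to the current remaining resource amount".

```python
from typing import Dict, List, Optional


def overloaded_resources(usage_rate: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the resources whose usage rate is at or above the preset threshold."""
    return [res for res, rate in usage_rate.items() if rate >= thresholds[res]]


def pick_destination(
    remaining: Dict[str, Dict[str, float]], resource: str, exclude: str
) -> Optional[str]:
    """Pick the host, other than `exclude`, with the most remaining `resource`."""
    candidates = {
        host: amounts.get(resource, 0)
        for host, amounts in remaining.items()
        if host != exclude
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical usage rates observed on target node "node1" and per-resource thresholds.
usage_rate = {"CPU": 0.92, "MEM": 0.40}
thresholds = {"CPU": 0.80, "MEM": 0.80}
remaining = {"node0": {"CPU": 14}, "node1": {"CPU": 4}, "node2": {"CPU": 20}}

hot = overloaded_resources(usage_rate, thresholds)
destination = pick_destination(remaining, hot[0], exclude="node1") if hot else None
print(hot, destination)
```

Here only CPU crosses its threshold, so the cloud hosts on node1 would be candidates for migration to node2, the host with the most CPU left.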

In one implementation, the step of calculating the resource usage rate of each hardware resource in the target node according to the actual usage of each hardware resource by the multiple cloud hosts corresponding to the target node and the current remaining resource amount of each hardware resource includes: calculating the ratio of the actual usage of each hardware resource by the multiple cloud hosts corresponding to the target node to the rated resource amount of that hardware resource in the target node, to obtain the temporary resource proportion of each hardware resource in each cloud host; for each cloud host, normalizing the temporary resource proportion of each hardware resource in that cloud host to obtain the actual resource proportion of each hardware resource of the cloud host; and calculating the resource usage rate of each hardware resource in the target node according to the actual resource proportion and the target resource amount of each hardware resource in each cloud host corresponding to the target node.

In one implementation, the step of calculating the resource usage rate of each hardware resource in the target node according to the actual usage of each hardware resource by the multiple cloud hosts corresponding to the target node and the current remaining resource amount of each hardware resource includes:

separately calculating, for each cloud host corresponding to the target node, the ratio of the actual usage of each hardware resource in that cloud host to the rated resource amount of that hardware resource in the target node, to obtain the temporary resource proportion of each hardware resource in each cloud host;

for each hardware resource of the target node, normalizing the temporary resource proportion of that hardware resource in each cloud host to obtain the actual resource proportion of that hardware resource in the cloud host; and

calculating the resource usage rate of each hardware resource in the target node according to the actual resource proportion of each hardware resource in each cloud host corresponding to the target node and the rated resource amount of the corresponding hardware resource.

In one implementation, the method further includes: if the cloud host corresponds to multiple target nodes, calculating, based on the target resource amount of each hardware resource of the cloud host, the resource proportion that each hardware resource in each physical host would have after the cloud host is migrated to that physical host, as an estimated resource proportion; and, for each hardware resource, if the estimated resource proportion of that hardware resource in a physical host is less than the preset threshold corresponding to that hardware resource, migrating the cloud host to that physical host.
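A minimal sketch of this pre-migration check is shown below. The text does not give a formula for the estimated resource proportion, so the one used here, (amount already in use + cloud host's target amount) / host capacity, is an assumption, as are the function names, the capacities, and the 0.8 thresholds.

```python
from typing import Dict


def estimated_proportions(
    used: Dict[str, float], capacity: Dict[str, float], vm_target: Dict[str, float]
) -> Dict[str, float]:
    """Estimate each resource's proportion on a host after the cloud host moves there.

    Assumed formula: (already used + target amount of the cloud host) / capacity.
    """
    estimate = {}
    for resource, need in vm_target.items():
        cap = capacity.get(resource, 0)
        if cap <= 0:
            # A host without this resource cannot serve a non-zero demand.
            estimate[resource] = float("inf") if need > 0 else 0.0
        else:
            estimate[resource] = (used.get(resource, 0) + need) / cap
    return estimate


def can_migrate(
    used: Dict[str, float],
    capacity: Dict[str, float],
    vm_target: Dict[str, float],
    thresholds: Dict[str, float],
) -> bool:
    """Allow the migration only if every estimated proportion stays below its threshold."""
    estimate = estimated_proportions(used, capacity, vm_target)
    return all(estimate[res] < thresholds[res] for res in estimate)


capacity = {"CPU": 32, "MEM": 128}   # hypothetical host capacity (CPUs, GB)
vm_target = {"CPU": 8, "MEM": 8}     # target resource amounts of the cloud host
thresholds = {"CPU": 0.8, "MEM": 0.8}

estimate = estimated_proportions({"CPU": 16, "MEM": 64}, capacity, vm_target)
ok_light = can_migrate({"CPU": 16, "MEM": 64}, capacity, vm_target, thresholds)
ok_busy = can_migrate({"CPU": 20, "MEM": 64}, capacity, vm_target, thresholds)
print(estimate, ok_light, ok_busy)
```

The lightly loaded host ends at 75% CPU and is accepted, while the busier host would reach 87.5% CPU and is rejected.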

In one implementation, the hardware resources include one or more of CPU resources, memory resources, FPU resources, FPGA resources, GPU resources, and network resources.

In a second aspect, an embodiment of this application further provides a load balancing apparatus for a cloud host cluster, applied to a control server. The control server stores first configuration information of multiple physical hosts, second configuration information of multiple cloud host clusters, and a resource usage priority of each cloud host cluster. Each physical host includes multiple hardware resources, and each cloud host cluster includes multiple cloud hosts built on the hardware resources provided by the physical hosts. The first configuration information includes the current remaining resource amount of each hardware resource in each physical host, and the second configuration information includes the target resource proportion and the target resource amount of the hardware resources required by each cloud host in the target cloud host cluster. The apparatus includes: a cluster determination module, configured to determine a target cloud host cluster from the cloud host clusters according to the resource usage priority; a node determination module, configured to determine, from the physical hosts and based on the first configuration information and the second configuration information, the physical host corresponding to each cloud host in the target cloud host cluster as a target node, so that the target nodes provide the target cloud host cluster with the target resource amount of hardware resources; a usage monitoring module, configured to monitor the actual usage of each hardware resource in the target cloud host cluster; and a node adjustment module, configured to adjust the target nodes of the target cloud host cluster according to the actual usage, so as to load balance the target cloud host cluster.

In a third aspect, an embodiment of this application further provides a server including a processor and a memory. The memory stores a computer program that, when run by the processor, performs the method of any implementation provided in the first aspect.

In a fourth aspect, an embodiment of this application further provides a computer storage medium configured to store the computer software instructions used by the method of any implementation provided in the first aspect.

The load balancing method, apparatus, and server for a cloud host cluster provided by the embodiments of this application are applied to a control server. The control server stores first configuration information of multiple physical hosts (including the current remaining resource amount of each hardware resource in each physical host), second configuration information of multiple cloud host clusters (including the target resource proportion and the target resource amount of the hardware resources required by each cloud host in the target cloud host cluster), and a resource usage priority of each cloud host cluster. Each physical host includes multiple hardware resources, and each cloud host cluster includes multiple cloud hosts built on the hardware resources provided by the physical hosts. The method first determines a target cloud host cluster from the cloud host clusters according to the resource usage priority. Based on the first configuration information and the second configuration information, it then determines, from the physical hosts, the target node corresponding to each cloud host in the target cloud host cluster, so that the target nodes provide the target cloud host cluster with the target resource amount of hardware resources. Finally, it monitors the actual usage of each hardware resource in the target cloud host cluster and adjusts the target nodes of the target cloud host cluster according to that usage, so as to load balance the target cloud host cluster. Load balancing methods in the related art are prone to resource preemption and resource waste. By contrast, the embodiments of this application determine the target nodes from the pre-stored first and second configuration information and then adjust them in real time according to the actual usage of each hardware resource in the target cloud host cluster. This makes better use of the hardware resources provided by the target nodes, effectively alleviates resource preemption and resource waste, and makes the load balancing of cloud hosts more reasonable.

Other features and advantages of this application will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing this application. The objectives and other advantages of this application are realized and obtained by the structures particularly pointed out in the description, the claims, and the drawings.

To make the above objectives, features, and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of the Drawings

To explain the specific implementations of this application or the technical solutions in the related art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some implementations of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a load balancing method for a cloud host cluster provided by an embodiment of this application;

FIG. 2 is a schematic flowchart of another load balancing method for a cloud host cluster provided by an embodiment of this application;

FIG. 3 is a schematic structural diagram of a load balancing apparatus for a cloud host cluster provided by an embodiment of this application;

FIG. 4 is a schematic structural diagram of a server provided by an embodiment of this application.

Detailed Description

To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application fall within the scope of protection of this application.

At present, to improve the utilization of server CPU and memory resources and to optimize resource allocation, the numad service program provided by the open source community is widely used to load balance across NUMA (Non-Uniform Memory Access) nodes. In some implementations, numad collects information such as the CPU usage and memory usage of each NUMA node and the resources occupied by kvm-qemu processes, adjusts the binding of kvm-qemu processes to NUMA nodes based on that information, and relies on the kernel to carry out migration between NUMA nodes, thereby balancing load. However, load balancing cloud hosts in this way is often unreasonable. The numad service program is relatively old, and with newer generations of CPU hardware and the growing number of resource types integrated into cloud servers, it can no longer meet current demands for diverse resources, computing types, and service types. In particular: (1) numad supports load balancing across only two NUMA nodes, whereas current servers commonly have more than two; (2) numad supports only two resource types, CPU and MEM (memory), and is not easy to extend to other resources.

To address these problems, the embodiments of this application provide a load balancing method, apparatus, and server for a cloud host cluster that balance the load of the cloud host cluster more reasonably and effectively alleviate resource preemption and resource waste.

Cloud hosts are deployed on physical hosts, so from the physical host's point of view a cloud host is a load. Reasonably allocating the cloud hosts of a cloud host cluster to physical hosts, so that the load on the physical hosts is relatively even and the situation where some physical hosts are overloaded while others are underloaded is avoided, can therefore be regarded as a form of load balancing of the physical hosts. For convenience, this process is referred to below as load balancing of the cloud host cluster.

To facilitate understanding of the embodiments of this application, a load balancing method for a cloud host cluster disclosed in the embodiments is first described in detail. The method is applied to a control server that stores first configuration information of multiple physical hosts, second configuration information of multiple cloud host clusters, and a resource usage priority of each cloud host cluster. Each physical host includes one or more hardware resources, and each cloud host cluster includes multiple cloud hosts built on the hardware resources provided by the physical hosts. The hardware resources may include one or more of CPU, GPU (Graphics Processing Unit), FPU (Floating-Point Unit), FPGA (Field-Programmable Gate Array), MEM (memory type), and NET (network type) resources. Referring to the schematic flowchart of a load balancing method for a cloud host cluster shown in FIG. 1, the method mainly includes the following steps S102 to S110:

Step S102: determine a target cloud host cluster from the cloud host clusters according to the resource usage priority.

Cloud host clusters can be divided by service type, for example into CPU-oriented cloud host clusters, memory-oriented cloud host clusters, GPU-computing cloud host clusters, network-bandwidth-oriented cloud host clusters, and general-purpose cloud host clusters. In practice, the resource usage priority of the cloud host clusters of each service type can be set in advance, and the cloud host clusters are then determined as the target cloud host cluster one after another in descending order of resource usage priority.

As an example, suppose there are three cloud host clusters: cloud host cluster A, cloud host cluster B, and cloud host cluster C. Cluster A is CPU-oriented, cluster B is memory-oriented, and cluster C is a GPU-computing cluster. If the preset resource usage priority ranks CPU-oriented clusters above memory-oriented clusters, and memory-oriented clusters above GPU-computing clusters, then cluster A is determined as the target cloud host cluster first, followed by cluster B, and then cluster C.
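The ordering in this example can be sketched as a simple sort. The dictionary layout, the `priority` field, and the convention that a smaller number means a higher priority are illustrative assumptions, not part of the described method.

```python
from typing import Dict, List


def placement_order(clusters: List[Dict]) -> List[str]:
    """Return cluster names from highest to lowest resource usage priority.

    Assumption: a smaller `priority` number means a higher priority.
    """
    ranked = sorted(clusters, key=lambda cluster: cluster["priority"])
    return [cluster["name"] for cluster in ranked]


clusters = [
    {"name": "C", "service": "GPU computing", "priority": 3},
    {"name": "A", "service": "CPU", "priority": 1},
    {"name": "B", "service": "memory", "priority": 2},
]
order = placement_order(clusters)
print(order)  # ['A', 'B', 'C']
```

Each cluster in `order` would then be handled in turn as the target cloud host cluster.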

Step S104: based on the first configuration information and the second configuration information, determine, from the physical hosts, the physical host corresponding to each cloud host in the target cloud host cluster as a target node, so that the target nodes provide the target cloud host cluster with the target resource amount of hardware resources.

The first configuration information includes the current remaining resource amount of each hardware resource in each physical host, that is, the amount of that resource currently available. For example, on physical host node1 the current remaining amount of CPU resources is 28*2GHz (the CPU resources currently available on node1 comprise 28 CPUs of 2 GHz each), the current remaining amount of MEM resources is 100 GB (the MEM resources currently available on node1 comprise storage units with a total capacity of 100 GB), and the current remaining amount of GPU resources is 0. The second configuration information includes the target resource proportion and the target resource amount of the hardware resources required by each cloud host in the target cloud host cluster, where the target resource proportion is the ratio of the time during which the cloud host uses the hardware resource within a period to the total length of that period. For example, if the target cloud host cluster is a CPU-oriented cluster of six cloud hosts, the second configuration information may record that each cloud host requires CPU resources "target resource proportion 50%; target resource amount 8*2GHz", MEM resources "target resource proportion 30%; target resource amount 8 GB", GPU resources "0", and NET resources "target resource proportion 20%; target resource amount 1 Gb/s".

In one implementation, the main resource of the target cloud host cluster may be determined according to the target resource proportion of each hardware resource in the second configuration information, and the physical host with the largest current remaining resource amount of the main resource is determined as the target node corresponding to a cloud host in the target cloud host cluster. This is repeated until the target node corresponding to every cloud host in the target cloud host cluster has been determined based on the current remaining resource amount of the main resource on each physical host. For example, in a CPU-oriented cloud host cluster the CPU resource has the largest target resource proportion, so the CPU resource is determined to be the main resource of that target cloud host cluster. If the current remaining resource amount of the CPU resource is 14*2GHz on physical host node0, 28*2GHz on node1, and 0 on node2, then node1 is determined as the target node corresponding to one cloud host in the target cloud host cluster. If each cloud host in the target cloud host cluster requires a target resource amount of 8*2GHz of the CPU resource, the current remaining resource amount of the CPU resource on node1 is updated to 20*2GHz, and the MEM, GPU, and NET resources of node1 are updated as well. The target nodes corresponding to the remaining cloud hosts in the target cloud host cluster are then determined in the same way, based on the current remaining resource amount of the CPU resource on each physical host.
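The selection rule in this example can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: the function name `pick_target_node`, the dictionary layout, and the error raised when no host has enough capacity are all hypothetical, and CPU amounts are expressed in units of one 2GHz core.

```python
def pick_target_node(remaining, demand):
    """Pick the host with the most remaining main resource and deduct the demand.

    remaining: dict mapping host name -> remaining amount of the main resource
    demand:    amount of the main resource required by one cloud host
    """
    # max() returns the first maximal key, so ties go to the earliest-listed host.
    node = max(remaining, key=remaining.get)
    if remaining[node] < demand:
        # Assumption: the text does not say what happens when no host fits.
        raise ValueError("no physical host can satisfy the demand")
    remaining[node] -= demand
    return node


# Remaining CPU in units of one 2GHz core (values from the example above).
cpu_remaining = {"node0": 14, "node1": 28, "node2": 0}
chosen = pick_target_node(cpu_remaining, demand=8)
print(chosen, cpu_remaining[chosen])  # node1 20
```

Calling the function once per cloud host reproduces the "repeat until every cloud host has a target node" loop described above.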

Step S106: monitor the actual usage of each hardware resource in the target cloud host cluster.

In practice, on the one hand, although a target resource amount is configured for the target cloud host cluster, the amount of resources the cluster actually uses during operation may fall short of that target resource amount. To alleviate this resource waste, the embodiments of the present application monitor the actual usage of each hardware resource in all cloud host clusters and adjust the target node of each cloud host in the target cloud host cluster based on the actual usage, thereby alleviating resource waste to a certain extent.

On the other hand, although a target resource amount is configured for the target cloud host cluster, the amount of resources the cluster actually uses during operation may exceed that target resource amount, causing the cloud hosts in the target cloud host cluster to preempt resources from one another. To alleviate resource preemption, the embodiments of the present application monitor the actual usage of each hardware resource in all cloud host clusters and adjust the target node of each cloud host in the target cloud host cluster based on the actual usage, thereby alleviating resource preemption to a certain extent.

Step S108: adjust the target nodes of the target cloud host cluster according to the actual usage, so as to perform load balancing on the target cloud host cluster.

In one implementation, for each physical host, the resource usage rate of each hardware resource may be calculated from the actual usage of that hardware resource by the cloud hosts corresponding to the physical host and the current remaining resource amount, and whether the cloud hosts corresponding to the physical host need to be migrated is judged based on the resource usage rate. For example, when the resource usage rate is greater than a preset threshold, it is determined that a cloud host corresponding to the physical host is to be migrated to another physical host. This reduces the load on that physical host while making full use of the remaining resources of the other physical hosts, which alleviates resource waste to a certain extent. In this document, the hardware resources in a cloud host are the hardware resources used by that cloud host; likewise, the hardware resources in a cloud host cluster are the hardware resources used by that cloud host cluster.

The load balancing method for a cloud host cluster provided by the embodiments of the present application determines the target node corresponding to each cloud host in the target cloud host cluster based on pre-stored first configuration information and second configuration information, and adjusts the target nodes of the target cloud host cluster based on the actual usage of each hardware resource in the cloud host cluster, thereby better achieving load balancing of the target cloud host cluster. Whereas load balancing methods in the related art are prone to resource preemption and resource waste, the embodiments of the present application adjust the target nodes in real time based on the actual usage of each hardware resource in the target cloud host cluster. This makes better use of the hardware resources provided by the target nodes, effectively alleviates resource preemption and resource waste, and thus makes cloud host load balancing more rational.

In one implementation, the hardware resources include one or more of CPU resources, memory resources, FPU resources, FPGA resources, GPU resources, and network resources. The embodiments of the present application take hardware resources comprising CPU resources, memory resources, GPU resources, and network resources as an example to describe the first configuration information and the second configuration information. In the first configuration information shown in Table 1, the physical hosts comprise node0, node1, and node2. As Table 1 shows, on physical host node0 the current remaining resource amount of the CPU resource is 28*2GHz, that of the MEM resource (that is, the memory resource) is 100G, that of the GPU resource is 2*1600 (that is, the GPU resources currently available on node0 are two graphics cards with a frequency of 1600MHz), and that of the Eth (Ethernet) resource (that is, the network resource) is Mellanox (a network card model) 25Gb/s (that is, the Eth resources currently available on node0 comprise a Mellanox network card with a bandwidth of 25Gb/s). On physical host node1 the current remaining resource amount of the CPU resource is 28*2GHz, that of the MEM resource is 100G, that of the GPU resource is 0, and that of the Eth resource is I40e (a network card model) 20Gb/s (that is, the Eth resources currently available on node1 comprise an I40e network card with a bandwidth of 20Gb/s). On physical host node2 the current remaining resource amount of the CPU resource is 28*2GHz, that of the MEM resource is 100G, that of the GPU resource is 2*1600, and that of the Eth resource is 0.

Table 1

Hardware	node 0	node 1	node 2
CPU	28*2GHz	28*2GHz	28*2GHz
Mem	100G	100G	100G
GPU	2*1600	0	2*1600
Eth	Mellanox 25Gb/s	I40e 20Gb/s	0

In addition, Table 2 illustrates one form of second configuration information, covering five types of cloud host cluster. Cloud host cluster A is a CPU-oriented cloud host cluster with relatively high CPU usage; cloud host cluster B is a memory-oriented cloud host cluster with relatively high memory usage; cloud host cluster C is a GPU-computing cloud host cluster that makes almost no use of the network; cloud host cluster D is a network-bandwidth-oriented cloud host cluster with relatively high bandwidth usage; and cloud host cluster E is an ordinary cloud host cluster. Taking the first row of Table 2 as an example, it indicates that cloud host cluster A comprises 6 cloud hosts and that the hardware resources required by each cloud host are: CPU resource "target resource proportion 50%; target resource amount 8*2GHz", MEM resource "target resource proportion 30%; target resource amount 8GB", GPU resource "0", and NET resource "target resource proportion 20%; target resource amount 1Gb/s".

Table 2

The foregoing S106 is described by way of example below. The embodiments of the present application provide some implementations of determining, from the physical hosts and based on the first configuration information and the second configuration information, the target node corresponding to each cloud host in the target cloud host cluster; see steps 1 to 3 below:

Step 1: determine the main resource of the target cloud host cluster according to the target resource proportion of each hardware resource in the second configuration information. Taking the second configuration information shown in Table 2 as an example, the CPU resource has the largest target resource proportion in cloud host cluster A, so the CPU resource may be determined as the main resource of cloud host cluster A. Likewise, the MEM resource may be determined as the main resource of cloud host cluster B, the GPU resource as the main resource of cloud host cluster C, the NET resource as the main resource of cloud host cluster D, and the CPU resource or the MEM resource as the main resource of cloud host cluster E.
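Step 1 amounts to taking the hardware resource with the largest target resource proportion. A minimal Python sketch follows; the function name `primary_resource` and the dictionary representation of a Table 2 row are assumptions made for illustration only.

```python
def primary_resource(target_proportions):
    """Return the hardware resource with the largest target resource proportion."""
    # On a tie, max() returns the first maximal key in insertion order.
    return max(target_proportions, key=target_proportions.get)


# Row for cloud host cluster A as described in the text (first row of Table 2).
cluster_a = {"CPU": 0.50, "MEM": 0.30, "GPU": 0.0, "NET": 0.20}
print(primary_resource(cluster_a))  # CPU
```

For a cluster such as E, where the CPU or the MEM resource may serve as the main resource, the tie-breaking order of this sketch is simply the order in which the resources are listed.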

Step 2: select, from the physical hosts, those that contain the main resource. Taking the first configuration information shown in Table 1 as an example, and assuming that the cloud host clusters in descending order of resource usage priority are "C, D, A, B, E", cloud host cluster C is first determined as the target cloud host cluster. Its main resource is the GPU resource, and physical hosts node0 and node2 both contain GPU resources.

Step 3: determine the target node corresponding to each cloud host in the target cloud host cluster according to the current remaining resource amount of the main resource on the selected physical hosts. In one implementation, when determining the target node corresponding to a cloud host, the physical host with the largest current remaining resource amount may be preferred; in some implementations, see steps 3.1 to 3.5 below. In other possible implementations, a physical host may instead be selected at random, from the physical hosts containing the main resource, as the target node corresponding to the cloud host:

Step 3.1: randomly select a cloud host from the target cloud host cluster. Because every cloud host in the target cloud host cluster has the same requirements, a cloud host may be picked at random from the cluster and its target node determined, until the target node corresponding to every cloud host in the target cloud host cluster has been determined. Note that random selection applies only when cloud hosts are newly created (that is, when the target nodes are determined in the initial state). While cloud hosts are running, the cloud host to be migrated may be selected according to the actual resource proportion of each cloud host; for example, a cloud host is determined to be migrated when its actual resource proportion exceeds a preset threshold.

Step 3.2: determine, among the selected physical hosts, the physical host with the largest current remaining resource amount of the main resource as the target node corresponding to the selected cloud host. Taking Table 1 as an example, the GPU resources of physical hosts node0 and node2 are both 2*1600, so the target node corresponding to the cloud host may be chosen at random from the physical hosts with the largest current remaining resource amount of the main resource, or the physical host with the largest or smallest serial number among them may be chosen. For example, one of physical host node0 and physical host node2 may be chosen at random as the target node corresponding to the cloud host, or the physical host with the smallest serial number of the two (that is, physical host node0) may be chosen.

Step 3.3: calculate the difference between the current remaining resource amount and the target resource amount of the selected cloud host, where the current remaining resource amount is the amount currently remaining on the target node corresponding to the selected cloud host. The difference may be calculated separately for each hardware resource, between the current remaining resource amount of that hardware resource and the target resource amount of that hardware resource for the cloud host. For example, the difference between the current remaining resource amount of the GPU resource on physical host node0 and the target resource amount of the GPU resource for the cloud host is 2*1600-50%*1600=1.5*1600. Likewise, the difference between the current remaining resource amount of the CPU resource on physical host node0 and the target resource amount of the CPU resource for the cloud host is 28*2GHz-20%*4*2GHz=27.2*2GHz, so the difference for the CPU resource is 27.2*2GHz. The difference for the MEM resource is 100G-30%*8G=97.6G, and the difference for the NET resource is 25Gb/s.
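The per-resource arithmetic of step 3.3 can be checked with a short Python sketch. It assumes node0 starts from the Table 1 values (remaining CPU of 28*2GHz) and takes the proportions and target amounts of cloud host cluster C from the arithmetic quoted in this step; the function name `remaining_after_placement` and the unit conventions are hypothetical.

```python
def remaining_after_placement(node_remaining, proportions, amounts):
    """Subtract (target proportion * target amount) from each resource of the node."""
    return {
        res: round(node_remaining[res] - proportions[res] * amounts[res], 6)
        for res in node_remaining
    }


# node0 from Table 1. Units: CPU in 2GHz cores, MEM in G,
# GPU in 1600MHz cards, NET in Gb/s.
node0 = {"CPU": 28, "MEM": 100, "GPU": 2, "NET": 25}
# Cluster C figures as they appear in the arithmetic of step 3.3
# (cluster C is described as making almost no use of the network).
proportions = {"CPU": 0.20, "MEM": 0.30, "GPU": 0.50, "NET": 0.0}
amounts = {"CPU": 4, "MEM": 8, "GPU": 1, "NET": 0}

after = remaining_after_placement(node0, proportions, amounts)
print(after)  # compare with the differences worked out in step 3.3
```

The rounding to six decimals only suppresses floating-point noise; it is not part of the described method.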

Step 3.4: update, based on the difference, the current remaining resource amount of the main resource on the target node corresponding to the selected cloud host. In practice, the difference calculated for each hardware resource becomes the new current remaining resource amount of that hardware resource on the target node, as shown in Table 3.

Table 3

Hardware	node 0	node 1	node 2
CPU	27.2*2GHz	28*2GHz	28*2GHz
Mem	97.6G	100G	100G
GPU	1.5*1600	0	2*1600
Eth	25Gb/s	20Gb/s	0

Step 3.5: randomly select the next cloud host from the remaining cloud hosts of the target cloud host cluster, and determine, among the selected physical hosts, the physical host with the largest current remaining resource amount of the main resource as the target node corresponding to that next cloud host, until the target node corresponding to every cloud host in the target cloud host cluster has been determined. For example, Table 3 shows that physical host node2 now has the larger current remaining resource amount of the GPU resource, so physical host node2 may be determined as the target node corresponding to the next cloud host. The difference between the current remaining resource amount of each hardware resource on physical host node2 and the target resource amount of that hardware resource for the next cloud host is calculated, and the current remaining resource amounts of physical host node2 are updated based on these differences. In the same way, the target node corresponding to each cloud host in cloud host cluster C is determined in turn: cloud host C1 corresponds to physical host node0, cloud host C2 to physical host node2, cloud host C3 to physical host node0, cloud host C4 to physical host node2, cloud host C5 to physical host node0, and cloud host C6 to physical host node2. That is, physical host node0 corresponds to 3 cloud hosts in cloud host cluster C and physical host node2 corresponds to the other 3, where cloud hosts C1-C6 are the 6 cloud hosts in cloud host cluster C. This also yields the current remaining resource amount of each hardware resource on the physical hosts, as shown in Table 4.

Table 4

Hardware	node 0	node 1	node 2
CPU	25.6*2GHz	28*2GHz	25.6*2GHz
Mem	92.8G	100G	92.8G
GPU	0.5*1600	0	0.5*1600
Eth	25Gb/s	20Gb/s	0
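The placement of cloud host cluster C (steps 3.1 to 3.5) can be replayed with the sketch below. It is a simplified illustration under assumptions: ties are broken in favor of the lowest-numbered physical host (one of the options named in step 3.2), the per-host demand is taken as proportion times target amount from step 3.3, and the function name `place_cluster` and the data layout are hypothetical.

```python
def place_cluster(nodes, demand, primary, count):
    """Greedy placement: each cloud host goes to the node with the most
    remaining main resource; ties go to the lowest-numbered node."""
    placement = []
    for _ in range(count):
        # Step 2: only hosts that still hold some of the main resource qualify.
        candidates = [n for n in sorted(nodes) if nodes[n][primary] > 0]
        # Step 3.2: max() keeps the first maximal candidate.
        target = max(candidates, key=lambda n: nodes[n][primary])
        # Steps 3.3 and 3.4: deduct the demand from every hardware resource.
        for res, amount in demand.items():
            nodes[target][res] = round(nodes[target][res] - amount, 6)
        placement.append(target)
    return placement


# Table 1. Units: CPU in 2GHz cores, MEM in G, GPU in 1600MHz cards, NET in Gb/s.
nodes = {
    "node0": {"CPU": 28, "MEM": 100, "GPU": 2, "NET": 25},
    "node1": {"CPU": 28, "MEM": 100, "GPU": 0, "NET": 20},
    "node2": {"CPU": 28, "MEM": 100, "GPU": 2, "NET": 0},
}
# Per-host demand of cluster C: 20%*4 cores, 30%*8G, 50%*1 card, no network.
demand_c = {"CPU": 0.8, "MEM": 2.4, "GPU": 0.5, "NET": 0.0}

placement = place_cluster(nodes, demand_c, primary="GPU", count=6)
print(placement)
print(nodes["node0"], nodes["node2"])
```

With this tie-breaking rule the six cloud hosts alternate between node0 and node2, and the remaining amounts can be compared against Table 4.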

In practice, after the target node corresponding to each cloud host in cloud host cluster C has been determined, the target nodes corresponding to the cloud hosts in the next cloud host cluster may be determined according to the resource usage priority of the cloud host clusters. Assuming that the cloud host clusters in descending order of resource usage priority are "C, D, A, B, E", the target nodes corresponding to the cloud hosts in cloud host cluster D should next be determined on the basis of Table 4; in some embodiments, they may be determined in turn according to steps 3.1 to 3.5 above. The main resource of cloud host cluster D is the NET resource, which physical hosts node0 and node1 contain. Because physical host node0 has the larger current remaining resource amount of the NET resource, cloud host D1 corresponds to physical host node0, that is, the target node corresponding to cloud host D1 is physical host node0, and the current remaining resource amount of the NET resource on physical host node0 becomes 25Gb/s-50%*10Gb/s=20Gb/s. Physical host node0 and physical host node1 now have the same current remaining resource amount of the NET resource, so either may be selected as the target node corresponding to cloud host D2. Assuming cloud host D2 corresponds to physical host node0, the current remaining resource amount of the NET resource on physical host node0 becomes 20Gb/s-50%*10Gb/s=15Gb/s, which is less than that on physical host node1, so cloud host D3 corresponds to physical host node1. By analogy, cloud host D4 corresponds to physical host node0, cloud host D5 to physical host node1, and cloud host D6 to physical host node0. That is, physical host node0 corresponds to 4 cloud hosts in cloud host cluster D and physical host node1 corresponds to 2, where cloud hosts D1-D6 are the 6 cloud hosts in cloud host cluster D. This also yields the current remaining resource amount of each hardware resource on the physical hosts, as shown in Table 5.

Table 5

Hardware	node 0	node 1	node 2
CPU	22.4*2GHz	26.4*2GHz	25.6*2GHz
Mem	88.0G	97.6G	92.8G
GPU	0.5*1600	0	0.5*1600
Eth	5Gb/s	10Gb/s	0
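The NET bookkeeping for cloud host cluster D can be traced in a few lines of Python. The sketch assumes that a tie is resolved in favor of node0, as the walk-through does for cloud host D2, and tracks only the NET resource; the variable names are illustrative.

```python
# Remaining NET after cluster C has been placed (Eth row of Table 4), in Gb/s.
net = {"node0": 25, "node1": 20}
per_host = 0.5 * 10  # 50% * 10Gb/s per cloud host, as in the text

assignment = {}
for host in ["D1", "D2", "D3", "D4", "D5", "D6"]:
    # max() keeps the first maximal key, so a tie resolves to node0.
    target = max(net, key=net.get)
    net[target] -= per_host
    assignment[host] = target

print(assignment)
print(net)
```

Under this assumption node0 ends up with four cloud hosts and node1 with two, and the remaining NET amounts can be compared with the Eth row of Table 5.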

The target node corresponding to each cloud host in the next cloud host cluster is then determined according to steps 3.1 to 3.5 above. The main resource of cloud host cluster A is the CPU resource, so it can be determined that physical host node0 corresponds to 1 cloud host in cloud host cluster A, physical host node1 corresponds to 3 cloud hosts in cloud host cluster A, and physical host node2 corresponds to 2 cloud hosts in cloud host cluster A. This also yields the current remaining resource amount of each hardware resource on the physical hosts, as shown in Table 6.

Table 6

Hardware	node 0	node 1	node 2
CPU	18.4*2GHz	14.4*2GHz	17.6*2GHz
Mem	85.6G	90.4G	88.0G
GPU	0.5*1600	0	0.5*1600
Eth	4.8Gb/s	9.0Gb/s	0

The target node corresponding to each cloud host in the next cloud host cluster is then determined according to steps 3.1 to 3.5 above. The main resource of cloud host cluster B is the MEM resource, so it can be determined that physical host node0 corresponds to 2 cloud hosts in cloud host cluster B, physical host node1 corresponds to 2 cloud hosts in cloud host cluster B, and physical host node2 corresponds to 2 cloud hosts in cloud host cluster B. Because physical host node2 has no NET resource, the cloud hosts on physical host node2 access the NET resource on physical host node1 across nodes. This also yields the current remaining resource amount of each hardware resource on the physical hosts, as shown in Table 7.

Table 7

Hardware	node 0	node 1	node 2
CPU	16.0*2GHz	12.0*2GHz	15.2*2GHz
Mem	53.6G	58.4G	56.0G
GPU	0.5*1600	0	0.5*1600
Eth	4.4Gb/s	8.2Gb/s	0
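The cross-node NET access described for cloud host cluster B can be illustrated as follows. This is a sketch under assumptions: the NET demand of 0.2Gb/s per cloud host is inferred from the change in the Eth row between Table 6 and Table 7 rather than stated in the text, and the function name `charge_net` and its `fallback` parameter are hypothetical.

```python
def charge_net(eth_remaining, hosts_per_node, net_per_host, fallback):
    """Deduct NET demand per node; cloud hosts on a node without NET
    are charged to the `fallback` node instead (cross-node access)."""
    for node, count in hosts_per_node.items():
        payer = node if eth_remaining[node] > 0 else fallback
        eth_remaining[payer] = round(eth_remaining[payer] - count * net_per_host, 6)
    return eth_remaining


eth = {"node0": 4.8, "node1": 9.0, "node2": 0.0}  # Eth row of Table 6, in Gb/s
hosts_b = {"node0": 2, "node1": 2, "node2": 2}    # placement of cluster B
# Assumption: 0.2Gb/s per cloud host, inferred from Tables 6 and 7.
eth_after = charge_net(eth, hosts_b, net_per_host=0.2, fallback="node1")
print(eth_after)
```

Here node1 pays for its own two cloud hosts and for the two on node2, which is why its remaining NET drops twice as fast as that of node0.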

The target node corresponding to each cloud host in the next cloud host cluster is then determined according to steps 3.1 to 3.5 above. The main resource of cloud host cluster E is the CPU resource or the MEM resource, so it can be determined that physical host node0 corresponds to 2 cloud hosts in cloud host cluster E, physical host node1 corresponds to 1 cloud host in cloud host cluster E, and physical host node2 corresponds to 3 cloud hosts in cloud host cluster E. Because physical host node2 has no NET resource, the cloud hosts on physical host node2 access the NET resource on physical host node1 across nodes. This also yields the current remaining resource amount of each hardware resource on the physical hosts, as shown in Table 8.

Table 8

Hardware	node 0	node 1	node 2
CPU	13.2*2GHz	10.6*2GHz	11.0*2GHz
Mem	50.8G	57.0G	51.8G
GPU	0.5*1600	0	0.5*1600
Eth	1.4Gb/s	2.2Gb/s	0

In one implementation, when the target nodes of the target cloud host cluster are adjusted according to the actual usage so as to perform load balancing on the target cloud host cluster, the cloud host to be migrated may be selected from the running cloud hosts according to the actual resource proportion, the physical host to which that cloud host is to be migrated is determined, and the cloud host is then migrated to that physical host, thereby achieving load balancing of the cloud host cluster. In some embodiments, see steps a to c below:

Step a: for each target node, calculate the resource usage rate of each hardware resource on the target node according to the actual usage of each hardware resource by the multiple cloud hosts corresponding to the target node and the current remaining resource amount of each hardware resource of the target node. The embodiments of the present application provide one implementation of step a, shown in steps a1 to a3 below:

Step a1: calculate the ratio of the actual usage of each hardware resource by the multiple cloud hosts corresponding to the target node to the calibrated resource amount of each hardware resource on the target node, obtaining the temporary resource proportion of each hardware resource for each cloud host. The calibrated resource amount of a hardware resource on the target node can be understood as the usable resource amount of that hardware resource on the target node; in general, the usable resource amount is the same as the target resource amount. In one implementation, the temporary resource proportion w_t_i may be calculated according to the following formula:

$$w\_t_{i} = \frac{\text{actual usage of hardware resource } i}{\text{calibrated resource amount of hardware resource } i}$$

where i denotes a hardware resource and w_t_i is the temporary resource proportion of the hardware resource denoted by i.

For example, if the target resource amount of the CPU resource in cloud host cluster A is 8*2GHz and the actual usage is 6*2GHz, the formula gives the temporary resource proportion of the CPU resource in cloud host cluster A as:

$$w\_t_{\mathrm{CPU}} = \frac{6 \times 2\,\mathrm{GHz}}{8 \times 2\,\mathrm{GHz}} = 75\%$$

Likewise, the temporary resource proportions of the other hardware resources in cloud host cluster A may be calculated according to the formula; assume here that the actual usage of the other hardware resources equals their calibrated resource amount, that is, w_t_i = 100%. The temporary resource proportion of each hardware resource in each of the other cloud host clusters may be calculated in the same way.

Step a2: for each cloud host, normalize the temporary resource proportions of the hardware resources in the cloud host to obtain the actual resource proportion of each hardware resource of the cloud host. In one implementation, the temporary resource proportions of the hardware resources of the same cloud host may be normalized according to the following formula to obtain the actual resource proportion w_i of each hardware resource in the cloud host:

$$w_{i} = \frac{w\_t_{i}}{\sum_{j=1}^{n} w\_t_{j}}$$

where n is the number of hardware resources, j is a positive integer in the range [1, n], and w_t_j is the temporary resource proportion of the j-th hardware resource.
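Steps a1 and a2 can be sketched in Python as below. The sketch rests on two assumptions that the text does not spell out: normalization divides each temporary proportion by the sum of all temporary proportions, and a hardware resource whose calibrated amount is zero (such as the GPU resource of cloud host cluster A) is skipped to avoid dividing by zero. The function names are hypothetical.

```python
def temporary_proportions(actual, calibrated):
    """Step a1: actual usage divided by calibrated amount, per hardware resource."""
    # Assumption: resources with a calibrated amount of zero are skipped.
    return {r: actual[r] / calibrated[r] for r in calibrated if calibrated[r] > 0}


def normalize(w_t):
    """Step a2: scale the temporary proportions so that they sum to 1."""
    total = sum(w_t.values())
    return {r: v / total for r, v in w_t.items()}


# Cluster A example: CPU uses 6 of 8 cores; the other resources are fully used.
calibrated = {"CPU": 8, "MEM": 8, "GPU": 0, "NET": 1}
actual = {"CPU": 6, "MEM": 8, "GPU": 0, "NET": 1}

w_t = temporary_proportions(actual, calibrated)
w = normalize(w_t)
print(w_t)
print(w)
```

After normalization the proportions of one cloud host sum to 1, so they can be compared directly with the target resource proportions recorded in the second configuration information.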

Step a3: calculate the resource usage rate of each hardware resource on the target node according to the actual resource proportion and the target resource amount of each hardware resource in the cloud hosts corresponding to the target node. In one implementation, for a given hardware resource, a first product of the actual resource proportion and the actual resource amount of the hardware resource in the cloud host and a second product of the target resource proportion and the target resource amount may be calculated separately, and the ratio of the first product to the second product is taken as the resource usage rate of the hardware resource.

The embodiments of the present application also provide another implementation of step a, shown in steps a4 to a6 below:

步骤a4，针对该目标节点的每个硬件资源，计算该目标节点对应的每个云主机使用该硬件资源的实际使用量和该目标节点中该硬件资源的标定资源量的比值，得到每个云主机的每个硬件资源的临时资源占比。其中，目标节点中硬件资源的标定资源量可以理解为目标节点中该硬件资源的可使用资源量，一般而言，可使用资源量与该目标节点已经分配的硬件资源量相等。在一种实施方式中可以按照如下公式计算临时资源占比w_t _i： Step a4: For each hardware resource of the target node, calculate the ratio of the actual usage amount of the hardware resource used by each cloud host corresponding to the target node to the calibrated resource amount of the hardware resource in the target node to obtain each cloud The percentage of temporary resources of each hardware resource of the host. Among them, the calibrated resource amount of the hardware resource in the target node can be understood as the usable resource amount of the hardware resource in the target node. Generally speaking, the usable resource amount is equal to the hardware resource amount allocated by the target node. In an implementation manner, the temporary resource proportion w_t _i can be calculated according to the following formula:

For example, suppose the calibrated amount of CPU resources in the target node is 8*2GHz and the target node corresponds to two cloud hosts, denoted cloud host 1 and cloud host 2 for convenience. If the actual CPU usage of cloud host 1 is 4*2GHz and the actual CPU usage of cloud host 2 is 6*2GHz, then, for the CPU resources of the target node, the temporary resource proportion of cloud host 1 is w_t_1 = (4*2GHz)/(8*2GHz) = 0.5 and the temporary resource proportion of cloud host 2 is w_t_2 = (6*2GHz)/(8*2GHz) = 0.75.

Step a5: for each hardware resource of the target node, normalize the temporary resource proportions of the hardware resource in the cloud hosts corresponding to the target node, to obtain the actual resource proportion of the hardware resource in each cloud host.

In one implementation, the temporary resource proportions may be normalized according to the following formula to obtain the actual resource proportion w_i of the hardware resource in each cloud host:

$$w_i = \frac{w_{t_i}}{\sum_{j=1}^{n} w_{t_j}}$$

where n is the number of cloud hosts corresponding to the target node, j is a positive integer in the range [1, n], w_t_j is the temporary resource proportion of the hardware resource in the j-th cloud host corresponding to the target node, and w_i is the actual resource proportion of the hardware resource in the i-th cloud host corresponding to the target node.

Continuing the example of step a4, the actual resource proportion of the CPU resources in cloud host 1 is w_1 = 0.5/(0.5 + 0.75) = 0.4, and the actual resource proportion of the CPU resources in cloud host 2 is w_2 = 0.75/(0.5 + 0.75) = 0.6.

Step a6: calculate the resource usage rate of each hardware resource in the target node according to the actual resource proportion and the calibrated resource amount of each hardware resource in each cloud host corresponding to the target node. In one implementation, for each cloud host corresponding to the hardware resource, a third product of the actual resource proportion and the actual resource amount of the hardware resource in the cloud host, and a fourth product of the target resource proportion and the target resource amount of the hardware resource in the cloud host, are calculated separately; the ratio of the third product to the fourth product is the resource usage rate of the hardware resource.
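Steps a4 to a6 can be illustrated with a short Python sketch that reproduces the CPU example above. The function names are hypothetical and not part of the application. In particular, `usage_rate` encodes only one reading of step a6 (a single cloud host and a single hardware resource), because the text does not specify how the products are aggregated across cloud hosts.

```python
def temporary_proportions(actual_usage, calibrated_amount):
    """Step a4: each cloud host's actual usage over the node's calibrated amount."""
    if calibrated_amount <= 0:
        raise ValueError("calibrated_amount must be positive")
    return {host: used / calibrated_amount for host, used in actual_usage.items()}


def normalize(temporary):
    """Step a5: scale the temporary proportions so that they sum to 1."""
    total = sum(temporary.values())
    if total == 0:
        return {host: 0.0 for host in temporary}
    return {host: value / total for host, value in temporary.items()}


def usage_rate(actual_proportion, actual_amount, target_proportion, target_amount):
    """Step a6 (assumed reading): ratio of the two products for one cloud host."""
    denominator = target_proportion * target_amount
    if denominator <= 0:
        raise ValueError("target proportion and target amount must be positive")
    return (actual_proportion * actual_amount) / denominator


# CPU example from the text: the node is calibrated at 8 * 2 GHz.
cpu_usage_ghz = {"cloud_host_1": 4 * 2.0, "cloud_host_2": 6 * 2.0}
w_t = temporary_proportions(cpu_usage_ghz, calibrated_amount=8 * 2.0)
w = normalize(w_t)
print(w_t)  # {'cloud_host_1': 0.5, 'cloud_host_2': 0.75}
print({host: round(value, 3) for host, value in w.items()})
# {'cloud_host_1': 0.4, 'cloud_host_2': 0.6}
```

Because the example usages add up to more than the calibrated amount, the temporary proportions sum to 1.25; normalization is what brings them back to a distribution that sums to 1.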

Step b: if the resource usage rate of a hardware resource in the target node is greater than or equal to the preset threshold corresponding to that hardware resource, determine, according to the current remaining resource amount of the hardware resource in each physical host, the target node to which each cloud host corresponding to the target node is to be migrated. A usage rate greater than or equal to the preset threshold indicates that usage of the hardware resource exceeds the limit, at which point the load balancing mechanism can be started; for example, load balancing starts when the CPU resource usage rate of a physical host reaches 60%. In one implementation, a physical host with a larger current remaining resource amount is preferentially selected as the target node to which the cloud host is to be migrated.

For example, when the CPU resource usage rate of physical host node0 is 59%, no cross-node migration is performed even if the CPU resource usage rate of the other physical hosts is almost 0; resource usage statistics are simply collected again and the resource proportions updated. When the CPU resource usage rate of physical host node0 exceeds 60% (for example, 70%), the physical hosts are sorted in descending order of the current remaining amount of CPU resources, the physical host with the largest current remaining amount of CPU resources is selected as the target node to be migrated to, and the cloud host with the largest actual resource proportion is migrated to that physical host; the physical host must have CPU resources. If the current remaining resource amount of the target node to be migrated to is less than the target resource amount of the selected cloud host, or the target cloud host would consequently exceed the monitoring index, the cloud host with the second largest actual resource proportion is selected for the proposed migration, and so on, until the resource monitoring index of the physical host node falls below the specified index. If no cloud host satisfies the migration conditions and the resource usage rate exceeds the specified monitoring index, a monitoring event is reported for manual intervention.

Step c: migrate each cloud host corresponding to the target node to the target node determined for that cloud host, so as to perform load balancing on the target cloud host cluster.
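Steps b and c can be sketched as a small planning function for a single hardware resource. This is a minimal illustration under stated assumptions, not the application's implementation: the `CloudHost` class, the `plan_migration` function, and all sample values are hypothetical, and the check that the destination stays within its own monitoring index is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class CloudHost:
    name: str
    actual_proportion: float  # actual resource proportion w_i on the source node
    target_amount: float      # target resource amount of this hardware resource


def plan_migration(usage_rate, threshold, cloud_hosts, remaining_by_node):
    """Return (cloud host name, destination node), or None if no action is needed.

    Raises LookupError when the threshold is reached but no cloud host fits,
    which corresponds to reporting a monitoring event for manual intervention.
    """
    if usage_rate < threshold:
        return None  # below the threshold: only keep collecting statistics
    if not remaining_by_node:
        raise LookupError("no candidate physical host; manual intervention required")
    # Destination: the physical host with the largest current remaining amount.
    destination = max(remaining_by_node, key=remaining_by_node.get)
    # Try cloud hosts in descending order of actual resource proportion.
    for host in sorted(cloud_hosts, key=lambda h: h.actual_proportion, reverse=True):
        if host.target_amount <= remaining_by_node[destination]:
            return host.name, destination
    raise LookupError("no cloud host can be migrated; manual intervention required")


hosts = [CloudHost("vm1", 0.6, 12.0), CloudHost("vm2", 0.4, 6.0)]
remaining = {"node1": 8.0, "node2": 4.0}
print(plan_migration(0.59, 0.60, hosts, remaining))  # None
print(plan_migration(0.70, 0.60, hosts, remaining))  # ('vm2', 'node1')
```

In the second call, vm1 has the largest actual proportion but needs more than node1 has left, so the sketch falls back to vm2, mirroring the "second largest, and so on" rule in step b.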

In one implementation, to avoid as far as possible the situation in which a physical host exceeds the limit (that is, exceeds the monitoring index) after a cloud host has been migrated to it, the embodiments of the present application may calculate the estimated resource proportion of each hardware resource in each physical host and decide on that basis whether to migrate the cloud host. In some embodiments: (1) if the cloud host corresponds to multiple target nodes, calculate, based on the target resource amount of each hardware resource in the cloud host, the estimated resource proportion of each hardware resource in each physical host after the cloud host is migrated to that physical host. The estimated resource proportion is the resource proportion the physical host would have if the cloud host were migrated to it. In some implementations, the estimated resource proportion may be calculated in the same way as the actual resource proportion described above. (2) For each hardware resource, if the estimated resource proportion of the hardware resource in the physical host is less than the preset threshold corresponding to the hardware resource, migrate the cloud host to the physical host.
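The pre-migration check described above can be sketched as follows. The estimate used here, (amount already in use + target amount of the incoming cloud host) / calibrated amount, is an assumption made for illustration; the text only says that the estimated proportion is calculated in a manner similar to the actual resource proportion. The function and parameter names are hypothetical.

```python
def can_accept(node_used, node_calibrated, host_target, thresholds):
    """Return True if every estimated proportion stays below its threshold."""
    for resource, demand in host_target.items():
        calibrated = node_calibrated.get(resource, 0.0)
        if calibrated <= 0:
            return False  # the physical host does not provide this resource
        # Assumed estimate: usage after the cloud host has moved in.
        estimated = (node_used.get(resource, 0.0) + demand) / calibrated
        if estimated >= thresholds[resource]:
            return False
    return True


node_used = {"CPU": 4.0, "MEM": 8.0}
node_calibrated = {"CPU": 16.0, "MEM": 32.0}
thresholds = {"CPU": 0.6, "MEM": 0.6}
print(can_accept(node_used, node_calibrated, {"CPU": 4.0, "MEM": 8.0}, thresholds))  # True
print(can_accept(node_used, node_calibrated, {"CPU": 8.0, "MEM": 8.0}, thresholds))  # False
```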

The embodiments of the present application can also implement cross-node load balancing. In some implementations, cloud host cluster A on physical host node 2 needs to access the NET resources of physical host node 1 across nodes. To reduce the delay caused by cross-node resource access, the resource usage rates of the NET resources of physical host node 0 and physical host node 1 are checked continuously during operation; provided the resource usage rate does not exceed the specified index, cloud host migration can still be performed to eliminate the delay caused by cross-node resource access.

To facilitate understanding of the load balancing method for a cloud host cluster provided by the above embodiments, the embodiments of the present application provide another load balancing method for a cloud host cluster. Referring to the schematic flowchart of this method shown in FIG. 2, the method mainly includes the following steps S202 to S210:

Step S202: in the initial state, allocate physical hosts to the cloud host cluster according to the preset resource configuration and the load balancing policy. The preset resource configuration includes the first configuration information and the second configuration information described above. The load balancing policy may include: taking the resource type with the largest target resource amount in a cloud host as the main resource; configuring cloud hosts of special types first and setting the resource usage priority of each cloud host, where a special type may be a type specified by the user according to actual needs or experience; and, when the main resource satisfies the conditions but the physical host to which the cloud host should be bound still cannot be determined, using the secondary dependent resource (that is, the hardware resource with the second largest target resource amount) as the basis for the decision, and repeating this until the physical host to which the cloud host is bound is determined.

Step S204: monitor the actual usage of the cloud hosts while they are running, and calculate the resource usage rate (that is, the resource proportion) of the cloud hosts based on the actual resource amount. When calculating the resource usage of a cloud host based on the actual resource amount, the embodiments of the present application support a weight decision model that optimizes resource allocation according to the actual resource proportion w_i of each hardware resource. For the implementation, refer to step a above; details are not repeated here.

Step S206: determine, based on the resource usage rate, whether to perform load balancing. If yes, go to step S208; if no, return to step S204. For example, load balancing is performed when the resource usage rate is greater than the preset threshold, and is not performed when the resource usage rate is less than the preset threshold.

Step S208: perform load balancing while the cloud hosts are running. For the implementation, refer to the load balancing of cloud hosts in step b above; details are not repeated here.

Step S210: during operation, perform load balancing on hardware resources that are accessed across nodes. In one implementation, suppose cloud host cluster A on physical host node 2 needs to access the NET resources of physical host node 1 across nodes. The resource usage rates of the NET resources of physical host node 0 and physical host node 1 can be checked continuously during operation; provided the resource usage rate does not exceed the specified index, cloud host migration can still be performed to eliminate the delay caused by cross-node resource access.
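The control flow of steps S204 to S208 can be sketched as a simple loop. The sampling and rebalancing callables are hypothetical placeholders supplied by the caller, and the comparison uses "greater than or equal to", following step b; this is an illustrative assumption rather than the application's code.

```python
def run_balancer(sample_usage_rates, thresholds, rebalance, rounds):
    """Run a fixed number of monitoring rounds; return how many triggered balancing."""
    rebalanced = 0
    for _ in range(rounds):
        rates = sample_usage_rates()  # S204: monitor usage and compute usage rates
        # S206: collect the hardware resources that reached their threshold.
        over = sorted(r for r, rate in rates.items() if rate >= thresholds[r])
        if over:
            rebalance(over)  # S208: balance the resources that reached the threshold
            rebalanced += 1
    return rebalanced


samples = iter([{"CPU": 0.59, "NET": 0.10}, {"CPU": 0.70, "NET": 0.10}])
actions = []
count = run_balancer(lambda: next(samples), {"CPU": 0.6, "NET": 0.6}, actions.append, rounds=2)
print(count, actions)  # 1 [['CPU']]
```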

The above embodiments may complete load balancing by configuring a cgroup (control group) to invoke the kernel migration function. In another implementation, the kernel of the Numad tool may be optimized and modified on the basis of the above embodiments to implement the load balancing process.
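As one possible illustration of the cgroup-based approach, the sketch below writes the cgroup v1 `cpuset` control files (`cpuset.cpus`, `cpuset.mems`, `cpuset.memory_migrate`) so that the kernel migrates a group's memory pages when its allowed memory nodes change. Using the `cpuset` controller is an assumption, since the text does not name a controller. The function name is hypothetical, and the demonstration writes to a temporary directory rather than a real cgroup mount.

```python
import tempfile
from pathlib import Path


def bind_cgroup_to_node(cgroup_dir, cpus, mem_nodes):
    """Write cpuset settings for one control group (cgroup v1 file names)."""
    cgroup = Path(cgroup_dir)
    # Ask the kernel to move existing pages when cpuset.mems changes.
    (cgroup / "cpuset.memory_migrate").write_text("1\n")
    (cgroup / "cpuset.cpus").write_text(f"{cpus}\n")
    (cgroup / "cpuset.mems").write_text(f"{mem_nodes}\n")


# Demonstration against a scratch directory; on a real host the directory
# would be a control group inside the mounted cpuset hierarchy.
with tempfile.TemporaryDirectory() as scratch:
    bind_cgroup_to_node(scratch, cpus="0-7", mem_nodes="1")
    written = {p.name: p.read_text().strip() for p in Path(scratch).iterdir()}
print(written["cpuset.mems"])  # 1
```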

In summary, the load balancing method for a cloud host cluster provided by the embodiments of the present application has at least the following characteristics:

(1) Hardware resources other than CPU resources and MEM resources are brought into the scope of load balancing, so that resource allocation can be coordinated as a whole.

(2) Hardware resources differ in their attributes. If they are handled indiscriminately, load balancing becomes unreasonable and the hardware resources are not used efficiently. The embodiments of the present application therefore use the current remaining resource amount of each hardware resource to determine the target node of a cloud host and combine this with load balancing during operation, which can effectively improve the utilization of hardware resources.

Corresponding to the load balancing method for a cloud host cluster provided by the above embodiments, the embodiments of the present application also provide a load balancing apparatus for a cloud host cluster. The apparatus is applied to a control server that stores first configuration information of multiple physical hosts, second configuration information of multiple cloud host clusters, and the resource usage priority of each cloud host cluster. Each physical host includes multiple hardware resources, and a cloud host cluster includes multiple cloud hosts built on the hardware resources provided by the physical hosts. The first configuration information includes the current remaining resource amount of each hardware resource in each physical host, and the second configuration information includes the target resource proportion and the target resource amount of the hardware resources required by each cloud host in the target cloud host cluster. Referring to the schematic structural diagram of a load balancing apparatus for a cloud host cluster shown in FIG. 3, the apparatus mainly includes the following modules:

The cluster determination module 302 is configured to determine a target cloud host cluster from the cloud host clusters according to the resource usage priority.

The node determination module 304 is configured to determine, from the physical hosts and based on the first configuration information and the second configuration information, the target node corresponding to each cloud host in the target cloud host cluster, so that the target nodes provide the target resource amount of hardware resources for the target cloud host cluster.

The usage monitoring module 306 is configured to monitor the actual usage of each hardware resource in the target cloud host cluster.

The node adjustment module 308 is configured to adjust the target nodes of the target cloud host cluster according to the actual usage, so as to perform load balancing on the target cloud host cluster.

The load balancing apparatus for a cloud host cluster provided by the embodiments of the present application determines the target node corresponding to each cloud host in the target cloud host cluster based on the pre-stored first configuration information and second configuration information, and adjusts the target nodes of the target cloud host cluster based on the actual usage of each hardware resource in the cloud host cluster, thereby achieving better load balancing of the target cloud host cluster. Load balancing methods in the related art are prone to resource preemption and resource waste. By contrast, the embodiments of the present application adjust the target nodes in real time based on the actual usage of each hardware resource in the target cloud host cluster, which makes better use of the hardware resources provided by the target nodes, effectively alleviates resource preemption and resource waste, and thus makes cloud host load balancing more reasonable.

In one implementation, the node determination module 304 is further configured to: determine the main resource of the target cloud host cluster according to the target resource proportion of each hardware resource in the second configuration information; select, from the physical hosts, the physical hosts that contain the main resource; and determine the target node corresponding to each cloud host in the target cloud host cluster according to the current remaining amount of the main resource in the selected physical hosts.

In one implementation, the node determination module 304 is further configured to: randomly select a cloud host from the target cloud host cluster; determine, among the selected physical hosts, the physical host with the largest current remaining amount of the main resource as the target node corresponding to the selected cloud host; calculate the difference between the current remaining amount of the main resource in that target node and the target resource amount of the main resource in the selected cloud host; update the current remaining amount of the main resource in that target node based on the difference; and randomly select the next cloud host from the remaining cloud hosts of the target cloud host cluster and determine, among the selected physical hosts, the physical host with the largest current remaining amount of the main resource as the target node corresponding to that next cloud host, until the target node corresponding to each cloud host in the target cloud host cluster has been determined.
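The node selection performed by the node determination module can be sketched as a greedy placement. The names below are hypothetical, the random order is made reproducible with a seeded generator, and, as in the text, no check is made that the remaining amount stays non-negative; this is a minimal sketch under those assumptions.

```python
import random


def main_resource(target_proportions):
    """The hardware resource with the largest target resource proportion."""
    return max(target_proportions, key=target_proportions.get)


def assign_cloud_hosts(demands, remaining_by_node, resource, rng=None):
    """Place each cloud host on the node with the most remaining main resource.

    demands: cloud host -> target amount of the main resource
    remaining_by_node: physical host -> {hardware resource: remaining amount}
    """
    rng = rng or random.Random()
    # Keep only the physical hosts that actually provide the main resource.
    remaining = {n: r[resource] for n, r in remaining_by_node.items() if resource in r}
    if not remaining:
        raise ValueError(f"no physical host provides {resource}")
    pending = list(demands)
    rng.shuffle(pending)  # cloud hosts are picked in random order
    placement = {}
    for cloud_host in pending:
        node = max(remaining, key=remaining.get)
        placement[cloud_host] = node
        remaining[node] -= demands[cloud_host]  # update the remaining amount
    return placement, remaining


resource = main_resource({"CPU": 0.5, "MEM": 0.3, "GPU": 0.2})
nodes = {"node0": {"CPU": 10.0}, "node1": {"CPU": 8.0}, "node2": {"MEM": 64.0}}
demands = {"vm1": 4.0, "vm2": 4.0, "vm3": 4.0}
placement, left = assign_cloud_hosts(demands, nodes, resource, random.Random(0))
print(resource, left)  # CPU {'node0': 2.0, 'node1': 4.0}
```

Because all three sample cloud hosts request the same amount, the final remaining amounts do not depend on the random order; only which cloud host lands on which node does.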

In one implementation, the node adjustment module 308 is further configured to: for each target node, calculate the resource usage rate of each hardware resource in the target node according to the actual usage of each hardware resource in the multiple cloud hosts corresponding to the target node and the current remaining amount of each hardware resource; if the resource usage rate of a hardware resource in the target node is greater than or equal to the preset threshold corresponding to that hardware resource, determine, according to the current remaining amount of the hardware resource in each physical host, the target node to which each cloud host corresponding to the target node is to be migrated; and migrate each cloud host corresponding to the target node to the target node determined for that cloud host, so as to perform load balancing on the target cloud host cluster.

In one implementation, the node adjustment module 308 is further configured to: calculate the ratio of the actual usage of each hardware resource in the multiple cloud hosts corresponding to the target node to the calibrated resource amount of each hardware resource in the target node, to obtain the temporary resource proportion of each hardware resource in each cloud host; for each cloud host, normalize the temporary resource proportions of the hardware resources in the cloud host, to obtain the actual resource proportion of each hardware resource of the cloud host; and calculate the resource usage rate of each hardware resource in the target node according to the actual resource proportion and the target resource amount of each hardware resource in each cloud host corresponding to the target node.
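The per-cloud-host variant described above (a temporary proportion for each hardware resource, followed by normalization across the hardware resources of one cloud host) can be sketched as follows. The function name and the sample figures are hypothetical and chosen only for illustration.

```python
def actual_proportions_for_host(usage, calibrated):
    """Normalize one cloud host's temporary proportions across hardware resources.

    usage: hardware resource -> actual usage by this cloud host
    calibrated: hardware resource -> calibrated amount on the target node
    """
    temporary = {r: usage[r] / calibrated[r] for r in usage}
    total = sum(temporary.values())
    if total == 0:
        return {r: 0.0 for r in temporary}
    return {r: value / total for r, value in temporary.items()}


props = actual_proportions_for_host({"CPU": 4.0, "MEM": 8.0}, {"CPU": 16.0, "MEM": 16.0})
print({r: round(value, 3) for r, value in props.items()})  # {'CPU': 0.333, 'MEM': 0.667}
```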

In one implementation, the node adjustment module 308 is further configured to: separately calculate the ratio of the actual usage of each hardware resource in each cloud host corresponding to the target node to the calibrated resource amount of each hardware resource in the target node, to obtain the temporary resource proportion of each hardware resource in each cloud host;

In one implementation, the apparatus further includes a migration module configured to: if a cloud host corresponds to multiple target nodes, calculate, based on the target resource amount of each hardware resource in the cloud host, the estimated resource proportion of each hardware resource in each physical host after the cloud host is migrated to that physical host; and, for each hardware resource, if the estimated resource proportion of the hardware resource in the physical host is less than the preset threshold corresponding to the hardware resource, migrate the cloud host to the physical host.

In one implementation, the hardware resources include one or more of CPU resources, memory resources, FPU resources, FPGA resources, GPU resources, and network resources.

The implementation principles and technical effects of the apparatus provided by the embodiments of the present application are the same as those of the foregoing method embodiments. For brevity, where the apparatus embodiments do not mention a detail, refer to the corresponding content of the foregoing method embodiments.

The embodiments of the present application provide a server. In some embodiments, the server includes a processor and a storage device; the storage device stores a computer program which, when run by the processor, performs the method according to any one of the implementations described above.

FIG. 4 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 100 includes a processor 40, a memory 41, a bus 42, and a communication interface 43; the processor 40, the communication interface 43, and the memory 41 are connected through the bus 42. The processor 40 is configured to execute executable modules, such as computer programs, stored in the memory 41.

The memory 41 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 43 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and so on.

The bus 42 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is shown in FIG. 4, but this does not mean that there is only one bus or one type of bus.

The memory 41 is configured to store a program, and the processor 40 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present application may be applied to, or implemented by, the processor 40.

The processor 40 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 40 or by instructions in the form of software. The processor 40 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.

The software module may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 41; the processor 40 reads the information in the memory 41 and completes the steps of the above method in combination with its hardware.

In yet another embodiment provided by the present application, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of any of the above load balancing methods for a cloud host cluster.

The computer program product of the readable storage medium provided by the embodiments of the present application includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method described in the foregoing method embodiments; for the implementation, refer to the foregoing method embodiments, which are not repeated here.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the related art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Industrial applicability

With the load balancing method, apparatus, and server for a cloud host cluster provided by the embodiments of the present application, the target node corresponding to each cloud host in the target cloud host cluster can be determined based on the pre-stored first configuration information and second configuration information, and the target nodes of the target cloud host cluster can be adjusted based on the actual usage of each hardware resource in the cloud host cluster, thereby achieving better load balancing of the target cloud host cluster. Load balancing methods in the related art are prone to resource preemption and resource waste. By contrast, the embodiments of the present application adjust the target nodes in real time based on the actual usage of each hardware resource in the target cloud host cluster, which makes better use of the hardware resources provided by the target nodes, effectively alleviates resource preemption and resource waste, and thus makes cloud host load balancing more reasonable.

Finally, it should be noted that the embodiments described above are only specific implementations of this application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of this application is not limited thereto. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed in this application, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A load balancing method for a cloud host cluster, applied to a control server, wherein the control server stores first configuration information of a plurality of physical hosts, second configuration information of a plurality of cloud host clusters, and a resource usage priority of each of the cloud host clusters; each of the physical hosts comprises one or more hardware resources; each cloud host cluster comprises a plurality of cloud hosts built on the hardware resources provided by the physical hosts; the first configuration information comprises a current remaining resource amount of each of the hardware resources in each of the physical hosts; and the second configuration information comprises a target resource proportion and a target resource amount of the hardware resources required by each cloud host in the cloud host cluster, the method comprising:

determining a target cloud host cluster from the cloud host clusters according to the resource usage priority;

determining, from the physical hosts and based on the first configuration information and the second configuration information, a physical host corresponding to each of the cloud hosts in the target cloud host cluster as a target node, so as to provide the target cloud host cluster with hardware resources of the target resource amount through the target nodes;

monitoring an actual usage of each of the hardware resources in the target cloud host cluster; and

adjusting the target nodes of the target cloud host cluster according to the actual usage, so as to perform load balancing on the target cloud host cluster.
The method according to claim 1, wherein the step of determining, from the physical hosts and based on the first configuration information and the second configuration information, a physical host corresponding to each of the cloud hosts in the target cloud host cluster as a target node comprises:

determining a main resource of the target cloud host cluster according to the magnitude of the target resource proportion of each of the hardware resources in the second configuration information;

selecting, from the physical hosts and according to the first configuration information, physical hosts that contain the main resource; and

determining the target node corresponding to each of the cloud hosts in the target cloud host cluster according to the magnitude of the current remaining resource amount of the main resource in the selected physical hosts.
The method according to claim 2, wherein the step of determining the target node corresponding to each of the cloud hosts in the target cloud host cluster according to the magnitude of the current remaining resource amount of the main resource in the selected physical hosts comprises:

randomly selecting a cloud host from the target cloud host cluster;

determining, among the selected physical hosts, the physical host with the largest current remaining resource amount of the main resource as the target node corresponding to the selected cloud host;

calculating a difference between a current remaining resource amount and the target resource amount of the selected cloud host, wherein the current remaining resource amount is the resource amount currently remaining on the target node corresponding to the selected cloud host;

updating, based on the difference, the current remaining resource amount of the target node corresponding to the selected cloud host; and

randomly selecting a next cloud host from the remaining cloud hosts of the target cloud host cluster, and determining, among the selected physical hosts, the physical host with the largest current remaining resource amount of the main resource as the target node corresponding to the selected next cloud host, until the target node corresponding to each of the cloud hosts in the target cloud host cluster has been determined.
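For illustration only, and not as part of the claims, the main-resource selection and greedy placement recited in the two preceding claims might be sketched as follows. The function names, the dictionary layouts, and the `rng` parameter are hypothetical. The sketch also assumes that the main resource is the one with the largest target resource proportion and that ties between physical hosts are broken by iteration order; the claims do not state what happens when no host has enough remaining capacity, so the sketch does not check for it.

```python
import random


def pick_main_resource(target_share):
    """Return the hardware resource with the largest target resource proportion."""
    return max(target_share, key=target_share.get)


def place_cluster(cloud_hosts, physical_hosts, main, rng=None):
    """Assign each cloud host to the physical host with the most remaining main resource.

    cloud_hosts:    {cloud_host_name: {resource: target_amount}}   (hypothetical layout)
    physical_hosts: {physical_host_name: {resource: remaining}}    (hypothetical layout)
    Returns (placement, remaining) without mutating the inputs.
    """
    if rng is None:
        rng = random.Random()
    # Keep only physical hosts that offer the main resource; copy so inputs stay untouched.
    remaining = {h: dict(r) for h, r in physical_hosts.items() if r.get(main, 0) > 0}
    if not remaining:
        raise ValueError(f"no physical host provides {main!r}")
    order = list(cloud_hosts)
    rng.shuffle(order)  # cloud hosts are visited in random order
    placement = {}
    for vm in order:
        # Target node: the host with the largest current remaining amount of the main resource.
        node = max(remaining, key=lambda h: remaining[h][main])
        placement[vm] = node
        # Update the remaining amount with the difference (remaining minus target amount).
        remaining[node][main] -= cloud_hosts[vm][main]
    return placement, remaining
```

Because the remaining amount is updated after every assignment, successive cloud hosts tend to spread across physical hosts rather than pile onto the initially largest one.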
The method according to claim 1, wherein the step of adjusting the target nodes of the target cloud host cluster according to the actual usage, so as to perform load balancing on the target cloud host cluster, comprises:

for each target node, calculating a resource usage rate of each of the hardware resources in the target node according to the actual usage of each of the hardware resources in the plurality of cloud hosts corresponding to the target node and the current remaining resource amount of each of the hardware resources;

if the resource usage rate of a hardware resource in the target node is greater than or equal to a preset threshold corresponding to that hardware resource, determining, according to the magnitude of the current remaining resource amount of that hardware resource in each of the physical hosts, a target node to which each of the cloud hosts corresponding to the target node is to be migrated; and

migrating each cloud host corresponding to the target node to the target node to which that cloud host is to be migrated, so as to perform load balancing on the target cloud host cluster.
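For illustration only, and not as part of the claims, the threshold check and migration planning of the preceding claim might be sketched as follows. All names and data layouts are hypothetical. The sketch assumes that the destination is the physical host with the largest remaining amount of the overloaded resource (mirroring the placement rule of claim 3) and that, when several resources exceed their thresholds, the one with the highest usage rate drives the choice; the claim itself fixes neither point.

```python
def plan_migrations(node, vms, usage_rate, thresholds, remaining):
    """Return {cloud_host: destination} for an overloaded node, or {} if it is not overloaded.

    node:       name of the target node being checked
    vms:        {cloud_host: {resource: target_amount}} hosted on that node
    usage_rate: {resource: usage rate of the node}
    thresholds: {resource: preset threshold}
    remaining:  {physical_host: {resource: current remaining amount}}
    """
    hot = [r for r, rate in usage_rate.items() if rate >= thresholds[r]]
    if not hot:
        return {}
    # Assumption: the most heavily used overloaded resource drives the decision.
    key = max(hot, key=usage_rate.get)
    # Candidate destinations exclude the overloaded node; copy so inputs stay untouched.
    free = {h: dict(r) for h, r in remaining.items() if h != node}
    if not free:
        raise ValueError("no other physical host is available")
    plan = {}
    for vm, demand in vms.items():
        dest = max(free, key=lambda h: free[h].get(key, 0))
        plan[vm] = dest
        # Reserve the capacity so the next cloud host sees the updated remaining amount.
        free[dest][key] = free[dest].get(key, 0) - demand.get(key, 0)
    return plan
```

The function only produces a plan; performing the migration itself depends on the virtualization platform and is outside the scope of this sketch.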
The method according to claim 4, wherein the step of calculating the resource usage rate of each of the hardware resources in the target node according to the actual usage of each of the hardware resources in the plurality of cloud hosts corresponding to the target node and the current remaining resource amount of each of the hardware resources comprises:

calculating a ratio of the actual usage of each of the hardware resources in the plurality of cloud hosts corresponding to the target node to a calibrated resource amount of each of the hardware resources in the target node, to obtain a temporary resource proportion of each of the hardware resources in each of the cloud hosts;

for each cloud host, normalizing the temporary resource proportions of the hardware resources in the cloud host to obtain an actual resource proportion of each of the hardware resources of the cloud host; and

calculating the resource usage rate of each of the hardware resources in the target node according to the actual resource proportion and the target resource amount of each of the hardware resources in each of the cloud hosts corresponding to the target node.
The method according to claim 4, wherein the step of calculating the resource usage rate of each of the hardware resources in the target node according to the actual usage of each of the hardware resources in the plurality of cloud hosts corresponding to the target node and the current remaining resource amount of each of the hardware resources comprises:

calculating, for each cloud host corresponding to the target node, a ratio of the actual usage of each of the hardware resources in the cloud host to a calibrated resource amount of each of the hardware resources in the target node, to obtain a temporary resource proportion of each of the hardware resources in each of the cloud hosts;

for each of the hardware resources of the target node, normalizing the temporary resource proportions of the hardware resource across the cloud hosts to obtain an actual resource proportion of the hardware resource in each cloud host; and

calculating the resource usage rate of each of the hardware resources in the target node according to the actual resource proportion of each of the hardware resources in each of the cloud hosts corresponding to the target node and the calibrated resource amount of the corresponding hardware resource.
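For illustration only, and not as part of the claims, the first two steps of the two preceding claims (temporary resource proportions, then normalization) might be sketched as follows. The function names and data layouts are hypothetical, and sum-to-one normalization is an assumption, since the claims do not name a normalization formula. The final combination into a per-node resource usage rate is deliberately left out because the claims do not specify it. The two normalizers differ only in the axis: within one cloud host across resources (claim 5), or within one resource across cloud hosts (claim 6).

```python
def temporary_shares(actual_usage, rated):
    """Temporary resource proportion: actual usage divided by the node's calibrated amount.

    actual_usage: {cloud_host: {resource: used_amount}}
    rated:        {resource: calibrated amount of the target node}
    """
    return {
        vm: {r: used / rated[r] for r, used in usage.items()}
        for vm, usage in actual_usage.items()
    }


def normalize_per_host(shares):
    """Claim 5 direction: within each cloud host, proportions across resources sum to 1."""
    out = {}
    for vm, s in shares.items():
        total = sum(s.values())
        out[vm] = {r: (v / total if total else 0.0) for r, v in s.items()}
    return out


def normalize_per_resource(shares):
    """Claim 6 direction: within each resource, proportions across cloud hosts sum to 1."""
    totals = {}
    for s in shares.values():
        for r, v in s.items():
            totals[r] = totals.get(r, 0.0) + v
    return {
        vm: {r: (v / totals[r] if totals[r] else 0.0) for r, v in s.items()}
        for vm, s in shares.items()
    }
```

With a node calibrated at 16 CPU units and 32 memory units, a cloud host using 2 CPU and 8 memory has temporary proportions 0.125 and 0.25; per-host normalization turns these into one third and two thirds.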
The method according to claim 1, wherein the method further comprises:

if a cloud host corresponds to a plurality of target nodes, calculating, based on the target resource amount of each of the hardware resources in the cloud host, the resource proportion of each of the hardware resources in each of the physical hosts after the cloud host is migrated to that physical host, as an estimated resource proportion; and

for each hardware resource, if the estimated resource proportion of the hardware resource in a physical host is less than the preset threshold corresponding to the hardware resource, migrating the cloud host to that physical host.
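For illustration only, and not as part of the claims, the estimated-proportion check of the preceding claim might be sketched as follows. The names and data layouts are hypothetical. The sketch assumes that the estimated resource proportion is (amount already used + target amount of the cloud host) / capacity, and that the first candidate whose every resource stays below its threshold is chosen; the claim does not specify either detail.

```python
def estimated_shares(used, capacity, demand):
    """Estimated resource proportion of a physical host after receiving the cloud host."""
    return {r: (used.get(r, 0) + demand.get(r, 0)) / capacity[r] for r in capacity}


def choose_destination(candidates, demand, thresholds):
    """Return the first candidate whose estimated proportions all stay below the thresholds.

    candidates: {physical_host: (used_by_resource, capacity_by_resource)}
    demand:     {resource: target resource amount of the cloud host}
    thresholds: {resource: preset threshold}
    Returns None when no candidate qualifies.
    """
    for host, (used, capacity) in candidates.items():
        est = estimated_shares(used, capacity, demand)
        if all(est[r] < thresholds[r] for r in est):
            return host
    return None
```

Checking the estimate before migrating avoids moving a cloud host onto a physical host that would itself cross a threshold immediately afterwards.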
The method according to claim 1, wherein the hardware resources comprise one or more of CPU resources, GPU resources, FPU resources, FPGA resources, memory resources, and network resources.
A load balancing apparatus for a cloud host cluster, applied to a control server, wherein the control server stores first configuration information of a plurality of physical hosts, second configuration information of a plurality of cloud host clusters, and a resource usage priority of each of the cloud host clusters; each of the physical hosts comprises a plurality of hardware resources; each cloud host cluster comprises a plurality of cloud hosts built on the hardware resources provided by the physical hosts; the first configuration information comprises a current remaining resource amount of each of the hardware resources in each of the physical hosts; and the second configuration information comprises a target resource proportion and a target resource amount of the hardware resources required by each cloud host in the cloud host cluster, the apparatus comprising:

a cluster determining module, configured to determine a target cloud host cluster from the cloud host clusters according to the resource usage priority;

a node determining module, configured to determine, from the physical hosts and based on the first configuration information and the second configuration information, a physical host corresponding to each of the cloud hosts in the target cloud host cluster as a target node, so as to provide the target cloud host cluster with hardware resources of the target resource amount through the target nodes;

a usage monitoring module, configured to monitor an actual usage of each of the hardware resources in the target cloud host cluster; and

a node adjustment module, configured to adjust the target nodes of the target cloud host cluster according to the actual usage, so as to perform load balancing on the target cloud host cluster.
A server, comprising a processor and a memory;

wherein a computer program is stored in the memory, and the computer program, when run by the processor, executes the method according to any one of claims 1 to 8.
A computer storage medium, configured to store computer software instructions used by the method according to any one of claims 1 to 8.