CN115701585A

CN115701585A - Instance migration method and device and related equipment

Info

Publication number: CN115701585A
Application number: CN202110880996.XA
Authority: CN
Inventors: 马富达; 林维博
Original assignee: Huawei Cloud Computing Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2021-08-02
Filing date: 2021-08-02
Publication date: 2023-02-10

Abstract

The application provides an instance migration method, an instance migration device, and related equipment. The method is applied to a computing management platform that schedules instances running on a plurality of computing devices, and comprises the following steps: acquiring the computing resource usage of the plurality of computing devices; determining optimization objectives for the plurality of computing devices; determining constraints for the plurality of computing devices, wherein the constraints for each computing device indicate capacity constraints for various types of instances running on that computing device; obtaining an instance migration scheme according to the computing resource usage, the optimization objectives, and the constraints of the plurality of computing devices; and migrating instances running on the plurality of computing devices according to the instance migration scheme. By adding the capacity constraint, the method enables management of the instance inventory and effectively improves the utilization of computing resources.

Description

Instance migration method and device and related equipment

Technical Field

The present application relates to the field of cloud computing technologies, and in particular, to a method and an apparatus for instance migration and a related device.

Background

Cloud service providers generally provide computing resources of various specifications to tenants, and the computing resources generally exist in the form of virtual machines or containers. In order to improve the utilization rate of computing resources, a cloud service provider may mix and deploy virtual machines of different types and different SLAs on the same host, which is also called mixed deployment.

Over time, the cloud platform may gradually accumulate resource fragments, which reduces both the allocable amount and the utilization of resources. Migrating instances, that is, moving an instance from one computing device to another, consolidates the computing resources, thereby reducing resource fragments and improving the utilization of computing resources. Migration is usually performed as live migration, that is, an instance is migrated without interrupting the running service. Therefore, live migration should not be performed too frequently.

However, existing instance migration schemes do not take mixed deployment scenarios into account when they are formulated. Therefore, how to provide an instance migration method for mixed deployment scenarios has become an urgent problem to be solved.

Disclosure of Invention

The application provides an instance migration method, an instance migration device, and related equipment, which can solve the above problem.

In a first aspect, a method for instance migration is provided, where the method includes: acquiring computing resource use conditions of the plurality of computing devices; determining optimization objectives for the plurality of computing devices; determining constraints for the plurality of computing devices, wherein the constraints for each computing device indicate capacity constraints for various types of instances running on that computing device; obtaining an instance migration scheme according to the computing resource use conditions, the optimization targets and the constraint conditions of the plurality of computing devices; migrating instances running on the plurality of computing devices according to the instance migration scheme.

According to the method, a capacity constraint is added to the optimization process, which combines management of the instance inventory with mixed deployment and effectively improves the utilization of computing resources.

In some possible implementations, the capacity constraint indicates a proportion of computing resources occupied by the various types of instances to computing resources of the computing device.

In some possible implementations, different types of instances running on the plurality of computing devices occupy the various types of computing resources in different proportions.

In some possible implementations, the different types of instances running on the multiple computing devices include a shared type and an exclusive type, where the shared type indicates that the vCPUs included in the instances are bound to the CPUs on the multiple computing devices, and the exclusive type indicates that the vCPUs included in the instances are not bound to the CPUs on the multiple computing devices.

In some possible implementations, the optimization objectives include a minimum number of running computing devices and/or a maximum number of issuable instances of a specified instance specification, and a minimum migration time and/or a minimum number of migration instances; wherein the running computing device number indicates the number of computing devices occupied by the running instance after migration is completed, the issuable amount of the specified instance specification indicates the number of the specified instance specification that can be issued after migration is completed, the migration time indicates the sum of the time required to migrate an instance, and the number of migration instances indicates the number of instances to migrate.

In some possible implementations, the constraints include any one or more of: an affinity constraint, a load type clustering constraint, and a running duration clustering constraint, wherein the affinity constraint indicates the positional relationship among the computing devices where the instances are located, the load type clustering constraint indicates a constraint determined according to the load types of the instances, and the running duration clustering constraint indicates a constraint determined according to the remaining running duration.

In some possible implementations, before obtaining the instance migration scheme according to the computing resource usage, the optimization objectives, and the constraints of each computing device at the current time, the method further includes: obtaining the remaining running duration of the instances running on each computing device at the current time according to the computing resource usage of each computing device at historical times and a prediction model; and determining the running duration clustering constraint according to the remaining running duration of the instances running on each computing device at the current time.

In a second aspect, a computing management platform is provided, the computing management platform comprising: an interaction unit configured to acquire the computing resource usage of the plurality of computing devices; and a processing unit configured to determine optimization objectives for the plurality of computing devices; determine constraints for the plurality of computing devices, wherein the constraints for each computing device indicate capacity constraints for various types of instances running on that computing device; obtain an instance migration scheme according to the computing resource usage, the optimization objectives, and the constraints of the plurality of computing devices; and migrate instances running on the plurality of computing devices according to the instance migration scheme.

In some possible implementations, the optimization objectives include a minimum number of running computing devices and/or a maximum number of issuable instances of a specified instance specification, and a minimum migration time and/or a minimum number of migration instances; the running computing device number indicates the number of computing devices occupied by running instances after migration is completed, the issuable amount of the specified instance specification indicates the number of the specified instance specification which can be issued after migration is completed, the migration time indicates the sum of time required for migrating the instances, and the migration instance number indicates the number of the instances to be migrated.

In some possible implementations, the constraints further include any one or more of: an affinity constraint, a load type clustering constraint, and a running duration clustering constraint, wherein the affinity constraint indicates the positional relationship among the computing devices where the instances are located, the load type clustering constraint indicates a constraint determined according to the load types of the instances, and the running duration clustering constraint indicates a constraint determined according to the remaining running duration.

In some possible implementations, the processing unit is further configured to obtain, according to the computing resource usage of each computing device at historical times and a prediction model, the remaining running duration of the instances running on each computing device at the current time; and to determine the running duration clustering constraint according to the remaining running duration of the instances running on each computing device at the current time.

In a third aspect, a cluster of computing devices is provided, the computing devices comprising a processor and a memory; the processor is configured to execute the instructions stored by the memory to cause the computing device to implement the method as provided in the first aspect above or any possible implementation manner of the first aspect.

In a fourth aspect, there is provided a computer program product comprising a computer program that, when read and executed by a computing device, causes the computing device to perform the method as provided above for the first aspect or any possible implementation manner of the first aspect.

In a fifth aspect, there is provided a non-transitory computer readable storage medium having stored thereon instructions for implementing a method as provided in the first aspect above or any possible implementation manner of the first aspect.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments will be briefly introduced below.

FIG. 1 is a schematic illustration of an instance migration system to which the present application relates;

FIG. 2 is a flow diagram of an instance migration method to which the present application relates;

FIG. 3 is a schematic illustration of a configuration interface to which the present application relates;

FIG. 4 is a schematic diagram of a computing management platform to which the present application relates;

FIG. 5 is a schematic diagram of a computing device to which the present application relates;

FIG. 6 is a schematic diagram of a cluster of computing devices to which the present application relates;

FIG. 7 is a schematic diagram of yet another cluster of computing devices to which the present application relates;

FIG. 8 is a schematic diagram of yet another cluster of computing devices to which the present application relates.

Detailed Description

Reference will first be made to some terms referred to in this application.

Computing resources: the resources needed to perform computing operations, such as central processing unit (CPU) or graphics processing unit (GPU) resources, storage resources (memory space), and network resources.

Instance: a cloud service provider provides an instance service, an instance being a computing unit with computing resources that can run the tenant's tasks. Common instances include a virtual machine (VM), a container, or a bare metal server.

Instance types: different types of instances contain the various types of computing resources in different proportions or with different quality of service (QoS).

Taking instance types that differ in the proportions of their computing resources as an example, consider instances that contain processor resources and memory resources. An instance containing 24 virtual central processing units (vCPUs) and 48 gibibytes (GiB) of memory is referred to as a processor-enhanced instance, whereas an instance containing 24 vCPUs and 192 GiB is referred to as a memory-enhanced instance. An instance containing, for example, 24 vCPUs and 96 GiB is referred to as a general-purpose instance.

Instance types also include a shared type and an exclusive type. An exclusive instance indicates that the vCPUs contained in the instance are not bound to the CPUs on the multiple computing devices; it is a general term for a family of instance specifications with high performance, stable computing power, and balanced network performance. By design of the specification family, these instances have dedicated and stable computing, storage, and network resources. Exclusive instances are well suited to enterprise scenarios with high requirements on business stability.

A shared instance indicates that the vCPUs contained in the instance are bound to CPUs on the plurality of computing devices; it is an instance type oriented to general small and medium-sized websites or individuals. Compared with exclusive instances, shared instances place more emphasis on resource utilization, so they share the available computing resources with other shared instances and are offered at a relatively lower cost.

It should be noted that the platform administrator may define the rules for dividing instance types, and the names of the types, as needed.

Virtual machine: a complete computer system with complete hardware system functionality, operating in a completely isolated environment, simulated by software. Work that can be done in a physical computer can be implemented in a VM. When creating a VM in a computer, a part of the hard disk and the memory capacity of the physical machine need to be used as the hard disk and the memory capacity of the virtual machine. Each VM has a separate hard disk and operating system, and can be operated like a physical machine.

A cloud service provider typically provides tenants with instances of multiple specifications and multiple service-level agreements (SLAs), generally in the form of VMs or containers. Typically, instances of the same type are deployed on the same computing device.

In some possible implementations, mixed deployments may also be made on the same computing device. That is, different types of instances are deployed on the same computing device. Taking a VM as an example, in order to improve resource utilization, a cloud service provider may mix and deploy VMs of different types and different QoS on the same computing device.

When the instance type is determined according to the proportional relation among the various types of computing resources contained in an instance, and taking processor resources and storage resources as an example, VMs of different types are VMs whose types are determined by the ratio between the number of vCPUs and the number of GiB they contain.

For example, on a computing device containing 8 vCPUs and 12 GiB, a VM1 containing 2 vCPUs and 6 GiB of computing resources and a VM2 containing 6 vCPUs and 6 GiB of computing resources may be deployed simultaneously, where VM1 is a memory-enhanced instance and VM2 is a general-purpose instance.

Thus, instances of different types are instances whose types are determined by the ratios among their resources, including processor resources, storage resources (memory space), network resources, and so on.

In some possible implementations, when the instance type is determined based on QoS, the different types of VMs indicate VMs of different QoS.

For example, on a computing device containing 8 vCPUs and 12 GiB, a VM1 containing 2 vCPUs and 6 GiB of computing resources and a VM2 containing 6 vCPUs and 6 GiB of computing resources may be deployed simultaneously, where VM1 is an exclusive instance and VM2 is a shared instance.

During use, the tenant can create and delete instances as needed. It should be noted that, in the prior art, tenants create and delete instances in a highly random manner. That is, the creation and deletion of instances running on a computing device are highly random.

Thus, over time, the computing resources contained in each computing device may not be fully utilized. That is, resource fragments may gradually appear on some of the computing devices, reducing the allocable amount and the utilization of the computing resources. A resource fragment is an unoccupied portion of the computing resources contained in a computing device. Therefore, an optimization goal in a typical scenario of the present application is to increase the utilization of computing resources.

For example, 3 VMs, each containing 2 vCPUs and 4 GiB, have been created on a computing device containing 8 vCPUs and 16 GiB. The unoccupied computing resources (2 vCPUs and 4 GiB) on the computing device are then a resource fragment. Further, suppose there are 3 such computing devices, that is, 3 computing devices each containing a resource fragment of 2 vCPUs and 4 GiB. When the computing management platform that manages the three computing devices receives a request to create a VM containing 6 vCPUs and 12 GiB, the request cannot be satisfied, because no single computing device has 6 vCPUs and 12 GiB of unoccupied computing resources. However, by migrating some of the VMs running on the three computing devices, the computing management platform can obtain a computing device with 6 unoccupied vCPUs and 12 GiB, and the creation request can then be satisfied.

As another example, 4 VMs, each containing 2 vCPUs and 2 GiB, have been created on a computing device containing 8 vCPUs and 16 GiB. The unoccupied computing resource (8 GiB) on the computing device is then a resource fragment.

Here, the VM containing 2 vCPUs and 2 GiB is a general-purpose instance. In this situation, the utilization of the computing resources can be improved by deleting some of the general-purpose instances on the computing device and creating instances with higher requirements on storage resources (such as memory-enhanced instances).
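The fragmentation example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the tuple layout `(vCPUs, GiB)`, the helper names `free` and `can_place`, and the particular choice of which VMs to move are assumptions made for the illustration and are not taken from the application.

```python
# Illustrative sketch; names and data layout are hypothetical.
CAPACITY = (8, 16)   # (vCPUs, GiB) of each computing device
VM = (2, 4)          # (vCPUs, GiB) of each existing VM

def free(vms, capacity=CAPACITY):
    """Unoccupied (vCPUs, GiB) on one device, i.e. its resource fragment."""
    used_vcpu = sum(vcpu for vcpu, _ in vms)
    used_gib = sum(gib for _, gib in vms)
    return (capacity[0] - used_vcpu, capacity[1] - used_gib)

def can_place(devices, request):
    """True if at least one single device can host the requested VM."""
    return any(
        free(vms)[0] >= request[0] and free(vms)[1] >= request[1]
        for vms in devices
    )

# Three devices, each running three 2 vCPU / 4 GiB VMs.
devices = [[VM, VM, VM], [VM, VM, VM], [VM, VM, VM]]
request = (6, 12)

placeable_before = can_place(devices, request)

# Migrate two VMs off device 0, one to each of the other devices.
devices[1].append(devices[0].pop())
devices[2].append(devices[0].pop())

placeable_after = can_place(devices, request)
print(placeable_before, placeable_after, free(devices[0]))
```

Before migration every device has only 2 vCPUs and 4 GiB free, so the request fails; after the two migrations, device 0 has 6 vCPUs and 12 GiB free and the request can be placed.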

Generally, instances can be migrated in order to achieve optimization goals. And the computing management platform can migrate the instances by utilizing an optimization algorithm according to the collected resource allocation and resource occupation conditions of each computing device. Specifically, an instance migration scheme can be obtained by using an optimization algorithm according to an optimization target, constraint conditions, and resource allocation and resource occupation conditions of each computing device. Furthermore, the instances running on the computing devices are migrated according to the instance migration scheme, so that an optimization target set by platform management personnel can be realized.

In particular, the optimization objective may be one or more of the following parameters: the minimum number of computing devices running, the maximum number of issuable for a given instance specification, the minimum migration time, the minimum number of virtual machines migrated, etc.
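To make these objective quantities concrete, the following sketch evaluates three of them for a candidate migration plan: the number of running computing devices, the number of migrated instances, and the total migration time. The function name `plan_metrics`, the dictionary-based data layout, and the per-instance time estimates are assumptions for illustration; how the application's optimization algorithm actually evaluates or combines these quantities is not specified here.

```python
# Illustrative sketch; names and data layout are hypothetical.
def plan_metrics(placement, plan, migration_seconds):
    """Evaluate a candidate migration plan.

    placement:         instance -> device before migration
    plan:              instance -> target device (migrated instances only)
    migration_seconds: instance -> estimated time to migrate that instance
    """
    final = {**placement, **plan}
    # Only count instances whose device actually changes.
    moved = [inst for inst, dev in plan.items() if placement[inst] != dev]
    return {
        "running_devices": len(set(final.values())),
        "migrated_instances": len(moved),
        "migration_time": sum(migration_seconds[inst] for inst in moved),
    }

placement = {"vm1": "host1", "vm2": "host2", "vm3": "host3"}
plan = {"vm3": "host1"}
seconds = {"vm1": 30, "vm2": 45, "vm3": 20}

metrics = plan_metrics(placement, plan, seconds)
print(metrics)
```

In this example, moving vm3 onto host1 empties host3, so the plan runs on two devices at the cost of one migration.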

The constraint may be one or more of the following conditions: the resource usage of each computing device after migration does not exceed a preset amount, and the affinity relationships among the instances are respected. The affinity relationship among instances indicates that some instances need to be deployed on the same computing device, or that some instances need to be deployed on different computing devices.
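The affinity relationship described above can be checked against a candidate placement as in the following sketch. The function name `satisfies_affinity` and the representation of affinity rules as groups of instance names are assumptions for illustration only.

```python
# Illustrative sketch; names and data layout are hypothetical.
def satisfies_affinity(placement, together, apart):
    """Check affinity rules against a placement (instance -> device).

    together: groups of instances that must share one device
    apart:    groups of instances that must all be on different devices
    """
    for group in together:
        if len({placement[inst] for inst in group}) > 1:
            return False
    for group in apart:
        devices = [placement[inst] for inst in group]
        if len(set(devices)) < len(devices):
            return False
    return True

placement = {"a": "host1", "b": "host1", "c": "host2"}

ok = satisfies_affinity(placement, together=[("a", "b")], apart=[("a", "c")])
bad = satisfies_affinity(placement, together=[], apart=[("a", "b")])
print(ok, bad)
```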

It is contemplated that the computing resources contained by the computing device and the computing resources contained by the instances do not necessarily have an integer multiple relationship. That is, when only multiple instances of the same type are running on a computing device, there are situations where the computing resources contained in the computing device cannot be fully utilized. Thus, a computing device supporting a hybrid deployment may more fully utilize the computing resources it contains.

However, existing instance migration schemes are formulated without considering hybrid deployment scenarios. When optimization is performed without constraining where instances are deployed, a computing device may end up hosting only instances of the same type. The resulting instance migration scheme then does not meet the requirements of hybrid deployment and forgoes its advantage, namely a further improvement in resource utilization. In addition, compared with a single-type deployment, in which only instances of the same type are deployed on the same computing device, platform managers in a hybrid deployment scenario may not have a clear grasp of the inventory of the various types of instances. That is, it is difficult for platform managers to manage the various types of instances as needed.

For a cloud vendor, based on data of tenant historical purchase instances, the sales ratio of different types of instances can be determined. In other words, in general, the proportion of the demands of the tenants for the different types of instances to the total demand is relatively fixed in a time period. Therefore, in a scenario in which a computing device supports hybrid deployment, if constraints on migration rules of different types of instances can be added to the constraints, demand management on the various types of instances can be achieved. Further, the utilization rate of the whole computing resource can be improved.

In view of the above, the present application provides an instance migration method 100. The following briefly describes an application scenario related to the embodiments of the present application with reference to FIG. 1.

As shown in FIG. 1, a computing management platform 200 is connected to a tenant and to a plurality of computing devices, and is used for scheduling instances running on the plurality of computing devices. The tenant may create and delete instances by sending requests to the computing management platform 200. Meanwhile, the computing management platform 200 manages at least N computing devices, each of which has a certain number of instances running on it. For example, computing device 1 has three instances running on it, computing device 2 has two instances running on it, and computing device N has one instance running on it.

After obtaining the computing resource usage of each computing device, the computing management platform 200 may also predict the remaining running duration of each instance. It then obtains an instance migration scheme by using an optimization algorithm according to the optimization objective, the constraints, the computing resource usage of each computing device, and the predicted remaining running duration of each instance, and may migrate some of the instances according to that scheme. The constraints include a capacity constraint, which is preset for the multiple types of instances running on the computing devices and indicates the proportion of a computing device's computing resources that the various types of instances occupy.

For ease of understanding, the instance migration method 100 is described below in conjunction with FIG. 2.
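One possible reading of the capacity constraint is a per-type cap on the share of a device's resources. The sketch below checks such a cap for vCPUs only. Treating the proportion as an upper bound, restricting the check to vCPUs, and the names `satisfies_capacity` and `max_share` are all assumptions for illustration; the application may define the constraint differently.

```python
# Illustrative sketch; interpretation and names are hypothetical.
from collections import defaultdict

def satisfies_capacity(device_vcpus, instances, max_share):
    """Check a per-type vCPU share cap on one computing device.

    device_vcpus: total vCPUs of the device
    instances:    list of (instance_type, vcpus) running on the device
    max_share:    instance_type -> maximum allowed fraction of device vCPUs
                  (types without an entry are assumed to be uncapped)
    """
    used = defaultdict(int)
    for instance_type, vcpus in instances:
        used[instance_type] += vcpus
    return all(
        used[t] / device_vcpus <= max_share.get(t, 1.0) for t in used
    )

instances = [("exclusive", 2), ("shared", 6)]

within_cap = satisfies_capacity(8, instances, {"exclusive": 0.25, "shared": 0.75})
over_cap = satisfies_capacity(8, instances, {"exclusive": 0.5, "shared": 0.5})
print(within_cap, over_cap)
```

In the second call the shared instances occupy 6 of 8 vCPUs (0.75), which exceeds the assumed cap of 0.5, so the placement is rejected.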

S100: computing management platform 200 determines whether an instance migration trigger condition is satisfied.

When multiple instances are running on a computing device managed by computing management platform 200, and when a trigger condition is satisfied, an instance migration operation will be performed.

In some possible implementations, the trigger condition may be that the ratio of the actual available quantity to the theoretical available quantity of a certain type of instance is less than or equal to an available threshold. The actual available quantity of a type of instance is the number of instances of that type that the computing management platform 200 can actually issue at the current time. The theoretical available quantity is the number of instances of that type that could be issued if the resource fragments managed by the computing management platform 200 at the current time were pooled, that is, the ratio of the total fragment resources to the computing resources required by one instance of that type.

For example, suppose the computing management platform 200 manages 3 computing devices, each of which contains a resource fragment of 2 vCPUs and 4 GiB. For instance 1, which requires 3 vCPUs and 6 GiB, no computing device can currently provide the required computing resources, so the actual available quantity of instance 1 is 0. The resource fragments managed by the computing management platform 200 total 6 vCPUs and 12 GiB, so the theoretical available quantity of instance 1 is the ratio of (6 vCPUs and 12 GiB) to (3 vCPUs and 6 GiB), that is, 2.

It should be noted that, when calculating the theoretical available quantity, the processor resources in the fragment total are compared with the processor resources required by instance 1, and the memory resources in the fragment total are compared with the memory resources required by instance 1. The final result is the smaller of the two ratios, rounded down. The actual available quantity is obtained in the same way.
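The calculation just described (per-resource ratios, take the smaller, round down) can be sketched as follows, using the three-device example from the text. The function names and the `(vCPUs, GiB)` tuple layout are assumptions for illustration, and computing the actual available quantity by summing the per-device results is one reading of "obtained according to the method".

```python
# Illustrative sketch; names and data layout are hypothetical.
def count_fitting(free_vcpu, free_gib, need_vcpu, need_gib):
    """Smaller of the two per-resource ratios, rounded down."""
    return min(free_vcpu // need_vcpu, free_gib // need_gib)

def theoretical_available(fragments, need):
    """Instances that fit if all fragments were pooled together."""
    total_vcpu = sum(vcpu for vcpu, _ in fragments)
    total_gib = sum(gib for _, gib in fragments)
    return count_fitting(total_vcpu, total_gib, *need)

def actual_available(fragments, need):
    """Instances that fit today, counted device by device."""
    return sum(count_fitting(vcpu, gib, *need) for vcpu, gib in fragments)

fragments = [(2, 4), (2, 4), (2, 4)]   # free (vCPUs, GiB) per device
instance_1 = (3, 6)                    # required (vCPUs, GiB)

theoretical = theoretical_available(fragments, instance_1)
actual = actual_available(fragments, instance_1)
print(actual, theoretical)
```

With these inputs the actual available quantity is 0 and the theoretical available quantity is 2, matching the example in the text.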

Further, after the actual available quantity and the theoretical available quantity are obtained, whether the trigger condition is satisfied can be determined by comparing the ratio of the actual available quantity to the theoretical available quantity of the instance type (instance 1) with the available threshold. The available threshold can be set as required.

Specifically, when the ratio of the two is smaller than or equal to the available threshold, it is considered that the trigger condition is satisfied, and the process goes to S102. And conversely, when the ratio of the actual available quantity to the theoretical available quantity is larger than the available threshold value, the triggering condition is considered to be not met, and the actual available quantity and the theoretical available quantity of each type of instance are continuously monitored.
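The threshold comparison of S100 can be sketched as below. The function name `migration_triggered` and the handling of a theoretical available quantity of zero (treated here as "not triggered", since no migration could free enough resources) are assumptions; the text does not address that case.

```python
# Illustrative sketch; names and the zero-case handling are hypothetical.
def migration_triggered(actual, theoretical, available_threshold):
    """Return True when actual / theoretical <= available_threshold."""
    if theoretical == 0:
        # Assumption: with no theoretical capacity, migration cannot help.
        return False
    return actual / theoretical <= available_threshold

# Instance 1 from the example: actual 0, theoretical 2.
triggered = migration_triggered(0, 2, available_threshold=0.5)
not_triggered = migration_triggered(2, 2, available_threshold=0.5)
print(triggered, not_triggered)
```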

Alternatively, the trigger condition may be a timed trigger, that is, instance migration is performed at a preset time and frequency. For example, instance migration is performed once a day at 24:00, according to the platform administrator's settings.

S102: the computing management platform 200 obtains computing resource usage for each computing device.

When the computing management platform 200 determines in S100 that the trigger condition is satisfied, the computing resource usage of each computing device will be acquired.

Specifically, the usage of the computing resources of each computing device includes the amount of resources included in each computing device, the number of instances included in each computing device, and the amount of resources included in each instance.

Optionally, the usage of the computing resource of each computing device further includes parameters such as a CPU utilization of each computing device, a vCPU utilization of each instance, and the like.

After the above parameters of each instance are obtained, the instances may be divided by using a classifier. The classifier can be constructed based on a machine learning method and the like.

For example, the classifier may divide the instances according to load type. Taking vCPU utilization as the parameter, when the vCPU utilization of an instance is higher than a preset utilization threshold, the instance may be determined to be a high-load instance. Conversely, when the vCPU utilization of an instance is not higher than the preset utilization threshold, the instance may be determined to be a low-load instance. Further, multi-level classification of the load types of instances may be achieved by setting multiple utilization thresholds.
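A threshold-based version of this load type division can be sketched as follows. This is a simple rule-based stand-in, not the machine-learning classifier the text mentions; the function name `load_level`, the threshold values, and the label names are assumptions for illustration.

```python
# Illustrative sketch; names, thresholds and labels are hypothetical.
from bisect import bisect_left

def load_level(vcpu_utilization, thresholds, labels):
    """Map a vCPU utilization to a load label.

    thresholds: ascending utilization thresholds
    labels:     one more label than thresholds, lowest load first
    A utilization strictly above a threshold falls into the higher level.
    """
    return labels[bisect_left(thresholds, vcpu_utilization)]

# Single threshold: two load types.
two_level = load_level(0.7, [0.6], ["low", "high"])

# Multiple thresholds: multi-level classification.
three_level = load_level(0.5, [0.3, 0.7], ["low", "medium", "high"])
print(two_level, three_level)
```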

After the load type of the instance is determined, constraints may be constructed according to the load type. The specific construction method will be described below.

In some possible implementations, the computing resource usage of purchased fixed-period instances may also be obtained. Specifically, this usage includes the computing device on which each fixed-period instance is located, the amount of resources contained in each fixed-period instance, and the expected usage duration of each fixed-period instance. A fixed-period instance is an instance purchased under a fixed-term billing mode, such as a monthly or yearly subscription.

After the usage of the computing resources of the computing devices is obtained, the data may be stored in the storage unit of the computing management platform 200.

S104: the computing management platform 200 obtains the remaining operating duration of each instance according to the historical computing resource usage data of the tenant corresponding to each instance.

The computing resource usage of each computing device obtained in S102 is stored on the computing management platform on a per-tenant basis, which yields historical computing resource usage data per tenant. Further, according to the historical computing resource usage data of the tenant corresponding to each instance, the predicted running duration of each instance running on the computing devices can be obtained based on the prediction model.

As previously mentioned, the historical computing resource usage data includes the amount of resources each computing device contains, the number of instances each computing device contains, and the amount of resources each of the instances contains. Optionally, parameters such as CPU utilization of each computing device, vCPU utilization of each instance, and the like are also included. Further, the historical computing resource usage data also comprises parameters such as the running time length, the purchasing time and the purchasing frequency of the purchased instances of each tenant.

The running duration of the purchased examples of each tenant history and any one or more of the following parameters: and the purchase time of the historically purchased instances of each tenant, the purchase frequency of the historically purchased instances of each tenant and the purchase time of the instances in the current running state are used as the input of the prediction model, and the predicted value of the running time of each instance can be obtained based on the prediction model.

The running duration of an instance is the length of time from its purchase time to its expected release time. Further, the remaining running duration of each instance may be obtained from the current time and the running duration of that instance; that is, it is the length of time from the current time to the expected release time.
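The relationship between purchase time, predicted running duration, and remaining running duration can be illustrated with a minimal sketch. The function name, the sample timestamps, and the clamping of negative values to zero are assumptions made for illustration only; the predicted running duration stands in for the output of the prediction model.

```python
from datetime import datetime, timedelta

def remaining_duration(purchase_time, predicted_run, now):
    """Remaining running duration = expected release time - current time."""
    expected_release = purchase_time + predicted_run
    # Assumption: an instance that has outlived its prediction is treated as
    # having no remaining running duration instead of a negative one.
    return max(expected_release - now, timedelta(0))

purchase = datetime(2021, 8, 1, 9, 0)
predicted = timedelta(hours=30)        # hypothetical prediction-model output
now = datetime(2021, 8, 2, 9, 0)
remaining = remaining_duration(purchase, predicted, now)
print(remaining)  # 6:00:00
```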

In some possible implementations, the remaining running duration type of each instance may be determined from its remaining running duration. For example, an instance may be classified as a short-period instance when its remaining running duration is less than 2 hours. As another example, an instance may be classified as a long-period instance when its remaining running duration exceeds 24 hours. The platform administrator can set the classification rules for the remaining running duration types and the names of the types as needed.
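A classification rule of this kind might be sketched as follows. The 2-hour and 24-hour limits come from the example above; the type names and the "medium" label for durations between the two limits are assumptions, since the administrator may define rules and names freely.

```python
from datetime import timedelta

SHORT_LIMIT = timedelta(hours=2)    # below this: short-period instance
LONG_LIMIT = timedelta(hours=24)    # above this: long-period instance

def remaining_type(remaining):
    """Map a remaining running duration to a remaining running duration type."""
    if remaining < SHORT_LIMIT:
        return "short"
    if remaining > LONG_LIMIT:
        return "long"
    return "medium"  # assumed name for everything between the two limits

print(remaining_type(timedelta(hours=1)))   # short
print(remaining_type(timedelta(hours=30)))  # long
```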

In some possible implementations, the prediction model may also directly output the remaining running duration type of the instance.

It should be noted that S104 is an optional step.

S106: the computing management platform 200 obtains optimization objectives and constraints.

Platform administrators may set optimization goals and constraints on the computing management platform 200 to obtain an instance migration solution.

In some possible implementations, the platform administrator may provide the optimization objectives and the constraints by uploading a script that contains them.

In some possible implementations, the computing management platform 200 provides a configuration interface through which the platform administrator can set the optimization objectives and constraints. Fig. 3 illustrates a configuration interface 300.

The configuration interface 300 includes:

An optimization objective setting control 301, in which the platform administrator can select several of the displayed optimization objectives and combine them into an optimization objective function.

One optimization objective shown in FIG. 3 is minimizing the number of running computing devices, which indicates that, after migration is complete, the running instances occupy the fewest possible computing devices.

Maximizing the issuable quantity of a specified instance specification indicates that the number of instances of the specification designated by the platform administrator that can theoretically be issued after migration is complete is maximized. The platform administrator may designate the instance specification by clicking on this optimization objective (not shown in FIG. 3). The instances managed by the computing management platform can be divided into multiple instance specification families according to business scenarios and usage scenarios. Further, one instance specification family is divided into multiple instance specifications according to the configuration of processor resources, memory resources, and the like. In this application, instances of the same specification belong to the same instance type, whereas instances of different specifications may belong to the same instance type or to different instance types.

In some possible implementations, the optimization objective may also be to maximize the issuable quantity of a specified instance type.

Minimizing the migration time indicates that the sum of the times required to migrate instances from one computing device to another is minimized. A common migration scenario is live migration. In live migration, the complete running state of an instance is saved and can be quickly restored on the original hardware platform or even on a different hardware platform. After recovery, the instance continues to run smoothly and the tenant does not perceive any difference. Many factors affect the migration time, for example, the speed and efficiency of communication between the two computing devices and the performance of the source and target servers.

Minimizing the number of migrated instances indicates that the number of instances that need to be migrated during the whole migration process is minimized.

In some possible implementations, the platform administrator may also set other optimization objectives as needed. Specifically, these can be entered in the custom box in the optimization objective setting control 301.

It should be noted that the optimization objective function may include one or more of the above optimization objectives.
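One possible way to combine several of the above objectives into a single objective function is a weighted sum, sketched below. The weighted-sum form, the weight names, and the sample values are assumptions for illustration; the application does not prescribe how the selected objectives are combined.

```python
def objective(running_devices, issuable, migration_time, migrated, weights):
    """Scalar cost to minimize; lower is better.

    The issuable quantity is to be maximized, so it enters with a negative sign.
    """
    return (weights["devices"] * running_devices
            - weights["issuable"] * issuable
            + weights["time"] * migration_time
            + weights["migrated"] * migrated)

# Hypothetical weights that favor freeing up computing devices.
weights = {"devices": 10, "issuable": 2, "time": 1, "migrated": 1}
cost = objective(running_devices=3, issuable=4, migration_time=5,
                 migrated=2, weights=weights)
print(cost)  # 29
```

Setting a weight to zero removes the corresponding objective, which matches the statement that the objective function may include one or more of the objectives.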

A constraint setting control 302, in which the platform administrator can configure constraints as needed. The constraints include a capacity constraint.

Capacity constraints indicate the capacity allocated on each computing device to one or more types of instances. For example, the platform administrator may first select, in the selection box on the left side, the computing device to be configured from the plurality of computing devices. Further, the types and proportions of the instances running on the selected computing device may be set. In particular, the platform administrator may select at least one instance type in the instance type selection control 303 and set the proportion of each selected instance type in the proportion setting control 304. It should be noted that the sum of the proportions set by the platform administrator for the instance types in the proportion setting control 304 should be less than or equal to 100%. The platform administrator may add or delete instance types and their proportions as needed using the modification controls 305.
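The rule that the proportions configured for one computing device must not exceed 100% in total can be checked with a short validation routine. This is a minimal sketch; the function name and the dictionary representation of a capacity constraint (instance type mapped to an integer percentage) are assumptions.

```python
def validate_capacity_constraint(proportions):
    """Check one device's capacity constraint and return the total percentage.

    proportions: dict mapping instance type -> percentage of device resources.
    """
    for instance_type, percent in proportions.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"{instance_type}: {percent}% is outside 0-100%")
    total = sum(proportions.values())
    if total > 100:
        raise ValueError(f"proportions sum to {total}%, which exceeds 100%")
    return total

print(validate_capacity_constraint({"type1": 50, "type2": 50}))  # 100
```

A total below 100% is accepted, which leaves part of the device unassigned to any instance type.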

In some possible implementations, the platform administrator may also add soft constraints to the capacity constraints. A soft constraint indicates that a floating range (not shown in fig. 3) is set for the proportion corresponding to an instance type selected in the instance type selection control 303. For example, after setting, in the interface shown in fig. 3, the instance types that can run on computing device 1 and their corresponding proportions, the platform administrator may also set floating proportions for those instance types.

For example, instance type 1 and instance type 2 are set in the interface shown in FIG. 3 as the types that may run on computing device 1, each with a proportion of 50%. If the platform administrator further sets the floating proportion of instance type 1 on computing device 1 to 10%, then at most 60% of the resources on computing device 1 may be allocated to instance type 1 when the instance migration scheme is formulated.
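The 50% plus 10% example can be reproduced with the following sketch. The function names, the cap at 100%, and the device capacity of 100 resource units are assumptions chosen so that the arithmetic is easy to follow.

```python
def max_share(proportion, float_ratio=0):
    """Upper bound (in percent) on the share of a device one instance type may use."""
    # Assumption: the floated share is capped at the whole device.
    return min(proportion + float_ratio, 100)

def within_soft_limit(used, capacity, proportion, float_ratio=0):
    """True if `used` resource units stay within the floated share of `capacity`."""
    # Integer comparison avoids floating-point rounding on percentages.
    return used * 100 <= capacity * max_share(proportion, float_ratio)

print(max_share(50, 10))                   # 60
print(within_soft_limit(60, 100, 50, 10))  # True
print(within_soft_limit(61, 100, 50, 10))  # False
```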

In some possible implementations, the platform administrator does not need to set the instance types and proportions on each computing device one by one, but may set them for the computing devices in batches. Specifically, all the computing devices may be configured uniformly by using a select-all option.

For another example, the instance migration system may receive a computing device classification set by a platform administrator in advance. That is, all computing devices may be divided into a number of computing device groups according to the computing device classifications. Further, the platform administrator may configure the instance types and their proportions in units of computing device groups when setting capacity constraints.

In some possible implementations, the platform administrator may obtain the instance types and proportions on each computing device using a capacity control recommendation model. The capacity control recommendation model can be established based on an optimization algorithm whose objective function can be set as needed.

Alternatively, the capacity control recommendation model may be built based on a machine learning model. Wherein the machine learning model may be trained using historical data. For example, the historical data includes the number of computing devices at the historical time, the number of instances running on each computing device at the historical time, and the capacity control situation.

After obtaining the instance types and proportions on each computing device using the capacity control recommendation model described above, the platform administrator may perform S108 based on these parameters.

Specifically, the instance types and proportions on each computing device may be obtained from the capacity control recommendation model when the platform administrator clicks a "use recommended value" button (not shown in FIG. 3) in the configuration interface 300. That is, after the platform administrator clicks the "use recommended value" button, the computing management platform 200 automatically generates the capacity constraint.

In some possible implementations, the capacity constraint may also be set in an interface separate from the configuration interface 300. That is, the platform administrator may set the capacity constraint in another configuration interface. Likewise, the constraint setting control 302 can be a configuration interface separate from the configuration interface 300.

Affinity constraints indicate the positional relationship between the computing devices on which the instances reside. Specifically, they include an affinity state and an anti-affinity state. The affinity state indicates that two instances need to be deployed on the same computing device. Conversely, the anti-affinity state indicates that the two instances need to be deployed on two different computing devices. The platform administrator may click on the affinity constraint and enter the identification numbers of the instances belonging to the affinity state and the anti-affinity state (not shown in FIG. 3).
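A check of a candidate placement against affinity and anti-affinity pairs might look like the sketch below. The data layout (a placement dictionary and lists of instance pairs) and all identifiers are hypothetical.

```python
def affinity_violations(placement, affinity_pairs, anti_affinity_pairs):
    """Return the constraint pairs that a placement (instance -> device) violates."""
    violations = []
    for a, b in affinity_pairs:
        if placement[a] != placement[b]:       # must share one device
            violations.append(("affinity", a, b))
    for a, b in anti_affinity_pairs:
        if placement[a] == placement[b]:       # must be on different devices
            violations.append(("anti-affinity", a, b))
    return violations

placement = {"vm1": "host1", "vm2": "host1", "vm3": "host1"}
result = affinity_violations(placement,
                             affinity_pairs=[("vm1", "vm2")],
                             anti_affinity_pairs=[("vm1", "vm3")])
print(result)  # [('anti-affinity', 'vm1', 'vm3')]
```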

Clustering constraints, which include a running duration clustering constraint and a load type clustering constraint. Specifically, the platform administrator may carry out further configuration by clicking on the clustering constraint (not shown in FIG. 3).

The load type clustering constraint indicates that constraints are formulated according to the load types of the instances. In particular, it may be specified that only instances of the same load type run on the same computing device; that is, instances of different load types may not run on the same computing device. Alternatively, the load type allowed to run on a given computing device may be specified. For example, it may be specified that highly loaded instances should all run on the same computing device.
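The first variant of the load type clustering constraint, in which one computing device hosts only one load type, can be checked as sketched here. The function name, the load type labels, and the data layout are assumptions.

```python
def mixed_load_devices(placement, load_types):
    """Return the devices that host instances of more than one load type.

    placement:  dict instance -> device
    load_types: dict instance -> load type label
    """
    types_on_device = {}
    for instance, device in placement.items():
        types_on_device.setdefault(device, set()).add(load_types[instance])
    return sorted(d for d, types in types_on_device.items() if len(types) > 1)

placement = {"vm1": "host1", "vm2": "host1", "vm3": "host2"}
load_types = {"vm1": "high", "vm2": "low", "vm3": "high"}
mixed = mixed_load_devices(placement, load_types)
print(mixed)  # ['host1']
```

An empty result means the placement satisfies the constraint.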

The running duration clustering constraint indicates that a constraint is formulated based on the remaining running duration or the remaining running duration type obtained in S104.

In some possible implementations, constraints may be formulated based on the remaining running durations of the instances. For example, it may be specified that the variance of the remaining running durations of the instances running on the same computing device should be less than a remaining running duration threshold. This ensures that the release times of instances running on the same computing device are substantially consistent, which avoids excessive resource fragmentation. The remaining running duration threshold can be set as needed.

In some possible implementations, constraints may be formulated based on the remaining running duration types of the instances. For example, it may be specified that the remaining running duration types of the instances running on the same computing device are consistent. In this way, the release times of the instances running on the same computing device are substantially consistent, which avoids excessive resource fragmentation and improves resource utilization.
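The variance-based form of this constraint can be sketched with the Python standard library. The use of population variance, durations expressed in hours, and the sample threshold are assumptions; the text only states that the variance should be below a configurable threshold.

```python
from statistics import pvariance

def durations_clustered(remaining_hours, threshold):
    """True if the instances on one device have similar remaining durations."""
    if len(remaining_hours) < 2:
        return True  # zero or one instance: nothing to compare
    # Population variance of the remaining running durations on this device.
    return pvariance(remaining_hours) < threshold

print(durations_clustered([10, 11, 12], threshold=1.0))  # True
print(durations_clustered([1, 30], threshold=1.0))       # False
```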

In some possible implementations, the platform administrator may also set other constraints in the custom box in the constraint setting control 302 as needed.

A cancel control 306, which the platform administrator can click to abort the setting operation while setting the parameters.

A confirm control 307, which the platform administrator can click after completing the settings to finish uploading the parameters.

S108: the computing management platform 200 obtains an instance migration scheme according to the optimization objectives, the constraints, the computing resource usage of each computing device, and the remaining running duration of each instance.

The computing management platform 200 may solve the optimization problem using an optimization method, based on the optimization objectives and constraints obtained in S106 and the computing resource usage of each computing device obtained in S102; the solution is the instance migration scheme. Common optimization methods include gradient descent, particle swarm optimization, genetic algorithms, and the like.
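To make the structure of the optimization problem concrete, the sketch below solves a toy case by exhaustive search. This is deliberately not one of the optimization methods named above; brute force is used only because it is short and exact for a handful of instances. The data layout, all names, and the lexicographic objective (first the fewest running devices, then the fewest migrated instances) are assumptions.

```python
from itertools import product

def feasible(placement, instances, devices):
    """Check the per-type capacity constraint on every device."""
    used = {}
    for name, device in placement.items():
        itype, size = instances[name]
        used[(device, itype)] = used.get((device, itype), 0) + size
    for (device, itype), amount in used.items():
        capacity, shares = devices[device]
        if amount * 100 > capacity * shares.get(itype, 0):
            return False
    return True

def plan_migration(instances, devices, current):
    """Return ((running_devices, migrated), placement) with minimal cost."""
    names, hosts = sorted(instances), sorted(devices)
    best = None
    for assignment in product(hosts, repeat=len(names)):
        placement = dict(zip(names, assignment))
        if not feasible(placement, instances, devices):
            continue
        migrated = sum(placement[n] != current[n] for n in names)
        cost = (len(set(assignment)), migrated)  # compared lexicographically
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

# instance -> (type, size); device -> (capacity, {type: percent})
instances = {"a": ("shared", 4), "b": ("shared", 4),
             "c": ("exclusive", 8), "d": ("exclusive", 4)}
devices = {h: (16, {"shared": 50, "exclusive": 50}) for h in ("h1", "h2", "h3")}
current = {"a": "h1", "b": "h2", "c": "h3", "d": "h1"}

cost, placement = plan_migration(instances, devices, current)
print(cost)  # (2, 1)
```

In this toy case the instances initially occupy three devices; the search finds a scheme that frees one device by migrating a single instance. The search space grows exponentially with the number of instances, so a realistic platform would need a heuristic or a dedicated solver.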

S110: the computing management platform 200 performs the instance migration according to an instance migration scheme.

After obtaining the instance migration scheme in S108, the computing management platform may perform the migration operation of the instance according to the instance migration scheme.

In some possible implementations, before the computing management platform performs the migration operation according to the instance migration scheme, the platform administrator may further confirm the instance migration scheme. For example, when the optimization solution generated in S108 is not unique, that is, when there are multiple instance migration schemes, the platform administrator needs to select one of them for execution.

Optionally, the platform administrator may also determine whether to implement the instance migration scheme according to the result of the confirmation.

The present application further provides a computing management platform 200, as shown in fig. 4, including:

An interaction unit 202, configured to receive the actual available quantity and the theoretical available quantity of each instance in S100. In S102, the interaction unit 202 acquires the computing resource usage of each computing device. The interaction unit 202 is further configured to receive the optimization objectives and the constraints uploaded by the platform administrator in S106. Optionally, the interaction unit 202 also receives the platform administrator's selection and confirmation of the instance migration scheme in S110.

A storage unit 204, configured to store the computing resource usage of each computing device acquired in S102. The instance migration scheme selected or confirmed by the platform administrator in S110 is also stored in the storage unit 204.

A processing unit 206, configured to determine in S100 whether the instance migration trigger condition is satisfied. In S104, the processing unit 206 obtains the remaining running duration of each instance according to the historical computing resource usage data of the tenant corresponding to each instance. Specifically, the processing unit 206 obtains, based on the prediction model and that historical data, the predicted running duration of each instance running on the computing devices. The processing unit 206 is also used to build and train the prediction model. In S108, the processing unit 206 obtains the instance migration scheme according to the optimization objectives, the constraints, the computing resource usage of each computing device, and the remaining running duration of each instance.

Optionally, in S106, the establishment and training of the capacity control recommendation model are also performed by the processing unit 206.

The present application also provides a computing device 400. As shown in fig. 5, the computing device includes: a bus 402, a processor 404, a memory 406, and a communication interface 408. The processor 404, the memory 406, and the communication interface 408 communicate over the bus 402. Computing device 400 may be a server or a terminal device. It should be understood that the present application does not limit the number of processors or memories in the computing device 400.

The bus 402 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one line is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus. The bus 402 may include a path that transfers information between components of computing device 400 (e.g., memory 406, processor 404, communication interface 408).

The processor 404 may include any one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Micro Processor (MP), or a Digital Signal Processor (DSP).

The memory 406 may include volatile memory, such as Random Access Memory (RAM). The memory 406 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD). The memory 406 stores executable program code, which the processor 404 executes to implement the instance migration method 100 described above. Specifically, the memory 406 stores instructions for the computing management platform 200 to perform the instance migration method 100.

The communication interface 408 enables communication between the computing device 400 and other devices or communication networks using transceiver modules such as, but not limited to, network interface cards and transceivers.

The embodiment of the application also provides a computing device cluster. As shown in fig. 6, the cluster of computing devices includes at least one computing device 400. The memories 406 of one or more computing devices 400 in the cluster may store the same instructions for the computing management platform 200 to perform the instance migration method 100.

In some possible implementations, one or more computing devices 400 in the cluster of computing devices may also each be used to execute part of the instructions used by the computing management platform 200 to perform the instance migration method 100. In other words, a combination of one or more computing devices 400 may collectively execute the instructions that the computing management platform 200 uses to perform the instance migration method 100.

It is noted that the memories 406 in different computing devices 400 in a computing device cluster may store different instructions for performing different parts of the functionality of the computing management platform 200.

Fig. 7 shows one possible implementation. As shown in fig. 7, two computing devices 400A and 400B are connected via a communication interface 408. Memory in computing device 400A holds instructions for performing the functions of interaction unit 202 and processing unit 206. Memory in computing device 400B has stored thereon instructions for performing the functions of storage unit 204. In other words, the memory 406 of the computing devices 400A and 400B collectively store instructions for the computing management platform 200 to perform the instance migration method 100.

The connection manner within the computing device cluster shown in fig. 7 takes into account that the instance migration method 100 provided in this application needs to store a large amount of computing device operating data. The storage function is therefore assigned to computing device 400B.

It should be understood that the functionality of computing device 400A shown in fig. 7 may also be performed by multiple computing devices 400. Likewise, the functionality of computing device 400B may be performed by multiple computing devices 400.

In some possible implementations, one or more computing devices in a cluster of computing devices may be connected over a network. The network may be a wide area network, a local area network, or the like. Fig. 8 shows one possible implementation. As shown in fig. 8, two computing devices 400C and 400D are connected via a network. In particular, each computing device connects to the network through its communication interface. In this type of implementation, the memory 406 in the computing device 400C stores instructions for performing the functions of the interaction unit 202, and the memory 406 in the computing device 400D stores instructions for performing the functions of the storage unit 204 and the processing unit 206.

The connection manner within the computing device cluster shown in fig. 8 takes into account that the instance migration method 100 provided in this application needs to store a large amount of computing device operating data and perform a large amount of computation to formulate an instance migration scheme. The functions of the storage unit 204 and the processing unit 206 are therefore assigned to computing device 400D.

It should be understood that the functionality of computing device 400C shown in fig. 8 may also be performed by multiple computing devices 400. Likewise, the functionality of computing device 400D may be performed by multiple computing devices 400.

The embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium can be any available medium that a computing device can store, or a data storage device, such as a data center, that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others. The computer-readable storage medium includes instructions that instruct a computing device to perform the instance migration method 100 described above as applied to the computing management platform 200.

The embodiment of the application also provides a computer program product containing instructions. The computer program product may be a software or program product containing instructions that is capable of running on a computing device or of being stored in any available medium. The computer program product, when run on at least one computer device, causes the at least one computer device to perform the instance migration method 100 described above.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. An instance migration method applied to a computing management platform for scheduling instances running on a plurality of computing devices, the method comprising:

acquiring the use conditions of the computing resources of the plurality of computing devices;

determining optimization objectives for the plurality of computing devices;

determining constraints for the plurality of computing devices, wherein the constraints for each computing device indicate capacity constraints for various types of instances running on that computing device;

obtaining an instance migration scheme according to the computing resource use conditions, the optimization targets and the constraint conditions of the plurality of computing devices;

migrating instances running on the plurality of computing devices according to the instance migration scheme.

2. The method of claim 1, wherein the capacity constraint indicates a proportion of computing resources occupied by the various types of instances to computing resources of the computing device.

3. The method of claim 1 or 2, wherein different types of instances running on the plurality of computing devices occupy different types of computing resources in different proportions.

4. The method of claim 1 or 2, wherein the different types of instances running on the plurality of computing devices include a shared type and an exclusive type, wherein the shared type indicates that the vCPUs contained by the instances are bound to the CPUs on the plurality of computing devices, and wherein the exclusive type indicates that the vCPUs contained by the instances are not bound to the CPUs on the plurality of computing devices.

5. The method of any of claims 1 to 4, wherein the optimization objectives include minimizing the number of running computing devices and/or maximizing the issuable quantity of a specified instance specification, and minimizing the migration time and/or minimizing the number of migrated instances;

the number of running computing devices indicates the number of computing devices occupied by running instances after migration is completed, the issuable quantity of the specified instance specification indicates the number of instances of the specified instance specification that can be issued after migration is completed, the migration time indicates the sum of the times required for migrating the instances, and the number of migrated instances indicates the number of instances to be migrated.

6. A method according to any one of claims 1 to 5, wherein the constraints include any one or more of:

an affinity constraint, a load type clustering constraint, and a running duration clustering constraint, wherein the affinity constraint indicates the positional relationship among the computing devices where the instances are located, the load type clustering constraint indicates a constraint determined according to the load types of the instances, and the running duration clustering constraint indicates a constraint determined according to the remaining running duration.

7. The method according to claim 6, wherein before obtaining the instance migration solution according to the computing resource usage, the optimization objective, and the constraint condition of each computing device at the current time, the method further comprises:

obtaining the remaining running duration of the instances running on each computing device at the current time according to the computing resource usage of each computing device at historical times and a prediction model;

and determining the running duration clustering constraint according to the remaining running duration of the instances running on each computing device at the current time.

8. A computing management platform, the computing management platform comprising:

the interaction unit is used for acquiring the use conditions of the computing resources of the plurality of computing devices;

a processing unit to determine optimization objectives for the plurality of computing devices; determining constraints for the plurality of computing devices, wherein the constraints for each computing device indicate capacity constraints for various types of instances running on that computing device; obtaining an instance migration scheme according to the computing resource use conditions, the optimization targets and the constraint conditions of the plurality of computing devices; migrating instances running on the plurality of computing devices according to the instance migration scheme.

9. The computing management platform of claim 8, wherein the capacity constraint indicates a proportion of computing resources occupied by the instances of the various types to computing resources of the computing device.

10. The computing management platform of claim 8 or 9, wherein different types of instances running on the plurality of computing devices occupy different types of computing resources at different proportions.

11. The computing management platform of claim 8 or 9, wherein the different types of instances running on the plurality of computing devices include a shared type and an exclusive type, wherein the shared type indicates that the vCPUs contained by the instances are bound to CPUs on the plurality of computing devices, and wherein the exclusive type indicates that the vCPUs contained by the instances are not bound to CPUs on the plurality of computing devices.

12. The computing management platform according to any one of claims 8 to 11, wherein the optimization objectives include minimizing the number of running computing devices and/or maximizing the issuable quantity of a specified instance specification, and minimizing the migration time and/or minimizing the number of migrated instances; the number of running computing devices indicates the number of computing devices occupied by running instances after migration is completed, the issuable quantity of the specified instance specification indicates the number of instances of the specified instance specification that can be issued after migration is completed, the migration time indicates the sum of the times required for migrating the instances, and the number of migrated instances indicates the number of instances to be migrated.

13. The computing management platform of any of claims 8 to 12, wherein the constraints further comprise any one or more of: an affinity constraint, a load type clustering constraint, and a running duration clustering constraint, wherein the affinity constraint indicates the positional relationship among the computing devices where the instances are located, the load type clustering constraint indicates a constraint determined according to the load types of the instances, and the running duration clustering constraint indicates a constraint determined according to the remaining running duration.

14. The computing management platform according to claim 13, wherein the processing unit is further configured to obtain the remaining running duration of the instances running on each computing device at the current time according to the computing resource usage of each computing device at historical times and a prediction model; and to determine the running duration clustering constraint according to the remaining running duration of the instances running on each computing device at the current time.

15. A cluster of computing devices comprising at least one computing device, each computing device comprising a processor and a memory;

the processor of the at least one computing device is to execute instructions stored in the memory of the at least one computing device to cause the cluster of computing devices to perform the method of any of claims 1 to 7.

16. A computer program product comprising instructions which, when executed by a cluster of computer devices, cause the cluster of computer devices to perform the method of any one of claims 1 to 7.

17. A computer readable storage medium comprising computer program instructions which, when executed by a cluster of computing devices, perform the method of any of claims 1 to 7.