WO2017128507A1 - Decentralized resource scheduling method and system

Info

Publication number: WO2017128507A1
Application number: PCT/CN2016/076997
Authority: WIPO (PCT)
Prior art keywords: node, resource, resource scheduling, job, decentralized
Other languages: French (fr), Chinese (zh)
Inventor: 孙利军
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2017128507A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Definitions

  • the present invention relates to the field of parallel computing technologies, and in particular to a decentralized resource scheduling method and system.
  • Big data not only brings innovation in the information industry, but also promotes the reorientation of the self-value of traditional industries.
  • the current mainstream big data computing frameworks are mainly YARN, Spark, Storm, and so on.
  • Most of these computing frameworks have a master-slave structure.
  • a standby resource scheduling node is started so that it can take over the service when the primary resource scheduling node fails.
  • only one of the active and standby resource scheduling nodes can provide services at any given time.
  • under a large number of concurrent jobs, the resource scheduling node faces a heavy load and is likely to suffer various problems such as memory overflow and abnormal operation of job resources.
  • Existing method 1: improve the machine performance of the resource scheduling node by expanding its CPU, memory, and network resources. This can only raise the data throughput of the resource scheduling node to a limited extent.
  • Existing method 2: cluster data sources (usually HDFS) are scaled out in a federated manner.
  • however, each job is scheduled and given resources on only one cluster.
  • all data outside that cluster must be obtained remotely, so data locality is insufficient and the network load increases.
  • Existing method 3: use a federation of multiple big data clusters and provide a job control module.
  • a job is broken down into multiple sub-jobs by policy and distributed to the corresponding data clusters for computation. Since multiple clusters are federated, there are multiple resource scheduling modules, and the concurrency capacity for jobs is expanded.
  • the purpose of the embodiments of the present invention is to provide a decentralized resource scheduling method and system, to solve the problem of handling a large number of concurrent jobs in the prior art.
  • an embodiment of the present invention provides a decentralized resource scheduling method, including:
  • receiving a job by using a cluster access node;
  • using the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job;
  • if there is no resource scheduling node corresponding to the resource scheduling node information, generating the resource scheduling node, and using the resource scheduling node to schedule resources for the job.
  • optionally, before the receiving of the job by using the cluster access node, the decentralized resource scheduling method further includes:
  • configuring a preset number of pre-started resource scheduling nodes; and when the system is started, starting the configured preset number of resource scheduling nodes.
  • the decentralized resource scheduling method further includes:
  • if the generated resource scheduling node does not schedule resources for another job within a preset time period, the generated resource scheduling node is automatically closed.
  • the step of using the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job includes:
  • obtaining, according to the job feature, a resource scheduling node set with better data locality for the job; obtaining a resource constraint according to the user feature; and obtaining the corresponding resource scheduling node information according to a preset policy, combining the current resource load situation, the resource scheduling node set, and the resource constraint.
  • the step of generating the resource scheduling node includes:
  • the resource scheduling policy node randomly selects a job operation node that belongs to the resource scheduling node, and notifies the job operation node to start a container to run the resource scheduling node.
  • the step of using the resource scheduling node to schedule resources for the job includes:
  • using the resource scheduling node to register with the resource reporting nodes, and receiving the idle resources reported by the resource reporting nodes according to a predetermined rule; and using the resource scheduling node to schedule resources for the job control node according to the resource requests of the job's segmented tasks.
  • optionally, before the resource scheduling node schedules resources for the job control node according to the segmented resource requests of the job, the decentralized resource scheduling method further includes: using the resource scheduling node to notify the job operation node corresponding to the idle resource to start a container to run the job control node.
  • the decentralized resource scheduling method further includes:
  • the resource load situation is obtained in real time by using the resource scheduling overview node.
  • the decentralized resource scheduling method further includes:
  • the scheduled resource is allocated to the specific task in the job by the job control node, and the job operation node corresponding to the scheduled resource is notified to start a container to run the task.
  • optionally, after notifying the job operation node corresponding to the scheduled resource to start the container to run the task, the decentralized resource scheduling method further includes:
  • after the task in the container is executed, the job operation node notifies the job control node to close the container.
  • the decentralized resource scheduling method further includes:
  • after all the tasks in the job are executed, the job control node notifies the resource scheduling node of the resource release, and requests the resource scheduling node to close the job control node.
  • the decentralized resource scheduling method further includes:
  • after a resource is restarted, the resource scheduling overview node re-establishes a relationship with the corresponding resource scheduling node according to the resource constraint.
  • the step of re-establishing a relationship between the resource scheduling overview node and the corresponding resource scheduling node according to the resource constraint after the resource is restarted includes:
  • starting the job control node and the resource reporting node of the resource after the resource is restarted; using the resource reporting node to notify the resource scheduling overview node that the resource is available; using the resource scheduling overview node to find the corresponding resource scheduling node according to the resource constraint; and using the resource scheduling node to register with the resource reporting node of the resource, and receiving the idle resources reported by the resource reporting node of the resource according to a predetermined rule.
  • the embodiment of the invention further provides a decentralized resource scheduling system, including:
  • a receiving module configured to receive a job by using a cluster access node
  • a first processing module configured to use the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job;
  • a second processing module configured to: if there is no resource scheduling node corresponding to the resource scheduling node information, generate the resource scheduling node, and use the resource scheduling node to schedule resources for the job.
  • the decentralized resource scheduling system further includes:
  • a configuration module configured to configure a preset number of pre-launched resource scheduling nodes before the receiving module performs an operation
  • the startup module is configured to start the configured preset number of resource scheduling nodes when the system is started.
  • the decentralized resource scheduling system further includes:
  • a closing module, configured to automatically close the generated resource scheduling node if, after the second processing module performs an operation, the generated resource scheduling node does not schedule resources for another job within a preset time period.
  • the first processing module includes:
  • a first processing submodule configured to obtain, according to the job feature, a resource scheduling node set with better data locality of the job
  • a second processing submodule configured to obtain a resource constraint according to the user feature
  • the third processing submodule is configured to obtain the corresponding resource scheduling node information according to a preset policy, combining the current resource load situation, the resource scheduling node set, and the resource constraint.
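The selection logic of the three processing submodules can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the names (`choose_scheduling_node`, `locality_set`, `tenant_allowed`, `load_by_node`) are assumptions, and "least loaded" stands in for whatever preset policy an implementation chooses.

```python
def choose_scheduling_node(locality_set, tenant_allowed, load_by_node):
    """Pick a resource scheduling node for a job.

    locality_set: nodes with better data locality for the job (job feature).
    tenant_allowed: nodes permitted by the tenant's resource constraint
    (user feature).
    load_by_node: current resource load per scheduling node.

    Returns the least-loaded node satisfying both sets, or None when no
    suitable node exists and a new scheduling node must be generated.
    """
    candidates = locality_set & tenant_allowed
    if not candidates:
        return None  # no match: trigger dynamic creation of a new node
    return min(candidates, key=lambda node: load_by_node.get(node, 0.0))
```

For example, `choose_scheduling_node({"rs1", "rs2"}, {"rs2", "rs3"}, {"rs1": 0.1, "rs2": 0.4})` returns `"rs2"`, the only node in both sets; an empty intersection returns `None`, signalling the second processing module to generate a node.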
  • the second processing module includes:
  • a fourth processing submodule configured to use the resource scheduling policy node to randomly select a job operation node that belongs to the resource scheduling node, and to notify the job operation node to start a container to run the resource scheduling node.
  • the second processing module includes:
  • a fifth processing submodule configured to use the resource scheduling node to register with the resource reporting node, and receive an idle resource reported by the resource reporting node according to a predetermined rule
  • the scheduling submodule is configured to use the resource scheduling node to schedule resources for the job control node according to the resource requests of the job's segmented tasks.
  • the decentralized resource scheduling system further includes:
  • the first notification module is configured to notify the job operation node corresponding to the idle resource to start the container to run the job control node by using the resource scheduling node before the operation of the scheduling sub-module.
  • the decentralized resource scheduling system further includes:
  • the acquisition module is configured to use the resource scheduling overview node to obtain the resource load situation in real time.
  • the decentralized resource scheduling system further includes:
  • a third processing module configured to allocate, by the job control node, the scheduled resource to a specific task in the job after the second processing module performs an operation, and to notify the job operation node corresponding to the scheduled resource to start a container to run the task.
  • the decentralized resource scheduling system further includes:
  • the second notification module is configured to, after the third processing module performs an operation and the task in the container is executed, use the job operation node to notify the job control node to close the container.
  • the decentralized resource scheduling system further includes:
  • a fourth processing module configured to, after the second notification module performs an operation and all the tasks in the job are executed, use the job control node to notify the resource scheduling node of the resource release, and to request the resource scheduling node to close the job control node.
  • the decentralized resource scheduling system further includes:
  • an establishing module configured to, after a resource is restarted, use the resource scheduling overview node to re-establish a relationship with the corresponding resource scheduling node according to the resource constraint.
  • the establishing module includes:
  • a starting submodule configured to start the job control node and the resource reporting node of the resource after the resource is restarted;
  • a notification submodule configured to use the resource reporting node to notify the resource scheduling overview node that the resource is available;
  • a locating submodule configured to use the resource scheduling overview node to find the corresponding resource scheduling node according to the resource constraint;
  • a sixth processing submodule configured to use the resource scheduling node to register with the resource reporting node of the resource, and to receive the idle resources reported by the resource reporting node of the resource according to a predetermined rule.
  • a computer storage medium is further provided, where the computer storage medium may store an execution instruction, where the execution instruction is used to execute the decentralized resource scheduling method.
  • the decentralized resource scheduling method can dynamically generate resource scheduling nodes, which eliminates the limitation on job concurrency capacity and achieves a stronger job-carrying capacity than cluster federation. Moreover, since resource scheduling nodes are dynamically generated according to the actual total resources of the cluster, data locality, and tenant-specific conditions, load balancing, performance, and tenant characteristics can be dynamically adjusted, which better solves the problem of a large number of concurrent jobs in the prior art.
  • FIG. 1 is a schematic flowchart of a decentralized resource scheduling method according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram of connection relationships of nodes according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic flowchart of submitting a job to a dynamic resource scheduling node according to Embodiment 1 of the present invention
  • FIG. 4 is a schematic flowchart of a resource reporting node reporting idle resources to multiple resource scheduling nodes according to Embodiment 1 of the present invention
  • FIG. 5 is a schematic flowchart of a process in which a resource that was down restarts and rejoins scheduling according to Embodiment 1 of the present invention.
  • FIG. 6 is a schematic diagram of a high availability process of a resource scheduling node according to Embodiment 1 of the present invention.
  • FIG. 7 is a schematic structural diagram of a decentralized resource scheduling system according to Embodiment 2 of the present invention.
  • to solve the problem of a large number of concurrent jobs in the prior art, the present invention provides the following solutions:
  • the decentralized resource scheduling method provided in Embodiment 1 of the present invention includes:
  • Step 11 receiving the job by using the cluster access node
  • Step 12 The resource scheduling policy node obtains corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job;
  • Step 13 If there is no resource scheduling node corresponding to the resource scheduling node information, generate the resource scheduling node, and use the resource scheduling node to schedule resources for the job.
  • the decentralized resource scheduling method provided by Embodiment 1 of the present invention can dynamically generate resource scheduling nodes, eliminating the limitation on job concurrency capacity and achieving a stronger job-carrying capacity than cluster federation. Because resource scheduling nodes are dynamically generated according to the actual total resources of the cluster, data locality, and tenant-specific conditions, load balancing, performance, and tenant characteristics can be dynamically adjusted, which better solves the problem of a large number of concurrent jobs in the prior art.
  • optionally, the decentralized resource scheduling method further includes: configuring a preset number of pre-started resource scheduling nodes before the system starts; and starting the configured preset number of resource scheduling nodes when the system starts.
  • optionally, the decentralized resource scheduling method further includes: if the generated resource scheduling node does not schedule resources for another job within a preset time period, automatically closing the generated resource scheduling node.
  • optionally, the step of using the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job includes: obtaining, according to the job feature, a resource scheduling node set with better data locality for the job; obtaining a resource constraint according to the user feature; and obtaining the corresponding resource scheduling node information according to a preset policy, combining the current resource load situation, the resource scheduling node set, and the resource constraint.
  • optionally, the step of generating the resource scheduling node includes: randomly selecting, by the resource scheduling policy node, a job operation node that belongs to the resource scheduling node, and notifying the job operation node to start a container to run the resource scheduling node.
  • optionally, the step of using the resource scheduling node to schedule resources for the job includes: using the resource scheduling node to register with the resource reporting nodes, and receiving the idle resources reported by the resource reporting nodes according to a predetermined rule (involving the load and weight of each resource scheduling node); and using the resource scheduling node to schedule resources for the job control node according to the resource requests of the job's segmented tasks.
  • optionally, before using the resource scheduling node to schedule resources according to the segmented resource requests of the job, the decentralized resource scheduling method further includes: using the resource scheduling node to notify the job operation node corresponding to the idle resource to start a container to run the job control node.
  • the decentralized resource scheduling method further includes: using a resource scheduling overview node to obtain a resource load situation in real time.
  • optionally, the decentralized resource scheduling method further includes: assigning, by the job control node, the scheduled resource to a specific task in the job, and notifying the job operation node corresponding to the scheduled resource to start a container to run the task.
  • optionally, the decentralized resource scheduling method further includes: after the task in the container is executed, using the job operation node to notify the job control node to close the container.
  • optionally, the decentralized resource scheduling method further includes: after all tasks in the job are executed, using the job control node to notify the resource scheduling node of the resource release, and requesting the resource scheduling node to close the job control node.
  • optionally, after the resource scheduling node is used to schedule resources for the job, the decentralized resource scheduling method further includes: after a resource is restarted, using the resource scheduling overview node to re-establish a relationship with the corresponding resource scheduling node according to the resource constraint.
  • optionally, the step of using the resource scheduling overview node to re-establish a relationship with the corresponding resource scheduling node according to the resource constraint after the resource is restarted includes: starting the job control node and the resource reporting node of the resource after the resource restarts; using the resource reporting node to notify the resource scheduling overview node that the resource is available; using the resource scheduling overview node to find the corresponding resource scheduling node according to the resource constraint; and using the resource scheduling node to register with the resource reporting node of the resource, and receiving the idle resources reported by the resource reporting node of the resource according to a predetermined rule (involving the load and weight of each resource scheduling node).
  • the cluster newly introduces a cluster access agent role.
  • the cluster access agent role collects information about all compute node roles in the cluster to maintain the topology of the cluster.
  • each compute node role maintains contact with the cluster access agent role through heartbeats.
  • the cluster access agent role also assumes the access function of the job submitting end. When it receives a submitted job, the job is submitted to an existing resource scheduling node or to a newly created resource scheduling node according to the job characteristics and tenant characteristics.
  • in existing schemes, resource scheduling nodes are statically configured and run when the system boots.
  • the resource scheduling node in the system of the present invention is dynamically created.
  • when the cluster access agent role finds that a new resource scheduling node needs to be created, it randomly picks a node from the compute node role list to which the resource scheduling node belongs, and notifies that node to start a container to run the resource scheduling node. After the resource scheduling node is running, a policy can be set to determine whether it destroys itself after a period of idle time.
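The self-destruction policy mentioned above can be sketched as a simple idle timer. This is a hypothetical illustration; the class name, the timeout value, and the use of a monotonic clock are assumptions, not details from the patent.

```python
import time

class DynamicSchedulingNode:
    """Tracks idle time for a dynamically created resource scheduling node."""

    def __init__(self, idle_timeout_s=300.0):
        self.idle_timeout_s = idle_timeout_s
        self.last_job_time = time.monotonic()

    def on_job_scheduled(self):
        # any newly scheduled job resets the idle timer
        self.last_job_time = time.monotonic()

    def should_self_destroy(self, now=None):
        # destroy only after the configured idle period has elapsed
        now = time.monotonic() if now is None else now
        return (now - self.last_job_time) > self.idle_timeout_s
```

A node created for one burst of jobs would call `on_job_scheduled()` while serving them and poll `should_self_destroy()` periodically afterwards.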
  • a compute node may belong to multiple resource scheduling nodes, in which case it cannot report its full resource status to every resource scheduling node. Instead, different amounts of resources should be reported separately according to the weights of the resource scheduling nodes.
  • the first node is the resource scheduling node, which also exists in existing resource scheduling schemes:
  • this node obtains the resources reported by the resource reporting nodes of its subordinate nodes, and schedules them to jobs according to established policy rules.
  • the next node is the job control node. After the resource scheduling node receives a job request, once there is an idle resource, a job control node is started on the node corresponding to that resource. The resource requests of the tasks in the job, the execution of the tasks, and fault tolerance are all handled by the job control node. The reason for introducing this node is to reduce the load of the resource scheduling node and to support a wide variety of job types. When the job is completed, the node can be destroyed after reporting to the resource scheduling node.
  • in existing schemes, resource reporting is implemented by the job computing node; but now the same job computing node may correspond to multiple resource scheduling nodes, so it must be decided which resource scheduling node each resource should be reported to.
  • the resource reporting node combines the load, weight, and other policy factors of each resource scheduling node to split the resource report among the resource scheduling nodes.
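One simple way to realize such a weighted split (the patent leaves the exact policy open) is to divide a node's idle resources among its resource scheduling nodes in proportion to their weights, handing leftover units to the largest fractional shares. The function and argument names below are illustrative assumptions.

```python
def split_report(idle_units, weights):
    """Split idle_units among scheduling nodes proportionally to weights.

    weights: {scheduling_node: weight}. Returns {scheduling_node: units},
    with sum(units) == idle_units.
    """
    total = sum(weights.values())
    shares = {n: idle_units * w / total for n, w in weights.items()}
    report = {n: int(s) for n, s in shares.items()}  # floor of each share
    remainder = idle_units - sum(report.values())
    # hand leftover units to the nodes with the largest fractional parts
    for n in sorted(shares, key=lambda n: shares[n] - report[n], reverse=True):
        if remainder <= 0:
            break
        report[n] += 1
        remainder -= 1
    return report
```

For example, `split_report(7, {"rs1": 2, "rs2": 1})` reports 5 units to `rs1` and 2 to `rs2`.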
  • the cluster access node, resource scheduling overview node, and resource scheduling policy node all exist in the cluster access agent role. Only one instance of this role is needed, without considering high availability here.
  • the resource scheduling overview node collects the service status of the resource reporting nodes and knows whether the service of each compute node is available, thereby obtaining a total topology map of the cluster. In addition, it receives the usage information reported by the currently running resource scheduling nodes, thereby collecting the total resource usage of the cluster and the usage of each tenant.
  • the cluster access node receives the job submission request and submits it to the resource scheduling policy node.
  • the resource scheduling policy node will allocate the existing resource scheduling node or the newly created resource scheduling node to the job according to the resource load situation, the job characteristics and the tenant characteristics fed back by the resource scheduling overview node.
  • the resource reporting node and the job computing node exist in the compute node role; there is one such role on each node in the cluster.
  • the resource reporting node may interact with the resource scheduling overview node to inform it that the node's service is available, and may interact with multiple resource scheduling nodes to report the node's available resources, distributing them to specific resource scheduling nodes according to a certain policy.
  • the job operation node is responsible for receiving container run requests, starting a container for the resource specified in the request, and running a job task, a job control node, or a resource scheduling node in the container.
  • Resource scheduling nodes and job control nodes are generally temporary and have life cycles. Both can be started and run in a container started by the job operation node. After a job is submitted, it passes through the cluster access node and the resource scheduling policy node and is forwarded to the resource scheduling node. The resource scheduling node then notifies the job operation node to start a container to run the job control node, after which the resource scheduling node is only responsible for resource application and allocation. Once the job control node is running, it is responsible for a series of tasks such as the operational dependencies and fault tolerance of the tasks in the job.
  • the running process of the job mainly includes the following steps:
  • the first step the resource reporting node of each node reports the availability of the node service to the resource scheduling overview node, and the resource scheduling overview node summarizes the resource topology map of the cluster.
  • Step 2 Upon receiving a job submission request, the cluster access node checks the characteristics of the job to obtain a set of nodes with better data locality, checks the characteristics of the tenant to obtain the resource constraint, and then the resource scheduling policy node, combining the resource usage from the resource scheduling overview node, assigns the job to the corresponding resource scheduling node. If that scheduling node does not exist, the resource scheduling policy node randomly selects the job operation node of a node under the scheduling node and notifies it to start a container to run the resource scheduling node. After the startup succeeds, the job is forwarded to the resource scheduling node, and the job submitting end subsequently contacts the resource scheduling node directly to reduce the pressure on the cluster access node.
  • Step 31 Submit the job
  • Step 32 The cluster access node receives the job submission request.
  • Step 33 The resource scheduling policy node calculates an appropriate resource scheduling node according to the job characteristics, the tenant characteristics, and the current resource load status.
  • Step 34 Determine whether the calculated resource scheduling node already exists, and if so, proceed to step 35, and if no, proceed to step 36;
  • Step 35 Submit the job to the appropriate resource scheduling node
  • Step 36 Notifying a job operation node of a resource belonging to the resource scheduling node to start a container to run the resource scheduling node;
  • Step 37 Determine whether the resource scheduling node is successfully run, if yes, proceed to step 38, and if not, return to step 36;
  • Step 38 Notify the resource scheduling policy node, and proceed to step 35.
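Steps 31-38 above can be sketched as a small driver loop. This is a hedged illustration only: `compute_node` stands in for the resource scheduling policy node's calculation (step 33) and `create_node` for notifying a job operation node to start a container (steps 36-37); both are hypothetical callables.

```python
def submit_job(job, compute_node, existing_nodes, create_node, max_retries=3):
    """Route a job to a resource scheduling node, creating one if needed.

    compute_node(job) -> node id        (step 33)
    existing_nodes: set of running resource scheduling node ids (step 34)
    create_node(node_id) -> bool        (steps 36-37, retried on failure)
    """
    target = compute_node(job)
    if target in existing_nodes:
        return target                   # step 35: submit to the existing node
    for _ in range(max_retries):        # steps 36-38: start, verify, retry
        if create_node(target):
            existing_nodes.add(target)
            return target
    raise RuntimeError("failed to start resource scheduling node")
```

The bounded retry is an added assumption; the patent's flow simply returns to step 36 until the node starts successfully.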
  • Step 3 After the resource scheduling node is running, it registers with each resource reporting node; and each resource reporting node, when there are idle resources on its node, reports some of those idle resources to each resource scheduling node registered on it according to the load policy.
  • Step 41 The resource is idle
  • Step 42 Determine whether the resource belongs to the existing resource scheduling node, if not, proceed to step 43, if yes, proceed to step 44;
  • Step 43 No processing, ending the process
  • Step 44 Calculate the satisfied-resource proportion of the resource scheduling list at each level;
  • Step 45 Convert the weights of the different levels into a unified satisfied-resource proportion;
  • Step 46 Select the queue with the smallest satisfied-resource proportion to allocate resources to;
  • Step 47 Calculate the satisfied-resource proportion of each resource scheduling node in that level's queue;
  • Step 48 Select the resource scheduling node with the smallest satisfied-resource proportion and allocate resources to it.
  • Step 49 Determine whether there is still resource idle, if yes, return to step 44, and if no, end the process.
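The allocation loop of steps 44-48 can be sketched with a weight-normalized "satisfied proportion" (resources already allocated divided by weight), always feeding the least-satisfied queue and then the least-satisfied scheduling node within it. The data layout below is an assumption for illustration, not the patent's data model.

```python
def allocate_idle(units, queues):
    """Distribute idle resource units across level queues and their nodes.

    queues: {queue_id: {"weight": w, "nodes": {node_id: {"weight": w2,
    "allocated": a}}}}. Mutates and returns queues.
    """
    for _ in range(units):
        # steps 44-46: the queue with the smallest satisfied proportion wins
        q = min(queues, key=lambda qid: sum(
            n["allocated"] for n in queues[qid]["nodes"].values()
        ) / queues[qid]["weight"])
        # steps 47-48: the least-satisfied node inside that queue wins
        nodes = queues[q]["nodes"]
        chosen = min(
            nodes,
            key=lambda nid: nodes[nid]["allocated"] / nodes[nid]["weight"])
        nodes[chosen]["allocated"] += 1
    return queues
```

Allocating unit by unit naturally realizes the loop of step 49: the process repeats while idle resources remain.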
  • Step 4 When receiving the idle resource reported by a subordinate node, the resource scheduling node notifies the job operation node on the node corresponding to that resource to start a container to run the job control node.
  • Step 5 The job control node applies for resources from the resource scheduling node according to its segmented tasks.
  • Step 6 When the resource scheduling node receives the resource request sent by the job control node, once the subordinate node has idle resources, the resource will be allocated to the job control node.
  • Step 7 After receiving the resource, the job control node will assign a specific task, and notify the job operation node of the corresponding node of the resource to start the container to run the task.
  • Step 8 After the task in the container started by the job operation node is finished running, the job control node will be notified and the container will be closed.
  • Step 9 After all the tasks of the job have been run, the job control node will notify the resource scheduling node and apply to the scheduling node to close the job control node.
  • Step 10 After all the operations on the resource scheduling node are completed, if it is a temporary resource scheduling node, and the new job is not executed for a period of time, it will be closed by itself.
  • Step 51 A new resource joins the cluster.
  • Step 52 The job control node and the resource reporting node of the resource are started.
  • Step 53 The resource reporting node notifies the resource scheduling overview node that the new resource is available for service.
  • Step 54 Determine whether any subordinate resource scheduling node is short of resources; if no, proceed to step 55; if yes, proceed to step 56.
  • Step 55 The resource waits for the resource scheduling overview node to assign it to a new resource scheduling node.
  • Step 56 Determine whether the resource can be attributed to an existing resource scheduling node; if yes, proceed to step 57; if no, return to step 55.
  • Step 57 The corresponding resource scheduling node registers with the resource reporting node of the newly added resource.
  • Step 58 The newly added resource reports its available resources to the corresponding resource scheduling node.
  • The solution further provides the procedure shown in FIG. 6, including:
  • Step 61 Submit the job.
  • Step 62 The cluster access node receives the job submission request.
  • Step 63 The resource scheduling policy node calculates an appropriate primary resource scheduling node according to the job features, tenant features, and current resource load status.
  • Step 64 If the primary resource scheduling node does not exist, start it first.
  • Step 65 Submit the job to the primary resource scheduling node.
  • Step 66 Determine whether the corresponding standby resource scheduling node is normal; if yes, go to step 67; if no, go to step 68.
  • Step 67 The standby node synchronizes job status with the primary resource scheduling node and provides service when the primary fails: it is upgraded to the primary resource scheduling node, the next corresponding standby resource scheduling node is enabled, and the flow returns to step 66.
  • Step 68 Start another standby resource scheduling node on resources in a different equipment room, and return to step 66.
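The primary/standby behavior of steps 65 to 68 can be illustrated with the minimal sketch below. The classes and the room model are hypothetical simplifications, not the claimed design: the standby is always placed in an equipment room different from the primary's, job status is mirrored to it, and on primary failure the standby is promoted and a fresh standby is started in another room.

```python
class SchedulingNode:
    """A resource scheduling node running in a given equipment room."""
    def __init__(self, room):
        self.room = room
        self.job_state = {}

class HAScheduler:
    """Primary resource scheduling node plus a standby in a different room."""
    def __init__(self, rooms):
        self.rooms = list(rooms)            # needs at least two rooms
        self.primary = SchedulingNode(self.rooms[0])
        self.standby = self._new_standby()

    def _new_standby(self):
        # Step 68: start the standby on resources in a different equipment room.
        other_room = next(r for r in self.rooms if r != self.primary.room)
        return SchedulingNode(other_room)

    def submit(self, job_id, state):
        # Step 65: the job goes to the primary;
        # step 67: the standby mirrors the job status.
        self.primary.job_state[job_id] = state
        self.standby.job_state[job_id] = state

    def primary_failed(self):
        # Step 67: the standby (already holding synchronized state) is
        # upgraded to primary, and the next standby is enabled elsewhere.
        self.primary = self.standby
        self.standby = self._new_standby()
```

Because the standby already holds the mirrored job state, promotion on failure loses no jobs, and the replacement standby again lands in a room different from the new primary's.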
  • Compared with existing solutions, this scheme is more extensible and needs less manual intervention; nodes can be dynamically added or removed without stopping the service; the locality of cluster computation is good, because the node group where the data resides can be assigned to a resource scheduling node for the calculation; the resource utilization of the cluster is high, and a job can run on all resources of the cluster; and resource control of the cluster makes multi-tenancy relatively easy to support.
  • the entire computing cluster is provided as a service platform, and many new tenants can be built on it.
  • Tenants can have different priorities (such as ordinary tenants, VIP tenants, etc.), and tenants of different priorities are subject to different resource usage limits.
  • For example, an ordinary tenant is limited to the resources of at most 10 nodes, while a VIP tenant may use the resources of up to 100 nodes.
  • When an ordinary tenant submits a job, the resource scheduling node obtained from the resource scheduling policy node governs at most 10 nodes.
  • When a VIP tenant submits a job, the resource scheduling node obtained from the resource scheduling policy node governs up to 100 nodes. This ensures that the maximum resources a VIP tenant's jobs can use are much larger than those of an ordinary tenant. It is even possible to create a new highest-priority tenant whose submitted jobs can use the resources of all nodes in the cluster. Moreover, since the number of nodes under a resource scheduling node can be dynamically added or deleted, the maximum limit of each tenant's available resources is easy to expand.
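As a hedged illustration of these per-tenant limits, the cap lookup the resource scheduling policy node might apply can be sketched as below. The tier names, function, and clipping rule are assumptions of this sketch; only the 10-node and 100-node figures come from the example above.

```python
# Illustrative per-tenant node caps; only the 10/100 figures come from the text.
TENANT_NODE_CAPS = {
    "ordinary": 10,   # ordinary tenant: resources of at most 10 nodes
    "vip": 100,       # VIP tenant: resources of up to 100 nodes
}

def nodes_for_job(tier, requested, cluster_size, caps=TENANT_NODE_CAPS):
    """Nodes granted to a job: the request, clipped by the tenant's cap and
    by the cluster's actual size. Unknown tiers get the smallest cap."""
    cap = caps.get(tier, min(caps.values()))
    return min(requested, cap, cluster_size)
```

Expanding a tenant's maximum is then just an update to its cap entry, consistent with the text's point that per-tenant limits are easy to grow as nodes are added.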
  • When a computing node has idle resources, its resource reporting node may split them and report them to multiple resource scheduling nodes.
  • Resource scheduling nodes of the same priority are guaranteed roughly equal resource allocations, while higher-priority resource scheduling nodes obtain more resources than lower-priority ones.
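One way to realize this "roughly equal within a priority, more for higher priorities" rule is a weighted split of the idle resources, sketched below. The integer-weight encoding and the remainder handling are assumptions of this sketch, not details given in the text.

```python
def split_idle_resources(idle, weights):
    """Split `idle` resource units across registered scheduling nodes in
    proportion to their priority weights; weights: {node_name: positive int}."""
    total = sum(weights.values())
    shares = {name: (idle * w) // total for name, w in weights.items()}
    # Distribute any rounding remainder, highest-weight nodes first.
    remainder = idle - sum(shares.values())
    for name in sorted(weights, key=weights.get, reverse=True):
        if remainder == 0:
            break
        shares[name] += 1
        remainder -= 1
    return shares
```

Nodes with equal weights receive (near-)equal shares, and a doubled weight roughly doubles the share, matching the stated behavior.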
  • Example 2 Cluster deployment across equipment rooms and across regions
  • Deploying cluster hardware devices across equipment rooms or even across regions can prevent faults such as power outages and network disconnections from affecting the whole cluster, improving its robustness.
  • the resource scheduling policy node assigns jobs to a resource scheduling node according to a consistent policy.
  • Access nodes do not affect one another; therefore, introducing multiple access nodes does not affect the cluster, and the access capacity of the cluster can be improved.
  • High availability of resource scheduling nodes is ensured as follows: when the resource scheduling policy node allocates a resource scheduling node to a job, it simultaneously selects a node in a different equipment room to start a standby resource scheduling node.
  • The standby resource scheduling node synchronizes the job status of the primary resource scheduling node, takes over all jobs when the primary crashes, and then applies to the resource scheduling policy node for a new standby resource scheduling node, located on a node in another equipment room.
  • Example 3 Big data computing system with a fixed number of resource scheduling nodes
  • Since resource scheduling nodes are dynamically generated during system operation, starting them consumes some node startup time. If performance requirements are high, the number of resource scheduling nodes can be fixed, with all of them started at system startup. This improves the performance of some jobs, especially small jobs.
  • The number of resource scheduling nodes can be set by means of a configuration file or a command, and nodes can be activated dynamically. In this way, when the cluster hardware is expanded, the cluster can easily be scaled without interrupting service.
  • Current big data clusters support only a fixed set of resource scheduling nodes; at most, new scheduling policies can be introduced by means of plug-ins.
  • The various resource scheduling nodes have each been optimized in certain respects, but previously these diverse resource scheduling nodes could not run in one big data cluster at the same time. Now, through the dynamic creation of resource scheduling nodes, diverse scheduling nodes can run simultaneously, and customers can use the resource scheduling node that matches their own preferences.
  • In summary, the present invention overcomes the problem of large numbers of concurrent jobs in the prior art, improves the horizontal scaling and expansion capabilities of the cluster, and improves resource utilization and load balance as much as possible.
  • A method for dynamically generating a resource scheduling role is provided.
  • Through dynamic generation of this role, the limit on job concurrency can be eliminated, and the job carrying capacity becomes stronger than that of cluster federation; and since the resource scheduling role is dynamically generated, it can be dynamically adjusted according to the cluster's actual total resources, data locality, and tenant conditions, achieving dynamic load balancing and meeting performance and tenant feature requirements.
  • The decentralized resource scheduling system provided in Embodiment 2 of the present invention includes:
  • a receiving module 71, configured to receive a job by using the cluster access node;
  • a first processing module 72, configured to obtain corresponding resource scheduling node information by using the resource scheduling policy node according to the current resource load status and the job features and user features of the job;
  • a second processing module 73, configured to generate the resource scheduling node if no resource scheduling node corresponding to the resource scheduling node information exists, and to schedule resources for the job by using the resource scheduling node.
  • The decentralized resource scheduling system provided by the second embodiment of the present invention can dynamically generate resource scheduling nodes, eliminating the limit on job concurrency and achieving a job carrying capacity stronger than that of cluster federation; and because resource scheduling nodes are dynamically generated, they can be dynamically adjusted according to the cluster's actual total resources, data locality, and tenant conditions to achieve dynamic load balancing and meet performance and tenant feature requirements, thereby better solving the problem of supporting large numbers of concurrent jobs in the prior art.
  • The decentralized resource scheduling system further includes: a configuration module, configured to configure a preset number of pre-started resource scheduling nodes before the receiving module operates; and a startup module, configured to start the configured preset number of resource scheduling nodes at system startup.
  • The decentralized resource scheduling system further includes: a shutdown module, configured to automatically close the generated resource scheduling node if, after the second processing module operates, that node does not schedule resources for another job within a preset time period.
  • The first processing module includes: a first processing sub-module, configured to obtain, according to the job features, a set of resource scheduling nodes with good data locality for the job; a second processing sub-module, configured to obtain resource constraints according to the user features; and a third processing sub-module, configured to obtain the corresponding resource scheduling node information according to a preset policy, combined with the current resource load status, the resource scheduling node set, and the resource constraints.
  • The second processing module includes: a fourth processing sub-module, configured to randomly select, by using the resource scheduling policy node, a job operation node that belongs to the resource scheduling node, and to notify that job operation node to start a container to run the resource scheduling node.
  • The second processing module includes: a fifth processing sub-module, configured to register with the resource reporting node by using the resource scheduling node, and to receive the idle resources reported by the resource reporting node according to a predetermined rule (taking into account the load and weight of each resource scheduling node); and a scheduling sub-module, configured to schedule, by using the resource scheduling node, the resources requested by the job control node according to the split tasks of the job.
  • The decentralized resource scheduling system further includes: a first notification module, configured to notify, before the scheduling sub-module operates, the job operation node corresponding to the idle resources, by using the resource scheduling node, to start a container to run the job control node.
  • The decentralized resource scheduling system further includes: an obtaining module, configured to obtain the resource load status in real time by using the resource scheduling overview node.
  • The decentralized resource scheduling system further includes: a third processing module, configured to allocate, after the second processing module operates, the scheduled resources to specific tasks in the job by using the job control node, and to notify the job operation node corresponding to the scheduled resources to start a container to run the task.
  • The decentralized resource scheduling system further includes: a second notification module, configured to notify, after the third processing module operates and the task in the container finishes executing, the job control node by using the job operation node to close the container.
  • The decentralized resource scheduling system further includes: a fourth processing module, configured to notify, after the second notification module operates and all tasks in the job have finished executing, the resource scheduling node of resource release by using the job control node, and to apply to the resource scheduling node to close the job control node.
  • The decentralized resource scheduling system further includes: an establishing module, configured to, after the second processing module operates and a resource restarts, re-establish a relationship with the corresponding resource scheduling node by using the resource scheduling overview node according to the resource's constraints.
  • The establishing module includes: a startup sub-module, configured to start the job control node and resource reporting node of the resource after the resource restarts; a notification sub-module, configured to notify, by using the resource reporting node, the resource scheduling overview node that the resource is available; a locating sub-module, configured to find, by using the resource scheduling overview node, the corresponding resource scheduling node according to the resource's constraints; and a sixth processing sub-module, configured to register, by using the resource scheduling node, with the resource reporting node of the resource, and to receive the idle resources reported by that resource reporting node according to a predetermined rule (taking into account the load and weight of each resource scheduling node).
  • The implementations described for the decentralized resource scheduling method are also applicable to this embodiment of the decentralized resource scheduling system, and the same technical effects can be achieved.
  • the modules/sub-modules may be implemented in software for execution by various types of processors.
  • An identified executable code module can comprise one or more physical or logical blocks of computer instructions, which can be constructed, for example, as an object, procedure, or function. Nonetheless, the executable code of an identified module need not be physically located together, but may comprise different instructions stored in different locations that, when logically combined, constitute the module and fulfill its prescribed purpose.
  • the executable code module can be a single instruction or a plurality of instructions, and can even be distributed across multiple different code segments, distributed among different programs, and distributed across multiple memory devices.
  • operational data may be identified within the modules and may be implemented in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed at different locations (including on different storage devices), and may at least partially exist as an electronic signal on a system or network.
  • Where a module can be implemented in software, then, considering the level of existing hardware technology, a technician can always, cost aside, build a corresponding hardware circuit to implement the corresponding function.
  • the hardware circuitry includes conventional Very Large Scale Integration (VLSI) circuits or gate arrays as well as existing semiconductors such as logic chips, transistors, or other discrete components.
  • the modules can also be implemented with programmable hardware devices such as field programmable gate arrays, programmable array logic, programmable logic devices, and the like.
  • The decentralized resource scheduling method can dynamically generate resource scheduling nodes, eliminating the limit on job concurrency and achieving a job carrying capacity stronger than that of cluster federation; and because resource scheduling nodes are dynamically generated, they can be dynamically adjusted according to the cluster's actual total resources, data locality, and tenant conditions to achieve dynamic load balancing and meet performance and tenant feature requirements.

Abstract

The present invention provides a decentralized resource scheduling method and system. The decentralized resource scheduling method comprises: receiving a job by using a cluster access node; obtaining corresponding resource scheduling node information by using a resource scheduling policy node and according to a current resource load status, a job feature and a user feature of the job; and if there is no resource scheduling node corresponding to the resource scheduling node information, generating the resource scheduling node, and scheduling resources for the job by using the resource scheduling node. In the solution, the resource scheduling node is generated dynamically, so that the concurrent job capacity limit can be eliminated, and a job bearing capacity higher than that of joint clusters is achieved. In addition, since the resource scheduling node is generated dynamically, dynamic adjustment can be performed according to conditions such as the actual total resources of a cluster, data locality, and tenant features, thereby realizing dynamic load balancing, meeting performance and tenant feature requirements, and better solving the problem of supporting a large number of concurrent jobs in the prior art.

Description

Decentralized resource scheduling method and system

Technical field

The present invention relates to the field of parallel computing technologies, and in particular to a decentralized resource scheduling method and system.

Background
We have now entered the era of big data. Big data not only brings innovation to the information industry, but also drives traditional industries to reposition their own value.

The main big data computing frameworks at present are Yarn, Spark, Storm, and the like. Most of these computing frameworks have a master-slave structure. To solve the single-point-of-failure problem, a standby resource scheduling node is started so that it can take over the service when the primary resource scheduling node fails. In the current architecture, however, only one of the primary and standby resource scheduling nodes can provide service at any one time. This leads to a problem: when a large number of jobs need to run concurrently in the computing framework, the resource scheduling node faces a heavy impact, and problems such as memory overflow and abnormal job resource scheduling are likely to occur.

From the application trends of big data platforms, there are currently two main directions: one is to build a self-developed big data platform on an open-source basis for internal use; the other is to provide the physical support of the data platform and offer big data services, in tenant form, to many small vendors. Both directions encounter the above problem of large numbers of concurrently running jobs. An internally used data platform can still avoid the problem by limiting the number of concurrently running jobs, but for a data platform provided as a service, running large numbers of jobs concurrently is a necessary capability of the system.
The problem of running large numbers of jobs concurrently has not yet received enough attention in the major big data computing frameworks, but as service-oriented big data platforms spread and the number of tenants grows, it will soon become an urgent problem. No complete, systematic solution has yet been proposed; the roughly feasible approaches are as follows:

Existing method 1: Improve the machine performance of the resource scheduling node, expanding the CPU, memory, and network resources of the resource scheduling role. This can raise the data throughput of the resource scheduling node, but only to a limited extent.

Disadvantages: Meeting such high hardware configuration requirements is very costly. When tens of thousands of jobs run concurrently, the load on the node becomes very high, abnormalities occur easily, and active/standby switchover also takes more time.
Existing method 2: Federate multiple big data clusters and encapsulate an interface layer on top of them, distributing the job load evenly across the clusters. This method achieves good horizontal scaling of cluster capability.

Disadvantages: Because resources are isolated between clusters, a tenant can obtain at most all the resources of a single cluster, while the total resources of the federation are often far larger than those of any one cluster.

The cluster's data source (usually HDFS) is often scaled out in a federated manner. When the data a job needs to process spans multiple clusters, the job is scheduled on only one cluster, so all data outside that cluster must be fetched remotely; locality is insufficient, increasing the network load.

Because there are multiple clusters, tenant resource usage is hard to control, and shielding the resource usage of ordinary tenants and VIP tenants from the effects of cluster differences is complicated.
Existing method 3: Federate multiple big data clusters and provide a job control module. When a submitted job is received, it is decomposed by policy into multiple jobs, which are distributed to the corresponding clusters for computation. Because multiple clusters are federated, there are multiple resource scheduling modules, and job concurrency is expanded.

Disadvantages: The job control module's decomposition of a job depends on the job's specific business, making the job control module's logic business-dependent.

Because the job is decomposed into multiple sub-jobs, if the business logic also needs to aggregate the data, an additional aggregation job is required. Complexity is relatively high.
Summary of the invention

The purpose of the embodiments of the present invention is to provide a decentralized resource scheduling method and system, solving the prior-art problem of large numbers of concurrent jobs.

To solve the above technical problem, an embodiment of the present invention provides a decentralized resource scheduling method, including:

receiving a job by using a cluster access node;

obtaining corresponding resource scheduling node information by using a resource scheduling policy node according to the current resource load status and the job features and user features of the job;

if no resource scheduling node corresponding to the resource scheduling node information exists, generating the resource scheduling node, and scheduling resources for the job by using the resource scheduling node.
Optionally, before the receiving of a job by using the cluster access node, the decentralized resource scheduling method further includes:

configuring a preset number of pre-started resource scheduling nodes;

at system startup, starting the configured preset number of resource scheduling nodes.

Optionally, after the scheduling of resources for the job by using the resource scheduling node, the decentralized resource scheduling method further includes:

if the generated resource scheduling node does not schedule resources for another job within a preset time period, automatically closing the generated resource scheduling node.

Optionally, the step of obtaining corresponding resource scheduling node information by using the resource scheduling policy node according to the current resource load status and the job features and user features of the job includes:

obtaining, according to the job features, a set of resource scheduling nodes with good data locality for the job;

obtaining resource constraints according to the user features;

obtaining the corresponding resource scheduling node information according to a preset policy, combined with the current resource load status, the resource scheduling node set, and the resource constraints.
Optionally, the step of generating the resource scheduling node includes:

randomly selecting, by using the resource scheduling policy node, a job operation node that belongs to the resource scheduling node, and notifying that job operation node to start a container to run the resource scheduling node.

Optionally, the step of scheduling resources for the job by using the resource scheduling node includes:

registering with a resource reporting node by using the resource scheduling node, and receiving idle resources reported by the resource reporting node according to a predetermined rule;

scheduling, by using the resource scheduling node, the resources requested by the job control node according to the split tasks of the job.

Optionally, before the scheduling, by using the resource scheduling node, of the resources requested by the job control node according to the split tasks of the job, the decentralized resource scheduling method further includes:

notifying, by using the resource scheduling node, the job operation node corresponding to the idle resources to start a container to run the job control node.

Optionally, the decentralized resource scheduling method further includes:

obtaining the resource load status in real time by using a resource scheduling overview node.
Optionally, after the scheduling of resources for the job by using the resource scheduling node, the decentralized resource scheduling method further includes:

allocating, by using a job control node, the scheduled resources to specific tasks in the job, and notifying the job operation node corresponding to the scheduled resources to start a container to run the task.

Optionally, after the notifying of the job operation node corresponding to the scheduled resources to start a container to run the task, the decentralized resource scheduling method further includes:

after the task in the container finishes executing, notifying the job control node by using the job operation node to close the container.

Optionally, after the notifying of the job control node by using the job operation node to close the container, the decentralized resource scheduling method further includes:

after all tasks in the job have finished executing, notifying the resource scheduling node of resource release by using the job control node, and applying to the resource scheduling node to close the job control node.

Optionally, after the scheduling of resources for the job by using the resource scheduling node, the decentralized resource scheduling method further includes:

after a resource restarts, re-establishing, by using the resource scheduling overview node, a relationship with the corresponding resource scheduling node according to the resource's constraints.

Optionally, the step of re-establishing, after the resource restarts, a relationship with the corresponding resource scheduling node by using the resource scheduling overview node according to the resource's constraints includes:

after the resource restarts, starting the job control node and resource reporting node of the resource;

notifying, by using the resource reporting node, the resource scheduling overview node that the resource is available;

finding, by using the resource scheduling overview node, the corresponding resource scheduling node according to the resource's constraints;

registering, by using the resource scheduling node, with the resource reporting node of the resource, and receiving the idle resources reported by the resource reporting node of the resource according to a predetermined rule.
本发明实施例还提供了一种去中心化资源调度***,包括:The embodiment of the invention further provides a decentralized resource scheduling system, including:
接收模块,设置为利用集群接入节点接收作业;a receiving module, configured to receive a job by using a cluster access node;
第一处理模块,设置为利用资源调度策略节点根据当前资源负载情况和所述作业中的作业特征和用户特征得到对应的资源调度节点信息;a first processing module, configured to use the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job;
第二处理模块,设置为若不存在与所述资源调度节点信息相对应的资源调度节点,则生成所述资源调度节点,利用所述资源调度节点为所述作业调度资源。And a second processing module, configured to: if there is no resource scheduling node corresponding to the resource scheduling node information, generate the resource scheduling node, and use the resource scheduling node to schedule resources for the job.
可选地,所述去中心化资源调度***还包括:Optionally, the decentralized resource scheduling system further includes:
配置模块,设置为在所述接收模块执行操作之前,配置预设数量预启动的所述资源调度节点;a configuration module, configured to configure a preset number of pre-launched resource scheduling nodes before the receiving module performs an operation;
启动模块,设置为在***启动时,启动配置的所述预设数量的所述资源调度节点。The startup module is configured to initiate the configured number of the resource scheduling nodes of the configuration when the system is started.
Optionally, the decentralized resource scheduling system further includes:
a shutdown module, configured to automatically shut down the generated resource scheduling node if, after the second processing module performs its operation, the generated resource scheduling node has not scheduled resources for another job within a preset time period.
Optionally, the first processing module includes:
a first processing submodule, configured to obtain, according to the job features, a set of resource scheduling nodes with good data locality for the job;
a second processing submodule, configured to obtain resource constraints according to the user features;
a third processing submodule, configured to obtain, according to a preset policy, the corresponding resource scheduling node information by combining the current resource load, the set of resource scheduling nodes, and the resource constraints.
Optionally, the second processing module includes:
a fourth processing submodule, configured to randomly select, by using the resource scheduling policy node, a job computing node subordinate to the resource scheduling node, and notify that job computing node to start a container to run the resource scheduling node.
Optionally, the second processing module includes:
a fifth processing submodule, configured to register, by using the resource scheduling node, with a resource reporting node, and receive idle resources reported by the resource reporting node according to a predetermined rule;
a scheduling submodule, configured to schedule, by using the resource scheduling node, the resources that the job control node requests according to the split job.
Optionally, the decentralized resource scheduling system further includes:
a first notification module, configured to notify, by using the resource scheduling node and before the scheduling submodule performs its operation, the job computing node corresponding to the idle resources to start a container to run the job control node.
Optionally, the decentralized resource scheduling system further includes:
an acquisition module, configured to acquire the resource load in real time by using a resource scheduling overview node.
Optionally, the decentralized resource scheduling system further includes:
a third processing module, configured to, after the second processing module performs its operation, allocate the scheduled resources to specific tasks in the job by using the job control node, and notify the job computing nodes corresponding to the scheduled resources to start containers to run the tasks.
Optionally, the decentralized resource scheduling system further includes:
a second notification module, configured to, after the third processing module performs its operation and a task in a container has finished executing, notify the job control node, by using the job computing node, to close the container.
Optionally, the decentralized resource scheduling system further includes:
a fourth processing module, configured to, after the second notification module performs its operation and all tasks in the job have finished executing, notify the resource scheduling node of the resource release by using the job control node, and request the resource scheduling node to close the job control node.
Optionally, the decentralized resource scheduling system further includes:
an establishing module, configured to, after the second processing module performs its operation and the resource is restarted, re-establish, by using the resource scheduling overview node, a relationship with the corresponding resource scheduling node according to the constraints of the resource.
Optionally, the establishing module includes:
a startup submodule, configured to start the job control node and the resource reporting node of the resource after the resource is restarted;
a notification submodule, configured to notify, by using the resource reporting node, the resource scheduling overview node that the resource is available;
a lookup submodule, configured to find, by using the resource scheduling overview node, the corresponding resource scheduling node according to the constraints of the resource;
a sixth processing submodule, configured to register, by using the resource scheduling node, with the resource reporting node of the resource, and receive idle resources reported by the resource reporting node of the resource according to a predetermined rule.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store execution instructions for performing the above decentralized resource scheduling method.
The beneficial effects of the above technical solutions of the embodiments of the present invention are as follows:
In the above solutions, by dynamically generating resource scheduling nodes, the decentralized resource scheduling method removes the limit on job concurrency and achieves a job carrying capacity stronger than that of cluster federation. Moreover, since the resource scheduling nodes are dynamically generated, they can be adjusted dynamically according to the actual total resources of the cluster, data locality, tenant specifics, and other conditions, thereby meeting requirements such as dynamic load balancing, performance, and tenant-specific features, and better solving the prior-art problem of massive concurrent jobs.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of a decentralized resource scheduling method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the connection relationships among the nodes according to Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of submitting a job to a dynamic resource scheduling node according to Embodiment 1 of the present invention;
FIG. 4 is a schematic flowchart of a resource reporting node reporting idle resources to multiple resource scheduling nodes according to Embodiment 1 of the present invention;
FIG. 5 is a schematic flowchart of a crashed resource rejoining scheduling after restart according to Embodiment 1 of the present invention;
FIG. 6 is a schematic flowchart of resource scheduling node high availability according to Embodiment 1 of the present invention;
FIG. 7 is a schematic structural diagram of a decentralized resource scheduling system according to Embodiment 2 of the present invention.
DETAILED DESCRIPTION
To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
Aiming at the prior-art problem of massive concurrent jobs, the present invention provides a number of solutions, as follows:
Embodiment 1
As shown in FIG. 1, the decentralized resource scheduling method provided in Embodiment 1 of the present invention includes:
Step 11: receiving a job by using a cluster access node;
Step 12: obtaining, by using a resource scheduling policy node, corresponding resource scheduling node information according to the current resource load and the job features and user features of the job;
Step 13: if no resource scheduling node corresponding to the resource scheduling node information exists, generating the resource scheduling node, and scheduling resources for the job by using the resource scheduling node.
By dynamically generating resource scheduling nodes, the decentralized resource scheduling method provided in Embodiment 1 of the present invention removes the limit on job concurrency and achieves a job carrying capacity stronger than that of cluster federation. Moreover, since the resource scheduling nodes are dynamically generated, they can be adjusted dynamically according to the actual total resources of the cluster, data locality, tenant specifics, and other conditions, thereby meeting requirements such as dynamic load balancing, performance, and tenant-specific features, and better solving the prior-art problem of massive concurrent jobs.
To further improve running performance and save time, before the receiving a job by using a cluster access node, the decentralized resource scheduling method further includes: configuring a preset number of pre-started resource scheduling nodes; and starting the configured preset number of resource scheduling nodes when the system starts.
Further, after the scheduling resources for the job by using the resource scheduling node, the decentralized resource scheduling method further includes: if the generated resource scheduling node has not scheduled resources for another job within a preset time period, automatically shutting down the generated resource scheduling node.
Specifically, the step of obtaining, by using a resource scheduling policy node, corresponding resource scheduling node information according to the current resource load and the job features and user features of the job includes: obtaining, according to the job features, a set of resource scheduling nodes with good data locality for the job; obtaining resource constraints according to the user features; and obtaining, according to a preset policy, the corresponding resource scheduling node information by combining the current resource load, the set of resource scheduling nodes, and the resource constraints.
The step of generating the resource scheduling node includes: randomly selecting, by using the resource scheduling policy node, a job computing node subordinate to the resource scheduling node, and notifying that job computing node to start a container to run the resource scheduling node.
Considering that, in the present application, one job computing node may correspond to multiple resource scheduling nodes, in this embodiment of the present invention the step of scheduling resources for the job by using the resource scheduling node includes: registering, by using the resource scheduling node, with a resource reporting node, and receiving idle resources reported by the resource reporting node according to a predetermined rule (involving the load and weight of each resource scheduling node); and scheduling, by using the resource scheduling node, the resources that the job control node requests according to the split job.
Further, before the scheduling, by using the resource scheduling node, of the resources that the job control node requests according to the split job, the decentralized resource scheduling method further includes: notifying, by using the resource scheduling node, the job computing node corresponding to the idle resources to start a container to run the job control node.
To facilitate obtaining the current resource load, in this embodiment of the present invention the decentralized resource scheduling method further includes: acquiring the resource load in real time by using a resource scheduling overview node.
For the completeness of the solution, after the scheduling resources for the job by using the resource scheduling node, the decentralized resource scheduling method further includes: allocating, by using the job control node, the scheduled resources to specific tasks in the job, and notifying the job computing nodes corresponding to the scheduled resources to start containers to run the tasks.
Further, after the notifying the job computing nodes corresponding to the scheduled resources to start containers to run the tasks, the decentralized resource scheduling method further includes: after a task in a container has finished executing, notifying the job control node, by using the job computing node, to close the container.
Still further, after the notifying the job control node, by using the job computing node, to close the container, the decentralized resource scheduling method further includes: after all tasks in the job have finished executing, notifying the resource scheduling node of the resource release by using the job control node, and requesting the resource scheduling node to close the job control node.
Considering that a resource may crash, after the scheduling resources for the job by using the resource scheduling node, the decentralized resource scheduling method further includes: after the resource is restarted, re-establishing, by using the resource scheduling overview node, a relationship with the corresponding resource scheduling node according to the constraints of the resource.
Specifically, the step of re-establishing, after the resource is restarted and by using the resource scheduling overview node, a relationship with the corresponding resource scheduling node according to the constraints of the resource includes: after the resource is restarted, starting the job control node and the resource reporting node of the resource; notifying, by using the resource reporting node, the resource scheduling overview node that the resource is available; finding, by using the resource scheduling overview node, the corresponding resource scheduling node according to the constraints of the resource; and registering, by using the resource scheduling node, with the resource reporting node of the resource, and receiving idle resources reported by the resource reporting node of the resource according to a predetermined rule (involving the load and weight of each resource scheduling node).
The decentralized resource scheduling method provided in Embodiment 1 of the present invention is described in detail below.
It mainly involves the following three functions:
1. Maintenance of the cluster topology:
The cluster newly introduces a cluster access agent role. The cluster access agent role collects information about all compute node roles in the cluster, thereby maintaining the topology of the cluster. The compute node roles stay connected with the cluster access agent role through heartbeats. The cluster access agent role also undertakes the access function for the job submitting end: when a submitted job is received, the job is submitted to an existing resource scheduling node, or a corresponding resource scheduling node is newly created, according to the job characteristics and tenant characteristics.
2. Dynamic creation of resource scheduling nodes:
In existing systems, resource scheduling nodes are statically configured and start running when the system boots. In the system of the present invention, resource scheduling nodes are created dynamically.
When the cluster access agent role finds that a new resource scheduling node needs to be created, it randomly picks a node from the list of compute node roles to which the resource scheduling node belongs, and notifies that node to start a container to run the resource scheduling node. After the resource scheduling node is running, a policy may be set to decide whether it destroys itself after being idle for a period of time.
3. Selective reporting of compute node resource status:
A compute node may belong to multiple resource scheduling nodes, so its resource status cannot be reported in full to all of them. Instead, different amounts of resources should be reported separately according to the weights among the resource scheduling nodes.
The solution mainly involves the following nodes:
First are the nodes already present in the operation of existing resource scheduling schemes:
1. Resource scheduling node:
Provides resource scheduling for the running of jobs. This node obtains the resources reported by the resource reporting nodes of its subordinate nodes, and schedules them to running jobs according to established policy rules.
2. Job control node:
Once the resource scheduling node has idle resources after receiving a job request, it starts a job control node on the node corresponding to those resources. The resource requests of the tasks in the subsequent job, and the execution and fault tolerance of those tasks, are all handled by this job control node. This node is introduced to reduce the load on the resource scheduling node and to support a wide variety of job types. When the job is completed, this node can be destroyed after reporting to the resource scheduling node.
3. Job computing node:
Responsible for receiving task computation requests from the job control node and running them in the requested resource containers.
To realize the dynamic creation and destruction of the resource scheduling role, and to shield the job submitting end from it, the solution also requires the following nodes:
4. Cluster access node:
Responsible for collecting the service availability status of all job computing nodes in the cluster and, when a request from the job submitting end is received, submitting the job to an existing resource scheduling role or newly creating a corresponding resource scheduling role according to the job characteristics and tenant characteristics.
5. Resource scheduling overview node:
Responsible for obtaining the usage status of all currently running resource scheduling nodes and aggregating it into usage information for the entire cluster.
6. Resource scheduling policy node:
Responsible for computing the resource scheduling node corresponding to a job according to the job characteristics and tenant characteristics, combined with the current load of each resource scheduling node; if that resource scheduling node does not exist, it first notifies a job computing node to create one.
7. Resource reporting node:
In existing schemes, resource reporting is performed on the node's behalf by the job computing node. Now, however, the same job computing node may correspond to multiple resource scheduling nodes, so deciding which resource scheduling node a resource should be reported to becomes necessary. For this purpose, the solution uses the resource reporting node to split the resources and report them to the respective resource scheduling nodes according to policies such as the load and weight of each resource scheduling node.
The connection relationships among the node roles are shown in FIG. 2:
The cluster access node, the resource scheduling overview node, and the resource scheduling policy node reside in the cluster access agent role. Without considering high availability, there may be only one such role. The resource scheduling overview node collects the service status of the resource reporting nodes to learn whether the services of the compute nodes are available, thereby obtaining the overall topology map of the cluster. In addition, it receives the usage information reported by each currently running resource scheduling node, thereby aggregating the total resource usage of the cluster and the usage of each tenant. The cluster access node receives job submission requests and submits them to the resource scheduling policy node. The resource scheduling policy node allocates an existing resource scheduling node or a newly created resource scheduling node to the job according to the resource load fed back by the resource scheduling overview node, the job characteristics, and the tenant characteristics.
The resource reporting node and the job computing node are in the compute node role, and there is one such role on every node in the cluster. The resource reporting node can interact with the resource scheduling overview node to inform it that the node's service is available; it can also interact with multiple resource scheduling nodes, reporting the available resources on the node to a given resource scheduling node according to a certain policy, for allocation to specific jobs. The job computing node is responsible for receiving container run requests, running a container with the resources requested in the request, and running in the container a task of a job, a job control node, or a resource scheduling node.
Resource scheduling nodes and job control nodes are generally temporary and have life cycles. Multiple instances of both can be started, and both run in containers started by job computing nodes. After a job is submitted, it passes through the cluster access node and the resource scheduling policy node and is then handed over to a resource scheduling node. The resource scheduling node then notifies a job computing node to start a container to run the job control node, after which it is responsible only for resource requests and allocation. Once running, the job control node takes charge of a series of matters such as the run dependencies and fault tolerance of the tasks in the job.
After the job submitting end submits a job, the running flow of the job mainly includes the following steps:
Step 1: The resource reporting node of each node reports the service availability of its node to the resource scheduling overview node, and the resource scheduling overview node aggregates these reports into the resource topology map of the cluster.
Step 2: Upon receiving a job submission request, the cluster access node checks the characteristics of the job to obtain a set of nodes with good data locality for it, and checks the characteristics of the tenant to obtain the resource constraints; the resource scheduling policy node then combines these with the resource usage from the resource scheduling overview node and assigns the job to the corresponding resource scheduling node. If that scheduling node does not exist, the resource scheduling policy node randomly selects the job computing node of a node subordinate to that scheduling node and notifies it to start a container to run the resource scheduling node. After a successful start, the job is handed over to the resource scheduling node, and subsequent job submitting ends contact that resource scheduling node directly to reduce the pressure on the cluster access node.
Specifically, as shown in FIG. 3, this includes:
Step 31: A job is submitted;
Step 32: The cluster access node receives the job submission request;
Step 33: The resource scheduling policy node calculates a suitable resource scheduling node according to the job characteristics, the tenant characteristics, and the current resource load;
Step 34: Determine whether the calculated resource scheduling node already exists; if yes, go to Step 35; if no, go to Step 36;
Step 35: The job is submitted to the suitable resource scheduling node;
Step 36: Notify the job computing node of a resource subordinate to the resource scheduling node to start a container to run the resource scheduling node;
Step 37: Determine whether the resource scheduling node is running successfully; if yes, go to Step 38; if no, return to Step 36;
Step 38: Notify the resource scheduling policy node, and go to Step 35.
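The find-or-create routing of Steps 31 through 38 can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the embodiment; the class and field names (PolicyNode, preferred_nodes, and so on) are assumptions introduced for illustration.

```python
# Illustrative sketch of the find-or-create routing in Steps 31-38.
# All names (PolicyNode, Job fields, etc.) are hypothetical.
import random

class PolicyNode:
    def __init__(self):
        self.schedulers = {}   # key -> address of a running scheduling node
        self.members = {}      # key -> compute nodes eligible to host it

    def scheduler_key(self, job):
        # Combine tenant and data-locality hints into a lookup key.
        return (job["tenant"], frozenset(job["preferred_nodes"]))

    def route(self, job):
        key = self.scheduler_key(job)
        if key not in self.schedulers:                # Step 34: does it exist?
            host = random.choice(self.members[key])   # Step 36: pick a host node
            self.schedulers[key] = self.start_container(host)
        return self.schedulers[key]                   # Step 35: submit here

    def start_container(self, host):
        # Stand-in for asking the job computing node on `host`
        # to launch a container running the scheduling node.
        return f"scheduler@{host}"

policy = PolicyNode()
policy.members[("tenantA", frozenset({"n1", "n2"}))] = ["n1", "n2"]
target = policy.route({"tenant": "tenantA", "preferred_nodes": ["n1", "n2"]})
```

A second submission with the same tenant and locality key is routed to the cached scheduling node, mirroring the flow in which later submissions contact the scheduling node directly.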
Step 3: After the resource scheduling node is running, it registers with each of its subordinate resource reporting nodes; whenever a node has idle resources, its resource reporting node reports part of those idle resources to each resource scheduling node registered with it according to the load policy.
Specifically, as shown in FIG. 4, this includes:
Step 41: A resource becomes idle;
Step 42: Determine whether the resource belongs to an existing resource scheduling node; if no, go to Step 43; if yes, go to Step 44;
Step 43: Do not process it, and end the flow;
Step 44: Calculate the satisfied-resource ratio of the resource scheduling lists at each level;
Step 45: Convert these into unified satisfied-resource ratios according to the weights of the different levels;
Step 46: Select the queue with the smallest satisfied-resource ratio and allocate resources to it;
Step 47: Calculate the satisfied-resource ratio of each resource scheduling node in that level's queue;
Step 48: Select the resource scheduling node with the smallest satisfied-resource ratio and allocate resources to it;
Step 49: Determine whether any resource is still idle; if yes, return to Step 44; if no, end the flow.
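The smallest-satisfied-ratio selection of Steps 44 through 48 can be sketched as follows, assuming each scheduling node records how much of its requested resource has already been satisfied and carries a weight. The field names (satisfied, requested, weight) are illustrative assumptions, not part of the embodiment.

```python
# Illustrative sketch of Steps 44-49: each idle unit goes to the scheduling
# node whose weighted satisfied-resource ratio is currently smallest.

def satisfied_ratio(s):
    # Dividing by the weight makes differently weighted nodes comparable,
    # so higher-weight nodes are favored until they catch up.
    return (s["satisfied"] / s["requested"]) / s["weight"]

def allocate_idle(schedulers, idle_units):
    for _ in range(idle_units):                  # Step 49: loop while idle
        needy = [s for s in schedulers if s["satisfied"] < s["requested"]]
        if not needy:
            break
        target = min(needy, key=satisfied_ratio)  # Steps 44-48
        target["satisfied"] += 1

schedulers = [
    {"name": "A", "requested": 4, "satisfied": 0, "weight": 2.0},
    {"name": "B", "requested": 4, "satisfied": 0, "weight": 1.0},
]
allocate_idle(schedulers, 3)
# Scheduler A, with double the weight, ends up with 2 of the 3 units.
```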
Step 4: When the resource scheduling node receives idle resources reported by a subordinate node, it notifies the job computing node of the node corresponding to those resources to start a container to run the job control node.
Step 5: The job control node requests resources from the resource scheduling node according to the split tasks.
Step 6: When the resource scheduling node receives the resource request sent by the job control node, once a subordinate node has idle resources, it allocates those resources to the job control node.
Step 7: After receiving the resources, the job control node allocates them to specific tasks and notifies the job computing nodes of the nodes corresponding to the resources to start containers to run the tasks.
Step 8: When a task in a container started by a job computing node finishes running, the job computing node notifies the job control node and closes the container.
Step 9: After all tasks of the job have finished running, the job control node notifies the resource scheduling node and requests the scheduling node to close the job control node.
Step 10: After all jobs on a resource scheduling node have finished running, if it is a temporary resource scheduling node and no new job has been executed for a period of time, it shuts itself down.
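The idle-timeout self-shutdown of Step 10 amounts to a timestamp check over the node's job count. The following minimal sketch assumes a monotonic clock and a configurable timeout; the class name and the 300-second default are invented for illustration and do not come from the embodiment.

```python
import time

class TemporaryScheduler:
    """Illustrative sketch of Step 10: a dynamically created scheduling node
    shuts itself down after an idle period with no new jobs.
    Names and the 300-second default are assumptions."""

    def __init__(self, idle_timeout=300.0):
        self.idle_timeout = idle_timeout
        self.running_jobs = 0
        self.last_active = time.monotonic()

    def on_job_started(self):
        self.running_jobs += 1
        self.last_active = time.monotonic()

    def on_job_finished(self):
        self.running_jobs -= 1
        self.last_active = time.monotonic()

    def should_self_close(self, now=None):
        # Close only when no job is running AND the idle window has elapsed.
        now = time.monotonic() if now is None else now
        return self.running_jobs == 0 and (now - self.last_active) > self.idle_timeout

s = TemporaryScheduler(idle_timeout=60.0)
s.on_job_started()
s.on_job_finished()
```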
对于资源宕机的情况，本方案提供如图5所示的措施，包括：For the case of resource downtime, this solution provides the measures shown in Figure 5, including:
步骤51：有资源加入；Step 51: A resource joins;
步骤52：该资源的作业控制节点和资源汇报节点启动；Step 52: The job control node and the resource reporting node of the resource are started;
步骤53：资源汇报节点通知资源调度概览节点新资源服务可用；Step 53: The resource reporting node notifies the resource scheduling overview node that the new resource is available for service;
步骤54：判断属下是否有资源调度节点规模不足；若否，进入步骤55，若是，进入步骤56；Step 54: Determine whether any subordinate resource scheduling node is short of nodes; if not, go to step 55; if yes, go to step 56;
步骤55：该资源等待资源调度概览节点分配给新的资源调度节点；Step 55: The resource waits for the resource scheduling overview node to allocate it to a new resource scheduling node;
步骤56：判断该资源是否可归属于已有的资源调度节点，若是，进入步骤57，若否，返回步骤55；Step 56: Determine whether the resource can be attributed to an existing resource scheduling node; if yes, go to step 57; if not, return to step 55;
步骤57：对应的资源调度节点去新增资源的资源汇报节点注册；Step 57: The corresponding resource scheduling node registers with the resource reporting node of the newly added resource;
步骤58：新增资源向对应的资源调度节点汇报可用资源。Step 58: The newly added resource reports its available resources to the corresponding resource scheduling node.
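Steps 54 through 58 above amount to a placement decision for a newly joined resource. A hedged sketch follows; the "same equipment room" attribution test and the `min_size` threshold are assumptions made for illustration, since the application leaves the attribution criteria open:

```python
def place_new_resource(resource, scheduling_nodes, min_size):
    """Attach a newly joined resource to an under-scaled scheduling node
    that can own it; otherwise the resource keeps waiting (step 55)."""
    under_scaled = [n for n in scheduling_nodes if len(n["members"]) < min_size]
    if not under_scaled:                            # step 54: nobody is short of nodes
        return "waiting"
    for node in under_scaled:                       # step 56: can the resource belong here?
        if node["room"] == resource["room"]:        # assumed attribution criterion
            node["members"].append(resource["id"])  # steps 57-58: register and report
            return node["name"]
    return "waiting"                                # step 56 failed: back to step 55


nodes = [{"name": "rs1", "room": "A", "members": ["n1"]},
         {"name": "rs2", "room": "B", "members": ["n2", "n3"]}]
print(place_new_resource({"id": "n4", "room": "A"}, nodes, min_size=2))  # rs1
```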
为了实现资源调度节点的高可用性，本方案提供如图6所示的措施，包括：To achieve high availability of resource scheduling nodes, this solution provides the measures shown in Figure 6, including:
步骤61：作业提交；Step 61: A job is submitted;
步骤62：集群接入节点收到作业提交请求；Step 62: The cluster access node receives the job submission request;
步骤63：资源调度策略节点根据作业特点、租户特点和当前资源负载状况计算出合适的主资源调度节点；Step 63: The resource scheduling policy node calculates an appropriate primary resource scheduling node according to the job characteristics, the tenant characteristics, and the current resource load status;
步骤64：在主资源调度节点不存在时，先启动该主资源调度节点；Step 64: If the primary resource scheduling node does not exist, start it first;
步骤65：将作业提交到该主资源调度节点；Step 65: Submit the job to the primary resource scheduling node;
步骤66：判断对应的备资源调度节点是否正常，若是，进入步骤67，若否，进入步骤68；Step 66: Determine whether the corresponding standby resource scheduling node is normal; if yes, go to step 67; if not, go to step 68;
步骤67：与主资源调度节点同步作业状态，并在主资源调度节点出现问题时，提供服务，升级为主资源调度节点，启用下一对应备资源调度节点，返回步骤66；Step 67: The standby synchronizes job state with the primary resource scheduling node; when the primary fails, the standby takes over service, is promoted to primary, and the next corresponding standby resource scheduling node is enabled; return to step 66;
步骤68：在不同机房的资源上启动另一备用资源调度节点，返回步骤66。Step 68: Start another standby resource scheduling node on resources in a different equipment room, and return to step 66.
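The invariant maintained by steps 66 through 68 is "always one healthy standby, never in the primary's equipment room". A minimal sketch, with assumed field names and health reduced to a boolean:

```python
def ensure_standby(primary_room, standby, rooms_with_capacity):
    """Keep one healthy standby scheduler outside the primary's room."""
    if standby is not None and standby.get("healthy"):
        return standby                         # step 67: keep syncing job state
    for room in rooms_with_capacity:           # step 68: start a new standby elsewhere
        if room != primary_room:
            return {"room": room, "healthy": True}
    return None                                # no other room has capacity


print(ensure_standby("A", None, ["A", "B"]))   # {'room': 'B', 'healthy': True}
```

On failover (step 67), the promoted standby would itself call a routine like this to re-establish the invariant for the next failure.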
综上可知，采用本发明实施例提供的方案，与现有方案相比，节点的扩展性更强，人工干预少，节点动态增加减少可以不停服务；集群计算的本地性较好，可以计算数据所在的节点群归属于一个资源调度节点进行计算；集群的资源利用率高，支持集中集群所有的资源来运行作业；集群的资源控制比较容易实现对多租户的支持。In summary, compared with existing solutions, the solution provided by the embodiments of the present invention offers stronger node scalability and needs less manual intervention, since nodes can be added or removed dynamically without stopping service; cluster computing has better data locality, since the node group where the data resides can be assigned to a single resource scheduling node for computation; cluster resource utilization is high, since all resources of the cluster can be pooled to run jobs; and cluster resource control makes multi-tenancy support easier to implement.
下面继续对本发明实施例一提供的去中心化资源调度方法可实现的功能进行举例说明。The following continues with examples of the functions achievable by the decentralized resource scheduling method provided in Embodiment 1 of the present invention.
举例1:多租户资源约束Example 1: Multi-tenant resource constraints
整个计算集群作为一个服务平台对外提供，可以在其上新建许多租户。租户之间可以有不同的优先级（如普通租户，VIP租户等），不同的优先级租户间资源的使用限制不同。比如限定普通租户最多使用10个节点的资源，VIP租户最多使用100个节点的资源。The entire computing cluster is provided externally as a service platform, on which many tenants can be created. Tenants can have different priorities (such as ordinary tenants and VIP tenants), and tenants of different priorities are subject to different resource usage limits. For example, an ordinary tenant may be limited to the resources of at most 10 nodes, while a VIP tenant may use the resources of up to 100 nodes.
当普通租户提交作业时，经过资源调度策略节点获得的资源调度节点属下最多只有10个节点。当VIP租户提交作业时，经过资源调度策略节点获得的资源调度节点属下最多只有100个节点。这样就可以保证VIP租户所能运行的最大资源要比普通租户大得多。甚至可以新建一种最高优先级的租户其提交作业可以使用到集群中所有节点的资源。而且由于资源调度节点属下的节点数目是可以动态增删的，很容易就可以扩展各租户可用资源最大限制。When an ordinary tenant submits a job, the resource scheduling node obtained through the resource scheduling policy node owns at most 10 nodes; when a VIP tenant submits a job, that node owns at most 100 nodes. This guarantees that the maximum resources a VIP tenant can use are much larger than those of an ordinary tenant. It is even possible to create a highest-priority tenant whose submitted jobs can use the resources of all nodes in the cluster. Moreover, since the nodes under a resource scheduling node can be added or removed dynamically, the maximum resource limit of each tenant can easily be extended.
由于运行中资源调度节点可能是很多个，因此其所属的节点很有可能存在重复，即一个计算节点对应多个资源调度节点。此时为了实现租户间的资源均衡，可以让计算节点的资源汇报节点在有空闲资源时拆分开向资源调度节点汇报。优先级相同的资源调度节点，保证其分配的资源量大致均衡，高优先级的资源调度节点将比低优先级的获得更多的资源分配量。Since many resource scheduling nodes may be running at the same time, the node sets they own are likely to overlap, i.e. one computing node may correspond to multiple resource scheduling nodes. In this case, to balance resources among tenants, the resource reporting node of a computing node can split its idle resources when reporting them to the resource scheduling nodes: resource scheduling nodes of the same priority are allocated roughly equal amounts of resources, while higher-priority resource scheduling nodes are allocated more than lower-priority ones.
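The priority-weighted split of a computing node's idle resources described above can be sketched as follows; the numeric weights and the rule of giving integer remainders to higher-priority schedulers are illustrative assumptions:

```python
def split_idle_resources(free_slots, schedulers):
    """schedulers: list of (name, priority_weight); returns name -> slot count.
    Equal weights get roughly equal shares; higher weights get more."""
    total = sum(w for _, w in schedulers)
    shares = {name: free_slots * w // total for name, w in schedulers}
    leftover = free_slots - sum(shares.values())
    for name, _ in sorted(schedulers, key=lambda s: -s[1]):  # remainder to higher priority
        if leftover == 0:
            break
        shares[name] += 1
        leftover -= 1
    return shares


# A VIP-tier scheduler with weight 3 vs two ordinary schedulers with weight 1:
print(split_idle_resources(10, [("vip", 3), ("std1", 1), ("std2", 1)]))
# {'vip': 6, 'std1': 2, 'std2': 2}
```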
举例2：跨机房跨地域的集群部署Example 2: Cluster deployment across equipment rooms and regions
从整个集群的容错性考虑，将集群的硬件设备跨机房甚至跨地域部署，可以避免类似于停电，断网等一系列的故障，提升整个集群的健壮性。Considering the fault tolerance of the entire cluster, deploying the cluster's hardware across equipment rooms or even across regions avoids a whole class of failures such as power outages and network disconnections, and improves the robustness of the entire cluster.
为此可以考虑集群接入节点部署多个，作业端提交作业时轮询这几个接入节点的负载，一旦发现某个接入节点可用，且负载最小，就往该接入节点上提交作业。To this end, multiple cluster access nodes can be deployed; when submitting a job, the client polls the load of these access nodes, and as soon as an available access node with the lowest load is found, the job is submitted to that access node.
由于接入节点只是用来接收作业申请，并通过资源调度策略节点按照一致的策略分配作业到某个资源调度节点。接入节点之间并不会相互影响，所以多个接入节点的引入不会对集群造成影响，且还能提升集群的接入能力。Since an access node only receives job requests, while jobs are assigned to a resource scheduling node by the resource scheduling policy node according to a consistent policy, access nodes do not affect one another. Introducing multiple access nodes therefore does not impact the cluster, and furthermore improves the cluster's access capacity.
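The access-node polling described above reduces to picking any available node with minimal load. A minimal sketch, with assumed field names for availability and load:

```python
def choose_access_node(access_nodes):
    """Return the name of an available access node with the smallest load."""
    available = [n for n in access_nodes if n["available"]]
    if not available:
        raise RuntimeError("no access node available")
    return min(available, key=lambda n: n["load"])["name"]


nodes = [{"name": "a1", "available": False, "load": 0.1},
         {"name": "a2", "available": True, "load": 0.6},
         {"name": "a3", "available": True, "load": 0.3}]
print(choose_access_node(nodes))  # a3
```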
当一个机房发生故障后，其他机房不受影响，仍能正常提供服务。该机房节点中可能有运行资源调度节点，这些资源调度节点保证高可用的方法如下：资源调度策略节点给作业分配资源调度节点时，同时选择一个不在同一机房的节点上启动一个备用资源调度节点，备用资源调度节点同步主资源调度节点中的作业状态，当主资源调度节点崩溃时接管过其下所有的作业，同时再向资源调度策略节点申请一个新的资源调度节点作为备用，且该备用资源调度节点在另外的机房节点上。When one equipment room fails, the other rooms are unaffected and can continue to provide service normally. Resource scheduling nodes may have been running on nodes of the failed room; they are kept highly available as follows: when the resource scheduling policy node assigns a resource scheduling node to a job, it simultaneously selects a node in a different equipment room on which to start a standby resource scheduling node. The standby synchronizes the job state of the primary resource scheduling node; when the primary crashes, the standby takes over all of its jobs and applies to the resource scheduling policy node for a new resource scheduling node as the next standby, again placed on a node in another equipment room.
举例3：固定资源调度节点数目的大数据计算系统Example 3: A big-data computing system with a fixed number of resource scheduling nodes
由于资源调度节点如果是系统运行中动态生成的话，将会消耗一定的节点启动时间。如果对性能要求较高时，可以考虑固定资源调度节点的数目，并在系统启动时将所有资源调度节点全部运行起来。这样的话可以提升一些作业运行的性能，尤其是小作业的性能。If resource scheduling nodes are generated dynamically while the system is running, each one incurs some node startup time. When performance requirements are high, the number of resource scheduling nodes can instead be fixed, and all of them started at system startup. This improves the performance of some jobs, especially small ones.
资源调度节点的数目可以是通过配置文件或命令的方式设置，并能动态生效。这样集群硬件扩容时可以不停服务很轻易地就实现集群的扩容。The number of resource scheduling nodes can be set via a configuration file or a command, and the setting takes effect dynamically. In this way, when the cluster hardware is expanded, the cluster can be scaled out easily without stopping service.
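One way a "takes effect dynamically" setting can work is to re-read the configuration on every scaling decision instead of caching it at startup. A sketch under assumptions: JSON is used as the configuration format and `scheduler_count` is a hypothetical key name, neither specified by this application:

```python
import json

def desired_scheduler_count(config_text, default=1):
    """Parse the scheduler-count setting from a JSON config snippet,
    falling back to a default on any malformed input."""
    try:
        value = json.loads(config_text).get("scheduler_count", default)
        return max(1, int(value))
    except (json.JSONDecodeError, TypeError, ValueError):
        return default


print(desired_scheduler_count('{"scheduler_count": 8}'))  # 8
```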
举例4:支持多样化的资源调度节点Example 4: Support for diverse resource scheduling nodes
目前的大数据集群支持的资源调度节点只是固定好的一个，最多支持调度策略以插件的方式引入。而随着大数据技术的发展，目前已经出现了各种各样的资源调度节点能够在某些方面得到优化。而这些多样化的资源调度节点之前是无法同时运行于一个大数据集群的。现在通过资源调度节点动态创建功能，可以将多样化的调度节点同时运行起来。客户可以按照自己的喜好去使用相应的资源调度节点。A current big-data cluster supports only one fixed kind of resource scheduling node; at best, scheduling policies can be introduced as plug-ins. With the development of big-data technology, a variety of resource scheduling nodes have appeared, each optimized in certain respects, but these diverse resource scheduling nodes previously could not run in one big-data cluster at the same time. Now, through the dynamic creation of resource scheduling nodes, diverse scheduling nodes can run simultaneously, and customers can use whichever resource scheduling node they prefer.
综上所述，本发明是要克服现有技术中存在的大量作业并发问题，提升集群的横向扩展能力和作业量扩展能力，并尽可能提高资源的利用率，负载的均衡性。In summary, the present invention aims to overcome the massive job-concurrency problem in the prior art, improve the cluster's horizontal scalability and job-volume scalability, and maximize resource utilization and load balance.
因此，提供了一种动态生成资源调度角色的方法，通过该角色的动态生成，能消除作业并发容量的限制，达到比集群联合更强的作业承载能力；并且由于资源调度角色是动态生成的，所以可以根据集群实际资源总量，数据本地性，租户特定等情况动态调整，实现动态负载均衡，性能及租户特性等需求。Therefore, a method for dynamically generating the resource scheduling role is provided. Through the dynamic generation of this role, the limit on job concurrency can be eliminated, achieving a job-carrying capacity stronger than cluster federation; and since the resource scheduling role is generated dynamically, it can be adjusted dynamically according to the cluster's actual total resources, data locality, tenant specifics, and so on, meeting requirements such as dynamic load balancing, performance, and tenant characteristics.
实施例二Embodiment 2
如图7所示，本发明实施例二提供的去中心化资源调度系统包括：As shown in FIG. 7, the decentralized resource scheduling system provided in Embodiment 2 of the present invention includes:
接收模块71,设置为利用集群接入节点接收作业;The receiving module 71 is configured to receive the job by using the cluster access node;
第一处理模块72,设置为利用资源调度策略节点根据当前资源负载情况和所述作业中的作业特征和用户特征得到对应的资源调度节点信息;The first processing module 72 is configured to use the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource loading situation and the job feature and the user feature in the job;
第二处理模块73,设置为若不存在与所述资源调度节点信息相对应的资源调度节点,则生成所述资源调度节点,利用所述资源调度节点为所述作业调度资源。The second processing module 73 is configured to generate the resource scheduling node if the resource scheduling node corresponding to the resource scheduling node information does not exist, and use the resource scheduling node to schedule resources for the job.
本发明实施例二提供的所述去中心化资源调度系统通过动态生成资源调度节点，能消除作业并发容量的限制，达到比集群联合更强的作业承载能力；并且由于资源调度节点是动态生成的，所以可以根据集群实际资源总量，数据本地性，租户特定等情况动态调整，实现动态负载均衡，性能及租户特性等需求，较好的解决了现有技术中大量作业并发的问题。By dynamically generating resource scheduling nodes, the decentralized resource scheduling system provided in Embodiment 2 of the present invention eliminates the limit on job concurrency and achieves a job-carrying capacity stronger than cluster federation; and since resource scheduling nodes are generated dynamically, the system can be adjusted dynamically according to the cluster's actual total resources, data locality, tenant specifics, and so on, meeting requirements such as dynamic load balancing, performance, and tenant characteristics, thereby better solving the problem of massive job concurrency in the prior art.
为了进一步提高运行性能，节省时间，所述去中心化资源调度系统还包括：配置模块，设置为在所述接收模块执行操作之前，配置预设数量预启动的所述资源调度节点；启动模块，设置为在系统启动时，启动配置的所述预设数量的所述资源调度节点。To further improve running performance and save time, the decentralized resource scheduling system further includes: a configuration module, configured to configure a preset number of pre-started resource scheduling nodes before the receiving module performs its operation; and a startup module, configured to start the configured preset number of resource scheduling nodes at system startup.
进一步的，所述去中心化资源调度系统还包括：关闭模块，设置为在所述第二处理模块执行操作之后，若预设时间段内生成的所述资源调度节点没有为另一作业调度资源，则自动关闭生成的所述资源调度节点。Further, the decentralized resource scheduling system further includes: a shutdown module, configured to automatically close the generated resource scheduling node if, after the second processing module performs its operation, that node does not schedule resources for another job within a preset time period.
具体的，所述第一处理模块包括：第一处理子模块，设置为根据所述作业特征得到所述作业的数据本地性较好的资源调度节点集合；第二处理子模块，设置为根据所述用户特征得到资源的限制约束；第三处理子模块，设置为根据预设策略，结合所述当前资源负载情况、资源调度节点集合和资源的限制约束得到对应的资源调度节点信息。Specifically, the first processing module includes: a first processing sub-module, configured to obtain, according to the job features, a set of resource scheduling nodes with good data locality for the job; a second processing sub-module, configured to obtain resource constraints according to the user features; and a third processing sub-module, configured to obtain the corresponding resource scheduling node information according to a preset policy, combining the current resource load situation, the resource scheduling node set, and the resource constraints.
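The three sub-modules above can be sketched as one selection pipeline. The locality representation (a set of node names), the encoding of the tenant constraint as a node-count limit, and least-loaded tie-breaking are all assumptions made for illustration:

```python
def pick_scheduling_node(candidates, locality_set, tenant_node_limit, loads):
    """Filter candidates by job data locality (sub-module 1) and the
    tenant's resource constraint (sub-module 2), then return the
    least-loaded remaining scheduling node (sub-module 3)."""
    eligible = [c for c in candidates
                if c["name"] in locality_set and c["size"] <= tenant_node_limit]
    if not eligible:
        return None          # caller would then generate a new scheduling node
    return min(eligible, key=lambda c: loads[c["name"]])["name"]


candidates = [{"name": "rs1", "size": 10}, {"name": "rs2", "size": 50}]
loads = {"rs1": 0.7, "rs2": 0.2}
print(pick_scheduling_node(candidates, {"rs1", "rs2"}, 20, loads))  # rs1
```

Returning `None` here corresponds to the "no matching resource scheduling node exists" branch, which triggers the dynamic generation described in the second processing module.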
其中，所述第二处理模块包括：第四处理子模块，设置为利用所述资源调度策略节点随机选取一个隶属于所述资源调度节点的作业运算节点，并通知该作业运算节点启动一个容器来运行所述资源调度节点。The second processing module includes: a fourth processing sub-module, configured to use the resource scheduling policy node to randomly select a job computing node subordinate to the resource scheduling node, and notify that job computing node to start a container to run the resource scheduling node.
考虑到本申请中一个作业运算节点可能对应于多个资源调度节点，本发明实施例中，所述第二处理模块包括：第五处理子模块，设置为利用所述资源调度节点向资源汇报节点进行注册，并接收所述资源汇报节点根据预定规则（涉及各资源调度节点的负载、权重）汇报的空闲资源；调度子模块，设置为利用所述资源调度节点对作业控制节点根据切分的所述作业申请的资源进行调度。Considering that in this application one job computing node may correspond to multiple resource scheduling nodes, in this embodiment of the present invention the second processing module includes: a fifth processing sub-module, configured to use the resource scheduling node to register with the resource reporting node and receive the idle resources reported by the resource reporting node according to a predetermined rule (involving the load and weight of each resource scheduling node); and a scheduling sub-module, configured to use the resource scheduling node to schedule the resources applied for by the job control node according to the split job.
进一步的，所述去中心化资源调度系统还包括：第一通知模块，设置为在所述调度子模块执行操作之前，利用所述资源调度节点通知与所述空闲资源对应的作业运算节点启动容器运行所述作业控制节点。Further, the decentralized resource scheduling system further includes: a first notification module, configured to, before the scheduling sub-module performs its operation, use the resource scheduling node to notify the job computing node corresponding to the idle resources to start a container to run the job control node.
为了便于得到当前资源负载情况，本发明实施例中，所述去中心化资源调度系统还包括：获取模块，设置为利用资源调度概览节点实时获取资源负载情况。To facilitate obtaining the current resource load situation, in this embodiment of the present invention the decentralized resource scheduling system further includes: an obtaining module, configured to obtain the resource load situation in real time by using the resource scheduling overview node.
为了方案的完整性，所述去中心化资源调度系统还包括：第三处理模块，设置为在所述第二处理模块执行操作之后，利用作业控制节点将调度的资源分配给所述作业中具体的任务，并通知与调度的所述资源相对应的作业运算节点启动容器运行所述任务。For completeness of the solution, the decentralized resource scheduling system further includes: a third processing module, configured to, after the second processing module performs its operation, use the job control node to allocate the scheduled resources to specific tasks in the job, and notify the job computing node corresponding to the scheduled resources to start a container to run the task.
进一步的，所述去中心化资源调度系统还包括：第二通知模块，设置为在所述第三处理模块执行操作之后，在所述容器中的任务执行完毕后，利用所述作业运算节点通知所述作业控制节点关闭所述容器。Further, the decentralized resource scheduling system further includes: a second notification module, configured to, after the third processing module performs its operation and the task in the container has finished executing, use the job computing node to notify the job control node to close the container.
更进一步的，所述去中心化资源调度系统还包括：第四处理模块，设置为在所述第二通知模块执行操作之后，在所述作业中的所有任务均执行完毕后，利用所述作业控制节点通知所述资源调度节点资源释放，并向所述资源调度节点申请关闭所述作业控制节点。Still further, the decentralized resource scheduling system further includes: a fourth processing module, configured to, after the second notification module performs its operation and all tasks in the job have finished executing, use the job control node to notify the resource scheduling node of the resource release, and apply to the resource scheduling node to close the job control node.
考虑到资源可能出现宕机的情况，所述去中心化资源调度系统还包括：建立模块，设置为在所述第二处理模块执行操作之后，在所述资源重启后，利用资源调度概览节点根据所述资源的限制约束重新与对应的所述资源调度节点建立关系。Considering that a resource may go down, the decentralized resource scheduling system further includes: an establishing module, configured to, after the second processing module performs its operation and the resource restarts, use the resource scheduling overview node to re-establish a relationship with the corresponding resource scheduling node according to the resource's constraints.
具体的，所述建立模块包括：启动子模块，设置为在所述资源重启后，启动所述资源的作业控制节点和资源汇报节点；通知子模块，设置为利用所述资源汇报节点通知所述资源调度概览节点所述资源可用；Specifically, the establishing module includes: a startup sub-module, configured to start the job control node and the resource reporting node of the resource after the resource restarts; a notification sub-module, configured to use the resource reporting node to notify the resource scheduling overview node that the resource is available;
查找子模块，设置为利用所述资源调度概览节点依据所述资源的限制约束查找到对应的所述资源调度节点；第六处理子模块，设置为利用所述资源调度节点向所述资源的资源汇报节点进行注册，并接收所述资源的资源汇报节点根据预定规则（涉及各资源调度节点的负载、权重）汇报的空闲资源。a lookup sub-module, configured to use the resource scheduling overview node to find the corresponding resource scheduling node according to the resource's constraints; and a sixth processing sub-module, configured to use the resource scheduling node to register with the resource reporting node of the resource, and receive the idle resources reported by that resource reporting node according to a predetermined rule (involving the load and weight of each resource scheduling node).
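The re-establishment flow above can be sketched as a lookup followed by re-registration; the registry layout (a mapping from resource id to the scheduling node recorded for its constraints) is an assumption made for illustration:

```python
def rejoin_resource(resource_id, constraint_registry, schedulers):
    """After a resource restarts, find the scheduling node recorded for it
    and re-register the pair; otherwise the resource keeps waiting."""
    scheduler_name = constraint_registry.get(resource_id)  # lookup sub-module
    scheduler = schedulers.get(scheduler_name)
    if scheduler is None:
        return "waiting"                                   # no owner found yet
    scheduler["registered"].append(resource_id)            # sixth processing sub-module
    return scheduler_name


registry = {"node-7": "rs-a"}
schedulers = {"rs-a": {"registered": []}}
print(rejoin_resource("node-7", registry, schedulers))  # rs-a
```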
其中，上述去中心化资源调度方法的所述实现实施例均适用于该去中心化资源调度系统的实施例中，也能达到相同的技术效果。The implementation embodiments of the above decentralized resource scheduling method are all applicable to the embodiments of this decentralized resource scheduling system, and achieve the same technical effect.
需要说明的是,此说明书中所描述的许多功能部件都被称为模块/子模块,以便更加特别地强调其实现方式的独立性。It should be noted that many of the functional components described in this specification are referred to as modules/sub-modules to more particularly emphasize the independence of their implementation.
本发明实施例中，模块/子模块可以用软件实现，以便由各种类型的处理器执行。举例来说，一个标识的可执行代码模块可以包括计算机指令的一个或多个物理或者逻辑块，举例来说，其可以被构建为对象、过程或函数。尽管如此，所标识模块的可执行代码无需物理地位于一起，而是可以包括存储在不同位置上的不同的指令，当这些指令逻辑上结合在一起时，其构成模块并且实现该模块的规定目的。In an embodiment of the invention, the modules/sub-modules may be implemented in software for execution by various types of processors. For example, an identified executable code module can comprise one or more physical or logical blocks of computer instructions, which can be constructed, for example, as an object, procedure, or function. Nonetheless, the executable code of an identified module need not be physically located together, but may comprise disparate instructions stored in different locations which, when joined logically together, constitute the module and achieve its stated purpose.
实际上,可执行代码模块可以是单条指令或者是许多条指令,并且甚至可以分布在多个不同的代码段上,分布在不同程序当中,以及跨越多个存储器设备分布。同样地,操作数据可以在模块内被识别,并且可以依照任何适当的形式实现并且被组织在任何适当类型的数据结构内。所述操作数据可以作为单个数据集被收集,或者可以分布在不同位置上(包括在不同存储设备上),并且至少部分地可以仅作为电子信号存在于***或网络上。In practice, the executable code module can be a single instruction or a plurality of instructions, and can even be distributed across multiple different code segments, distributed among different programs, and distributed across multiple memory devices. As such, operational data may be identified within the modules and may be implemented in any suitable form and organized within any suitable type of data structure. The operational data may be collected as a single data set, or may be distributed at different locations (including on different storage devices), and may at least partially exist as an electronic signal on a system or network.
在模块可以利用软件实现时，考虑到现有硬件工艺的水平，所以可以以软件实现的模块，在不考虑成本的情况下，本领域技术人员都可以搭建对应的硬件电路来实现对应的功能，所述硬件电路包括常规的超大规模集成（VLSI）电路或者门阵列以及诸如逻辑芯片、晶体管之类的现有半导体或者是其它分立的元件。模块还可以用可编程硬件设备，诸如现场可编程门阵列、可编程阵列逻辑、可编程逻辑设备等实现。Where a module can be implemented in software, then given the level of existing hardware technology, such a module could also, cost aside, be implemented by those skilled in the art as a corresponding hardware circuit; such hardware circuits include conventional very-large-scale integration (VLSI) circuits or gate arrays, existing semiconductors such as logic chips and transistors, or other discrete components. Modules can also be implemented with programmable hardware devices such as field-programmable gate arrays, programmable array logic, and programmable logic devices.
以上所述的是本发明的优选实施方式，应当指出对于本技术领域的普通人员来说，在不脱离本发明所述原理前提下，还可以作出若干改进和润饰，这些改进和润饰也应视为本发明的保护范围。The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can also make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
工业实用性Industrial applicability
本发明实施例的上述技术方案，可以应用于去中心化资源调度过程中，所述去中心化资源调度方法通过动态生成资源调度节点，能消除作业并发容量的限制，达到比集群联合更强的作业承载能力；并且由于资源调度节点是动态生成的，所以可以根据集群实际资源总量，数据本地性，租户特定等情况动态调整，实现动态负载均衡，性能及租户特性等需求，较好的解决了现有技术中大量作业并发的问题。The above technical solution of the embodiments of the present invention can be applied to a decentralized resource scheduling process. By dynamically generating resource scheduling nodes, the decentralized resource scheduling method eliminates the limit on job concurrency and achieves a job-carrying capacity stronger than cluster federation; and since resource scheduling nodes are generated dynamically, the method can be adjusted dynamically according to the cluster's actual total resources, data locality, tenant specifics, and so on, meeting requirements such as dynamic load balancing, performance, and tenant characteristics, thereby better solving the problem of massive job concurrency in the prior art.

Claims (26)

  1. 一种去中心化资源调度方法,包括:A decentralized resource scheduling method includes:
    利用集群接入节点接收作业;Receiving jobs using a cluster access node;
    利用资源调度策略节点根据当前资源负载情况和所述作业中的作业特征和用户特征得到对应的资源调度节点信息;Using the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job;
    若不存在与所述资源调度节点信息相对应的资源调度节点,则生成所述资源调度节点,利用所述资源调度节点为所述作业调度资源。If there is no resource scheduling node corresponding to the resource scheduling node information, generate the resource scheduling node, and use the resource scheduling node to schedule resources for the job.
  2. 如权利要求1所述的去中心化资源调度方法,其中,在所述利用集群接入节点接收作业之前,所述去中心化资源调度方法还包括:The decentralized resource scheduling method according to claim 1, wherein the decentralized resource scheduling method further comprises:
    配置预设数量预启动的所述资源调度节点;Configuring a preset number of pre-launched resource scheduling nodes;
    在系统启动时，启动配置的所述预设数量的所述资源调度节点。At system startup, starting the configured preset number of resource scheduling nodes.
  3. 如权利要求1所述的去中心化资源调度方法,其中,在所述利用所述资源调度节点为所述作业调度资源之后,所述去中心化资源调度方法还包括:The decentralized resource scheduling method according to claim 1, wherein the decentralized resource scheduling method further comprises: after the utilizing the resource scheduling node to schedule resources for the job,
    若预设时间段内生成的所述资源调度节点没有为另一作业调度资源,则自动关闭生成的所述资源调度节点。If the resource scheduling node generated in the preset time period does not schedule resources for another job, the generated resource scheduling node is automatically closed.
  4. 如权利要求1所述的去中心化资源调度方法,其中,所述利用资源调度策略节点根据当前资源负载情况和所述作业中的作业特征和用户特征得到对应的资源调度节点信息的步骤包括:The decentralized resource scheduling method according to claim 1, wherein the step of using the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource loading situation and the job feature and the user feature in the job comprises:
    根据所述作业特征得到所述作业的数据本地性较好的资源调度节点集合;Obtaining, according to the job feature, a resource scheduling node set with better data locality of the job;
    根据所述用户特征得到资源的限制约束;Obtaining a resource constraint according to the user feature;
    根据预设策略,结合所述当前资源负载情况、资源调度节点集合和资源的限制约束得到对应的资源调度节点信息。According to the preset policy, the corresponding resource scheduling node information is obtained according to the current resource load situation, the resource scheduling node set, and the resource constraint constraint.
  5. 如权利要求1所述的去中心化资源调度方法,其中,所述生成所述资源调度节点的步骤包括:The decentralized resource scheduling method according to claim 1, wherein the step of generating the resource scheduling node comprises:
    利用所述资源调度策略节点随机选取一个隶属于所述资源调度节点的作业运算节点,并通知该作业运算节点启动一个容器来运行所述资源调度节点。The resource scheduling policy node randomly selects a job operation node that belongs to the resource scheduling node, and notifies the job operation node to start a container to run the resource scheduling node.
  6. 如权利要求1所述的去中心化资源调度方法,其中,所述利用所述资源调度节点为所述作业调度资源的步骤包括:The decentralized resource scheduling method according to claim 1, wherein the step of using the resource scheduling node to schedule resources for the job comprises:
    利用所述资源调度节点向资源汇报节点进行注册,并接收所述资源汇报节点根据预定规则汇报的空闲资源;Using the resource scheduling node to register with the resource reporting node, and receiving the idle resource reported by the resource reporting node according to a predetermined rule;
    利用所述资源调度节点对作业控制节点根据切分的所述作业申请的资源进行调度。Using the resource scheduling node to schedule the resources applied for by the job control node according to the split job.
  7. 如权利要求6所述的去中心化资源调度方法，其中，在所述利用所述资源调度节点对作业控制节点根据切分的所述作业申请的资源进行调度之前，所述去中心化资源调度方法还包括：The decentralized resource scheduling method according to claim 6, wherein, before the resources applied for by the job control node according to the split job are scheduled by the resource scheduling node, the decentralized resource scheduling method further comprises:
    利用所述资源调度节点通知与所述空闲资源对应的作业运算节点启动容器运行所述作业控制节点。And using the resource scheduling node to notify the job operation node corresponding to the idle resource to start a container to run the job control node.
  8. 如权利要求1所述的去中心化资源调度方法,其中,所述去中心化资源调度方法还包括:The decentralized resource scheduling method according to claim 1, wherein the decentralized resource scheduling method further comprises:
    利用资源调度概览节点实时获取资源负载情况。The resource scheduling situation is obtained in real time by using the resource scheduling overview node.
  9. 如权利要求1所述的去中心化资源调度方法,其中,在所述利用所述资源调度节点为所述作业调度资源之后,所述去中心化资源调度方法还包括:The decentralized resource scheduling method according to claim 1, wherein the decentralized resource scheduling method further comprises: after the utilizing the resource scheduling node to schedule resources for the job,
    利用作业控制节点将调度的资源分配给所述作业中具体的任务,并通知与调度的所述资源相对应的作业运算节点启动容器运行所述任务。The scheduled resource is allocated to the specific task in the job by the job control node, and the job computing node corresponding to the scheduled resource is started to start the container to run the task.
  10. 如权利要求9所述的去中心化资源调度方法，其中，在所述通知与调度的所述资源相对应的作业运算节点启动容器运行所述任务之后，所述去中心化资源调度方法还包括：The decentralized resource scheduling method according to claim 9, wherein after the job computing node corresponding to the scheduled resources is notified to start a container to run the task, the decentralized resource scheduling method further comprises:
    在所述容器中的任务执行完毕后，利用所述作业运算节点通知所述作业控制节点关闭所述容器。After the task in the container has finished executing, using the job computing node to notify the job control node to close the container.
  11. 如权利要求10所述的去中心化资源调度方法,其中,在所述利用所述作业运算节点通知所述作业控制节点关闭所述容器之后,所述去中心化资源调度方法还包括:The decentralized resource scheduling method according to claim 10, wherein after the using the job operation node to notify the job control node to close the container, the decentralized resource scheduling method further comprises:
    在所述作业中的所有任务均执行完毕后,利用所述作业控制节点通知所述资源调度节点资源释放,并向所述资源调度节点申请关闭所述作业控制节点。After all the tasks in the job are executed, the job control node is used to notify the resource scheduling node of resource release, and the resource scheduling node is requested to close the job control node.
  12. 如权利要求1所述的去中心化资源调度方法,其中,在所述利用所述资源调度节点为所述作业调度资源之后,所述去中心化资源调度方法还包括:The decentralized resource scheduling method according to claim 1, wherein the decentralized resource scheduling method further comprises: after the utilizing the resource scheduling node to schedule resources for the job,
    在所述资源重启后,利用资源调度概览节点根据所述资源的限制约束重新与对应的所述资源调度节点建立关系。After the resource is restarted, the resource scheduling overview node re-establishes a relationship with the corresponding resource scheduling node according to the resource constraint constraint.
  13. 如权利要求12所述的去中心化资源调度方法,其中,所述在所述资源重启后,利用资源调度概览节点根据所述资源的限制约束重新与对应的所述资源调度节点建立关系的步骤包括:The decentralized resource scheduling method according to claim 12, wherein the step of reestablishing a relationship with the corresponding resource scheduling node by using the resource scheduling overview node according to the resource constraint constraint after the resource is restarted include:
    在所述资源重启后,启动所述资源的作业控制节点和资源汇报节点;After the resource is restarted, the job control node and the resource reporting node of the resource are started;
    利用所述资源汇报节点通知所述资源调度概览节点所述资源可用；Using the resource reporting node to notify the resource scheduling overview node that the resource is available;
    利用所述资源调度概览节点依据所述资源的限制约束查找到对应的所述资源调度节点;Using the resource scheduling overview node to find the corresponding resource scheduling node according to the resource constraint constraint;
    利用所述资源调度节点向所述资源的资源汇报节点进行注册，并接收所述资源的资源汇报节点根据预定规则汇报的空闲资源。Using the resource scheduling node to register with the resource reporting node of the resource, and receiving the idle resources reported by the resource reporting node of the resource according to a predetermined rule.
  14. 一种去中心化资源调度***,包括:A decentralized resource scheduling system comprising:
    接收模块,设置为利用集群接入节点接收作业;a receiving module, configured to receive a job by using a cluster access node;
    第一处理模块,设置为利用资源调度策略节点根据当前资源负载情况和所述作业中的作业特征和用户特征得到对应的资源调度节点信息;a first processing module, configured to use the resource scheduling policy node to obtain corresponding resource scheduling node information according to the current resource load situation and the job feature and the user feature in the job;
    第二处理模块,设置为若不存在与所述资源调度节点信息相对应的资源调度节点,则生成所述资源调度节点,利用所述资源调度节点为所述作业调度资源。And a second processing module, configured to: if there is no resource scheduling node corresponding to the resource scheduling node information, generate the resource scheduling node, and use the resource scheduling node to schedule resources for the job.
  15. The decentralized resource scheduling system according to claim 14, further comprising:
    a configuration module, configured to configure a preset number of pre-started resource scheduling nodes before the receiving module performs its operation; and
    a startup module, configured to start the configured preset number of resource scheduling nodes when the system starts.
  16. The decentralized resource scheduling system according to claim 14, further comprising:
    a shutdown module, configured to, after the second processing module performs its operation, automatically shut down the generated resource scheduling node if the generated resource scheduling node schedules no resources for another job within a preset time period.
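The idle-timeout shutdown of claim 16 can be sketched with an injected clock, which makes the preset time period testable without sleeping. The class and its names are illustrative assumptions, not the patent's implementation.

```python
import time

class GeneratedScheduler:
    """Scheduling node generated on demand; closes itself if it schedules
    nothing for another job within an idle window (the preset time period)."""
    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_scheduled = clock()
        self.closed = False

    def schedule(self, job_id):
        # Any scheduling activity resets the idle window.
        self.last_scheduled = self.clock()
        return f"scheduled {job_id}"

    def maybe_shutdown(self):
        # Shut down automatically once the idle window has elapsed.
        if self.clock() - self.last_scheduled >= self.idle_timeout:
            self.closed = True
        return self.closed

# A fake clock stands in for wall time in this sketch.
now = [0.0]
node = GeneratedScheduler(idle_timeout=300, clock=lambda: now[0])
node.schedule("job-1")
now[0] = 200
still_open = not node.maybe_shutdown()   # within the window: stays up
now[0] = 600
closed = node.maybe_shutdown()           # idle past the preset period: closes
```

Reclaiming idle schedulers keeps the on-demand generation of claim 14 from leaking one resident node per job class.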
  17. The decentralized resource scheduling system according to claim 14, wherein the first processing module comprises:
    a first processing submodule, configured to obtain, according to the job features, a set of resource scheduling nodes with good data locality for the job;
    a second processing submodule, configured to obtain restriction constraints of resources according to the user features; and
    a third processing submodule, configured to obtain the corresponding resource scheduling node information according to a preset policy, in combination with the current resource load situation, the set of resource scheduling nodes, and the restriction constraints of the resources.
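One way to combine the three inputs of claim 17 — locality set, user constraints, and current load — is shown below. The concrete policy (least-loaded allowed candidate, with a fallback when locality and constraints conflict) is an assumption for illustration; the patent fixes no particular preset policy.

```python
def choose_scheduler(locality_set, constraints, load):
    """Combine the data-locality candidate set, the user's restriction
    constraints, and current load into scheduling-node info."""
    # Keep only locality candidates the user's constraints allow.
    allowed = [n for n in locality_set if n in constraints["allowed_nodes"]]
    # Assumed preset policy: least-loaded allowed candidate; if the
    # locality set is ruled out entirely, fall back to any allowed node.
    pool = allowed or [n for n in load if n in constraints["allowed_nodes"]]
    return min(pool, key=lambda n: load[n])

load = {"s1": 5, "s2": 1, "s3": 0}
info = choose_scheduler(
    locality_set={"s1", "s2"},                    # good data locality (job features)
    constraints={"allowed_nodes": {"s1", "s2"}},  # from user features
    load=load,
)
```

Here `s3` is the least-loaded node overall but lies outside the locality set, so the policy settles on `s2` — load only breaks ties among nodes that locality and constraints both permit.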
  18. The decentralized resource scheduling system according to claim 14, wherein the second processing module comprises:
    a fourth processing submodule, configured to randomly select, by using the resource scheduling policy node, a job computing node subordinate to the resource scheduling node, and notify that job computing node to start a container to run the resource scheduling node.
  19. The decentralized resource scheduling system according to claim 14, wherein the second processing module comprises:
    a fifth processing submodule, configured to register, by using the resource scheduling node, with a resource reporting node, and receive idle resources reported by the resource reporting node according to a predetermined rule; and
    a scheduling submodule, configured to schedule, by using the resource scheduling node, the resources requested by a job control node according to the split job.
  20. The decentralized resource scheduling system according to claim 19, further comprising:
    a first notification module, configured to, before the scheduling submodule performs its operation, notify, by using the resource scheduling node, the job computing node corresponding to the idle resources to start a container to run the job control node.
  21. The decentralized resource scheduling system according to claim 14, further comprising:
    an acquisition module, configured to acquire the resource load situation in real time by using a resource scheduling overview node.
  22. The decentralized resource scheduling system according to claim 14, further comprising:
    a third processing module, configured to, after the second processing module performs its operation, allocate the scheduled resources to specific tasks in the job by using the job control node, and notify the job computing nodes corresponding to the scheduled resources to start containers to run the tasks.
  23. The decentralized resource scheduling system according to claim 22, further comprising:
    a second notification module, configured to, after the third processing module performs its operation and the task in the container has finished executing, notify the job control node, by using the job computing node, to close the container.
  24. The decentralized resource scheduling system according to claim 23, further comprising:
    a fourth processing module, configured to, after the second notification module performs its operation and all tasks in the job have finished executing, notify the resource scheduling node of the resource release by using the job control node, and apply to the resource scheduling node to close the job control node.
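Claims 22 through 24 describe a teardown cascade: each finished task closes its container, and the last closure triggers resource release and shutdown of the job control node itself. A minimal sketch of that lifecycle, with hypothetical class names:

```python
class Scheduler:
    """Stand-in for the resource scheduling node's release endpoint."""
    def __init__(self):
        self.released_jobs = []

    def release(self, job_ctl):
        # Claim 24: record the resource release and the close request.
        self.released_jobs.append(job_ctl)

class JobControlNode:
    """Assigns scheduled resources to tasks, tracks container closure,
    then releases resources once every task has finished."""
    def __init__(self, scheduler, tasks):
        self.scheduler = scheduler
        self.open_containers = set(tasks)  # claim 22: one container per task
        self.released = False

    def on_container_done(self, task):
        # Claim 23: the job computing node reports the task finished,
        # so the job control node closes that container.
        self.open_containers.discard(task)
        if not self.open_containers:
            # Claim 24: all tasks done — notify the scheduling node of the
            # resource release and apply to close this job control node.
            self.scheduler.release(self)
            self.released = True

sched = Scheduler()
job = JobControlNode(sched, tasks=["t1", "t2"])
job.on_container_done("t1")   # t2 still running: nothing released yet
job.on_container_done("t2")   # last container closed: release fires
```

Driving release from the job control node, rather than from a central master polling job state, is what keeps teardown decentralized.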
  25. The decentralized resource scheduling system according to claim 14, further comprising:
    an establishing module, configured to, after the second processing module performs its operation and the resource is restarted, re-establish, by using a resource scheduling overview node, a relationship with the corresponding resource scheduling node according to the restriction constraints of the resource.
  26. The decentralized resource scheduling system according to claim 25, wherein the establishing module comprises:
    a startup submodule, configured to start the job control node and the resource reporting node of the resource after the resource is restarted;
    a notification submodule, configured to notify, by using the resource reporting node, the resource scheduling overview node that the resource is available;
    a lookup submodule, configured to find, by using the resource scheduling overview node, the corresponding resource scheduling node according to the restriction constraints of the resource; and
    a sixth processing submodule, configured to register, by using the resource scheduling node, with the resource reporting node of the resource, and receive idle resources reported by the resource reporting node of the resource according to a predetermined rule.
PCT/CN2016/076997 2016-01-29 2016-03-22 Decentralized resource scheduling method and system WO2017128507A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610063947.6 2016-01-29
CN201610063947.6A CN107025136A (en) 2016-01-29 2016-01-29 Decentralized resource scheduling method and system

Publications (1)

Publication Number Publication Date
WO2017128507A1 true WO2017128507A1 (en) 2017-08-03

Family

ID=59397227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/076997 WO2017128507A1 (en) 2016-01-29 2016-03-22 Decentralized resource scheduling method and system

Country Status (2)

Country Link
CN (1) CN107025136A (en)
WO (1) WO2017128507A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597614A (en) * 2018-06-12 2019-12-20 阿里巴巴集团控股有限公司 Resource adjusting method and device
CN112328383A (en) * 2020-11-19 2021-02-05 湖南智慧畅行交通科技有限公司 Priority-based job concurrency control and scheduling algorithm
CN112732437A (en) * 2020-12-30 2021-04-30 成都科来网络技术有限公司 Efficient dynamic balance distributed task scheduling method and system

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590001B (en) * 2017-09-08 2020-12-22 北京京东尚科信息技术有限公司 Load balancing method and device, storage medium and electronic equipment
CN109947556A (en) * 2017-12-21 2019-06-28 北京比特大陆科技有限公司 Method for allocating tasks
CN109656706A (en) * 2018-12-25 2019-04-19 江苏满运软件科技有限公司 Distributed task dispatching method, system, equipment and medium
CN109814998A (en) * 2019-01-22 2019-05-28 中国联合网络通信集团有限公司 A kind of method and device of multi-process task schedule
CN110381134B (en) * 2019-07-18 2022-05-17 湖南快乐阳光互动娱乐传媒有限公司 Scheduling method, system, scheduler and CDN system
CN110647400B (en) * 2019-09-29 2022-04-22 成都安恒信息技术有限公司 Node preheating method based on positive feedback
CN110730238B (en) * 2019-10-21 2022-07-05 中国民航信息网络股份有限公司 Cluster calling system, method and device
CN110928645B (en) * 2019-11-21 2023-01-24 网易(杭州)网络有限公司 Server maintenance method and device, storage medium, processor and electronic device
CN111459666A (en) * 2020-03-26 2020-07-28 北京金山云网络技术有限公司 Task dispatching method and device, task execution system and server
CN111459641B (en) * 2020-04-08 2023-04-28 广州欢聊网络科技有限公司 Method and device for task scheduling and task processing across machine room
CN112395140B (en) * 2020-11-17 2023-01-17 平安科技(深圳)有限公司 A decentralized task scheduling method apparatus, device and medium
CN112968897B (en) * 2021-02-25 2022-04-08 浙江清华长三角研究院 Container calculation method operating in decentralized system
CN113806052B (en) * 2021-09-24 2023-06-06 四川新网银行股份有限公司 Decentralized distributed timing task processing method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857577A (en) * 2012-09-24 2013-01-02 北京联创信安科技有限公司 System and method for automatic load balancing of cluster storage
CN103414761A (en) * 2013-07-23 2013-11-27 北京工业大学 Mobile terminal cloud resource scheduling method based on Hadoop framework
US20130346969A1 (en) * 2012-06-21 2013-12-26 Vmware, Inc. Opportunistically Proactive Resource Management Using Spare Capacity
CN103812895A (en) * 2012-11-12 2014-05-21 华为技术有限公司 Scheduling method, management nodes and cloud computing cluster

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110597614A (en) * 2018-06-12 2019-12-20 阿里巴巴集团控股有限公司 Resource adjusting method and device
CN110597614B (en) * 2018-06-12 2023-11-21 阿里巴巴集团控股有限公司 Resource adjustment method and device
CN112328383A (en) * 2020-11-19 2021-02-05 湖南智慧畅行交通科技有限公司 Priority-based job concurrency control and scheduling algorithm
CN112732437A (en) * 2020-12-30 2021-04-30 成都科来网络技术有限公司 Efficient dynamic balance distributed task scheduling method and system
CN112732437B (en) * 2020-12-30 2023-08-22 科来网络技术股份有限公司 Efficient dynamic equilibrium distributed task scheduling method and system

Also Published As

Publication number Publication date
CN107025136A (en) 2017-08-08

Similar Documents

Publication Publication Date Title
WO2017128507A1 (en) Decentralized resource scheduling method and system
US11307943B2 (en) Disaster recovery deployment method, apparatus, and system
WO2020147330A1 (en) Data stream processing method and system
TWI755417B (en) Computing task allocation method, execution method of stream computing task, control server, stream computing center server cluster, stream computing system and remote multi-active system
US10057339B2 (en) Resource allocation protocol for a virtualized infrastructure with reliability guarantees
CN102346460B (en) Transaction-based service control system and method
WO2017181877A1 (en) Method and device for allocating virtual resource
WO2018149221A1 (en) Device management method and network management system
JP5254547B2 (en) Decentralized application deployment method for web application middleware, system and computer program thereof
CN103414712B (en) A kind of distributed virtual desktop management system and method
US7716517B2 (en) Distributed platform management for high availability systems
US9880827B2 (en) Managing software version upgrades in a multiple computer system environment
US20150026359A1 (en) Methods and systems for reconfiguration and repartitioning of a parallel distributed stream process
WO2013104217A1 (en) Cloud infrastructure based management system and method for performing maintenance and deployment for application system
CN113569987A (en) Model training method and device
WO2019056771A1 (en) Distributed storage system upgrade management method and device, and distributed storage system
CN113382077B (en) Micro-service scheduling method, micro-service scheduling device, computer equipment and storage medium
CN112631764A (en) Task scheduling method and device, computer equipment and computer readable medium
Guo et al. Energy-efficient fault-tolerant scheduling algorithm for real-time tasks in cloud-based 5G networks
CN114518955A (en) Flunk cloud native deployment architecture method and system based on kubernets
CN111200518B (en) Decentralized HPC computing cluster management method and system based on paxos algorithm
WO2022111466A1 (en) Task scheduling method, control method, electronic device and computer-readable medium
Betting et al. Evaluation and superiority analysis of a decentralized task control mechanism for dependable real-time soc architectures
CN109634749B (en) Distributed unified scheduling method and device
Goraya et al. Fault tolerance task execution through cooperative computing in grid

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16887389

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16887389

Country of ref document: EP

Kind code of ref document: A1