CN109976873B - Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework - Google Patents

Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework

Info

Publication number
CN109976873B
Authority
CN
China
Prior art keywords
computing
containerized
frame
node
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910137847.7A
Other languages
Chinese (zh)
Other versions
CN109976873A (en)
Inventor
童薇
冯丹
刘景宁
谢乘胜
邓竣中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201910137847.7A
Publication of CN109976873A
Application granted
Publication of CN109976873B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a scheduling scheme obtaining method and a scheduling method for a containerized distributed computing framework. The scheduling scheme obtaining method comprises: determining the computing node that will run each containerized component according to the computing resources required by each unscheduled containerized component in the containerized distributed computing framework to be scheduled and the available computing resources of each computing node, so that the containerized core component and as many containerized computing components as possible are scheduled to the same computing node, thereby obtaining a scheduling scheme for the framework to be scheduled. For a newly created computing framework, each containerized component is scheduled to its corresponding computing node once the scheduling scheme has been obtained. If there is no newly created computing framework and the rescheduling condition is met, a scheduling scheme is obtained for one or more running computing frameworks, and rescheduling is executed only when it reduces the total network communication overhead of all computing frameworks. The invention can effectively improve the performance of the containerized distributed computing framework.

Description

Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework
Technical Field
The invention belongs to the technical field of container cluster task scheduling, and particularly relates to a scheduling scheme obtaining method and a scheduling method of a containerized distributed computing framework.
Background
Existing data centers use virtualization to improve the resource utilization of physical machines and to isolate the running environments of different applications. Virtualization includes virtual machine technology and container technology. Virtual machine technology must virtualize a complete guest operating system, whereas container technology lets all containers running on the same physical server share the host's operating system kernel, so a user only needs to build the minimal runtime environment that a specific application requires. When managing containerized applications, a data center administrator does not need to consider the environment inside the container on which the application depends, which simplifies application management; as a result, more and more application runtime environments in data centers are migrating from virtual machines to containers.
With the rise of big data and artificial intelligence, users submit a growing number of big data processing tasks and deep learning training tasks to data centers, and these tasks typically run on distributed computing frameworks. Typical distributed computing frameworks today include parallel computing models (e.g., OpenMPI), big data processing models (e.g., Hadoop and Spark), and deep neural network training models (e.g., TensorFlow). A distributed computing framework generally consists of a core component and computing components. The core component receives the tasks that users send to the computing framework, divides each original task into several subtasks, distributes the subtasks to the computing components, collects and processes the results the computing components produce, and returns the final result to the user. Each computing component receives the tasks distributed by the core component, performs the computation locally, and sends its result back to the core component when the computation finishes. While the computing framework processes user tasks, a large amount of data is exchanged between the core component and the computing components, so the data communication bandwidth easily becomes the performance bottleneck of the whole framework: the lower the bandwidth, the less efficiently the framework executes tasks. A containerized distributed computing framework containerizes all components of the computing framework and provides computing services to users; the communication bandwidth between its containerized core component and its containerized computing components is affected by the scheduling policy of the container cluster orchestration system.
Currently popular container cluster orchestration systems, such as Docker Swarm and Kubernetes, use a single container scheduling policy that does not consider how the data communication bandwidth between containerized components affects the performance of a computing framework. They may therefore schedule containerized components that exchange large amounts of data to different nodes in the cluster, as shown in FIG. 1. The communication rate between the containerized core component and the containerized computing components is then limited by the network communication rate between physical nodes, so the containerized components take longer to synchronize data with each other and the containerized computing framework executes tasks less efficiently.
Disclosure of Invention
In view of the shortcomings of the prior art and the need to improve on it, the invention provides a scheduling scheme obtaining method and a scheduling method for a containerized distributed computing framework, with the aim of improving the performance of the containerized distributed computing framework.
To achieve the above object, according to a first aspect of the present invention, there is provided a scheduling scheme obtaining method for a containerized distributed computing framework, comprising:
(1) obtaining all unscheduled containerized components in the containerized distributed computing framework to be scheduled, thereby obtaining a component set to be scheduled;
(2) determining, according to the computing resources required by each containerized component in the component set to be scheduled and the available computing resources of each computing node in the cluster, the computing node that will run each containerized component in the component set to be scheduled, so that the containerized core component and as many containerized computing components as possible are scheduled to the same computing node, thereby obtaining the scheduling scheme of the containerized distributed computing framework to be scheduled.
The invention schedules the containerized core component of a computing framework and as many of its containerized computing components as possible to the same computing node in the cluster. This raises the communication rate between the containerized core component and the containerized computing components, shortens the time the containerized components need to synchronize data with each other, reduces the overall time the containerized distributed computing framework takes to execute tasks, and thus improves its performance.
Further, step (2) comprises:
(21) sorting the containerized computing components in the component set to be scheduled in ascending order of the computing resources they require, to obtain an ordered component set;
(22) if the component set to be scheduled contains the containerized core component, inserting the containerized core component as the first element of the ordered component set and going to step (23); otherwise, going directly to step (23);
(23) obtaining the total computing resources R required by all containerized components in the ordered component set;
(24) if the available computing resources of every computing node are less than the total computing resources R, going to step (25); otherwise, collecting all computing nodes whose available computing resources are greater than or equal to R into a candidate node set and going to step (27);
(25) obtaining the computing node I with the most available computing resources, and determining the first m containerized components in the ordered component set that can be scheduled to computing node I, such that
$\sum_{i=1}^{m} F_i \le N_{max}$
where $N_{max}$ is the available computing resources of computing node I;
scheduling the containerized core component and the containerized computing components that consume fewer computing resources first allows the containerized core component and as many containerized computing components as possible to be scheduled to the same computing node;
(26) determining computing node I as the computing node that runs these m containerized components, updating the available computing resources of computing node I to
$N_{max} - \sum_{i=1}^{m} F_i$
removing the m containerized components from the ordered component set, and then going to step (23);
(27) obtaining the computing node I' with the least available computing resources in the candidate node set, and determining computing node I' as the computing node that runs every containerized component in the ordered component set;
selecting the computing node with the least available computing resources increases the likelihood that containerized components belonging to the same computing framework can be scheduled to the same node in subsequent scheduling;
where i is the index of a containerized component and $F_i$ is the computing resources required by the i-th containerized component in the ordered component set.
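Steps (21) to (27) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Component` type, the `get_schedule` function, and the use of a single integer per component and per node as the "computing resource" are all hypothetical. The sketch also assumes that m in step (25) is the largest prefix that fits on node I, that the chosen node's availability is reduced in step (27) as well, and that an error is raised when not even the first remaining component fits on any node, a case the text does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str              # hypothetical identifier of the containerized component
    demand: int            # F_i: computing resources the component requires
    is_core: bool = False  # True for the containerized core component


def get_schedule(components, available):
    """Return (plan, free): plan maps component name -> node name,
    free maps node name -> remaining available resources."""
    free = dict(available)  # work on a copy of the node availability
    # Step (21): computing components in ascending order of demand.
    ordered = sorted((c for c in components if not c.is_core),
                     key=lambda c: c.demand)
    # Step (22): the core component, if present, goes first.
    ordered = [c for c in components if c.is_core] + ordered
    plan = {}
    while ordered:
        total = sum(c.demand for c in ordered)                   # step (23): R
        candidates = [n for n, cap in free.items() if cap >= total]
        if candidates:                                           # step (24) -> (27)
            # Pick the candidate with the least available resources.
            node = min(candidates, key=lambda n: free[n])
            batch = ordered
        else:                                                    # step (24) -> (25)
            # Pick the node with the most available resources and take
            # the longest prefix whose total demand still fits (assumed).
            node = max(free, key=lambda n: free[n])
            batch, used = [], 0
            for c in ordered:
                if used + c.demand > free[node]:
                    break
                batch.append(c)
                used += c.demand
            if not batch:
                # Not covered by the text: nothing fits anywhere.
                raise RuntimeError("no node can host the next component")
        for c in batch:
            plan[c.name] = node
        free[node] -= sum(c.demand for c in batch)               # step (26)
        ordered = ordered[len(batch):]                           # back to step (23)
    return plan, free
```

Because the core component is placed first and cheaper computing components follow, the node chosen in step (25) ends up hosting the core component together with the largest number of computing components that fit.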
According to a second aspect of the present invention, there is provided a scheduling method for a containerized distributed computing framework, comprising:
for a containerized distributed computing framework Fr that a user has newly created and that needs to be scheduled in the cluster, obtaining a scheduling scheme S using the scheduling scheme obtaining method for a containerized distributed computing framework provided by the first aspect of the invention;
and scheduling each containerized component of computing framework Fr to its corresponding computing node according to scheduling scheme S, thereby completing the scheduling of computing framework Fr.
According to a third aspect of the present invention, there is provided a scheduling method for a containerized distributed computing framework, comprising:
(1) determining whether the cluster currently contains a containerized distributed computing framework Fr that a user has newly created and that needs to be scheduled; if so, going to step (6); if not, going to step (2);
(2) obtaining the difference $\Delta t$ between the cluster's current timestamp $t_p$ and the timestamp $t_l$ at which the cluster last executed the rescheduling process; if $\Delta t > T$, going to step (3); otherwise, going to step (1);
(3) taking one or more containerized distributed computing frameworks currently running in the cluster as rescheduling objects, and obtaining a new scheduling scheme for each rescheduling object using the scheduling scheme obtaining method for a containerized distributed computing framework provided by the first aspect of the invention;
(4) computing the total network communication overhead of all computing frameworks in the cluster before scheduling according to the new scheduling scheme, V, and after it, V'; if V' < V, rescheduling all or some of the containerized components of each rescheduling object to their corresponding computing nodes according to the new scheduling scheme, and going to step (5) once rescheduling is complete; otherwise, performing no rescheduling and going to step (1);
when no newly created computing framework needs to be scheduled, rescheduling the containerized distributed computing frameworks already running in the cluster reduces the total network communication overhead of all computing frameworks in the cluster and thus further improves the performance of the containerized distributed computing frameworks;
(5) updating timestamp $t_l$ to timestamp $t_p$ and going to step (1);
(6) obtaining the scheduling scheme S of computing framework Fr using the scheduling scheme obtaining method for a containerized distributed computing framework provided by the first aspect of the invention, and scheduling each containerized component of computing framework Fr to its corresponding computing node according to scheduling scheme S, thereby completing the scheduling of computing framework Fr;
scheduling the newly created computing framework Fr according to scheduling scheme S ensures that the containerized core component of Fr and as many of its containerized computing components as possible are scheduled to the same computing node, which raises the communication rate between components and improves the performance of the containerized distributed computing framework;
(7) going to step (1) after the scheduling is finished;
where T is a preset time interval threshold.
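One pass through steps (1) to (7) can be sketched as a single function that a scheduler would call repeatedly. This is an illustrative sketch only: `scheduling_step` and every callback it receives (`schedule_new`, `plan_reschedule`, `total_overhead`, `apply_plan`) are hypothetical names, and how the cluster is queried or changed is left entirely to those callbacks.

```python
import time
from typing import Callable, Optional


def scheduling_step(
    pending_framework: Optional[object],
    last_reschedule: float,
    threshold: float,
    schedule_new: Callable[[object], None],
    plan_reschedule: Callable[[], object],
    total_overhead: Callable[[Optional[object]], float],
    apply_plan: Callable[[object], None],
    now: Callable[[], float] = time.monotonic,
) -> float:
    """Run one pass of steps (1)-(7) and return the value of t_l to use next.

    total_overhead(None) is assumed to return V (current placement) and
    total_overhead(plan) to return V' (placement after applying the plan).
    """
    # Step (1): a newly created framework takes priority, steps (6)-(7).
    if pending_framework is not None:
        schedule_new(pending_framework)
        return last_reschedule
    # Step (2): only consider rescheduling once more than T has elapsed.
    t_p = now()
    if t_p - last_reschedule <= threshold:
        return last_reschedule
    # Step (3): obtain new scheduling schemes for the rescheduling objects.
    plan = plan_reschedule()
    # Step (4): reschedule only if the total overhead strictly decreases.
    v_before = total_overhead(None)
    v_after = total_overhead(plan)
    if v_after < v_before:
        apply_plan(plan)
        return t_p  # step (5): t_l is updated to t_p
    return last_reschedule
```

Returning the timestamp instead of mutating shared state keeps the sketch easy to test; a real scheduler would typically wrap the call in a loop that also polls for newly created frameworks.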
Further, step (3) comprises:
(31) obtaining the computing framework M with the largest network communication overhead in the cluster;
(32) taking the scale B of computing framework M as a threshold, and selecting all computing frameworks in the cluster whose scale is smaller than B, thereby obtaining a computing framework set H;
(33) taking computing framework M and the computing frameworks in set H as rescheduling objects, and computing the available resources of each computing node in the cluster as they would be with no rescheduling object scheduled to any computing node, $N_j' = N_j + F_{M,j} + F_{H,j}$, thereby obtaining the node p with the most available computing resources before the rescheduling objects are scheduled;
rescheduling the computing framework with the largest network communication overhead is the most likely way to reduce the total network communication overhead of all computing frameworks in the cluster; smaller computing frameworks usually need fewer computing resources, so rescheduling them makes effective use of resource fragments;
(34) for each computing framework $h \in H$, taking all of its containerized components that run on node p as a new computing framework $h'$, thereby obtaining a computing framework set K consisting of all the new computing frameworks;
(35) sorting the computing frameworks in set K in descending order of scale to obtain a computing framework queue Q, and inserting computing framework M at the head of queue Q;
the larger a containerized distributed computing framework is, the larger its communication overhead; obtaining the scheduling scheme of larger computing frameworks first effectively improves the overall performance of the cluster;
(36) updating the available computing resources of computing node p to $W_j' = N_j + F_{M,j} + F_{H,j}$ and the available computing resources of every other computing node to $W_j' = N_j + F_{M,j}$, and then obtaining the scheduling scheme of each computing framework in queue Q in turn using the scheduling scheme obtaining method for a containerized distributed computing framework provided by the first aspect of the invention;
where the scale of a computing framework is the number of containerized components it contains, j is the index of a computing node, $N_j$ is the available computing resources of the j-th computing node in the cluster before rescheduling, $F_{M,j}$ is the total computing resources consumed by the containerized components of computing framework M running on the j-th computing node, and $F_{H,j}$ is the total computing resources consumed by the containerized components of the computing frameworks in set H running on the j-th computing node.
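The selection of rescheduling objects in steps (31) to (36) can be sketched as below. The data layout is an assumption made for illustration: `placements` maps each running framework to a list of `(node, demand)` pairs, `overhead` maps each framework to its current network communication overhead, and `free` maps each node to $N_j$. The function name is hypothetical, and dropping frameworks from K when they have no component on node p is also an assumption, since the text does not say how such empty sets are handled.

```python
def select_reschedule_targets(placements, overhead, free):
    """Return (m, p, k, queue, w) for steps (31)-(36).

    placements: {framework: [(node, demand), ...]} for running components
    overhead:   {framework: current network communication overhead}
    free:       {node: N_j, available resources before rescheduling}
    """
    # Step (31): framework M with the largest network communication overhead.
    m = max(overhead, key=overhead.get)
    # Step (32): frameworks whose scale (component count) is below B.
    b = len(placements[m])
    h = [f for f in placements if f != m and len(placements[f]) < b]

    def released(frameworks, node):
        # Resources the given frameworks currently consume on `node`.
        return sum(d for f in frameworks for n, d in placements[f] if n == node)

    # Step (33): N_j' = N_j + F_{M,j} + F_{H,j}; p maximizes N_j'.
    n_prime = {j: free[j] + released([m], j) + released(h, j) for j in free}
    p = max(n_prime, key=n_prime.get)
    # Step (34): for each h in H, its components on node p form h'.
    k = {f: [(n, d) for n, d in placements[f] if n == p] for f in h}
    k = {f: comps for f, comps in k.items() if comps}  # assumed: skip empty h'
    # Step (35): queue Q, largest h' first, with M inserted at the head.
    queue = [m] + sorted(k, key=lambda f: len(k[f]), reverse=True)
    # Step (36): W_j' includes F_{H,j} only on node p.
    w = {j: free[j] + released([m], j) + (released(h, j) if j == p else 0)
         for j in free}
    return m, p, k, queue, w
```

The returned `w` can then serve as the node availability when a scheduling scheme is obtained for each entry of `queue` in order.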
In general, the above technical solutions conceived by the present invention achieve the following beneficial effects:
(1) The invention schedules the containerized core component of a computing framework and as many of its containerized computing components as possible to the same computing node in the cluster. This raises the communication rate between the containerized core component and the containerized computing components, shortens the time the containerized components need to synchronize data with each other, reduces the overall time the containerized distributed computing framework takes to execute tasks, and thus improves its performance.
(2) The scheduling scheme obtained by the invention does not depend on any specific container cluster orchestration system, so it is highly portable.
(3) The invention can schedule a newly created containerized distributed computing framework, and it can also reschedule computing frameworks already running in the cluster according to the network communication overhead between their containerized components, which minimizes the sum of the network communication overhead of the computing frameworks running in the cluster and improves scheduling flexibility.
(4) In the rescheduling process, after the rescheduling objects are determined and a new scheduling scheme is obtained, rescheduling is executed only when it makes the total network communication overhead of all computing frameworks in the cluster smaller than before, so scheduling stability is higher.
Drawings
FIG. 1 is a schematic diagram of a scheduling result produced by a conventional container cluster orchestration system;
FIG. 2 is a flowchart of the scheduling scheme obtaining method for a containerized distributed computing framework according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the result of scheduling according to a scheduling scheme obtained by the scheduling scheme obtaining method according to an embodiment of the present invention;
FIG. 4 is a flowchart of the scheduling method for a containerized distributed computing framework according to the second embodiment of the present invention;
FIG. 5 is a flowchart of the scheduling method for a containerized distributed computing framework according to the third embodiment of the present invention;
FIG. 6 is a flowchart of the method for rescheduling computing frameworks running in the cluster according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
To address the problem shown in FIG. 1, namely that communication overhead between components is large when the containerized core component and the containerized computing components of the same containerized distributed computing framework are scheduled to different computing nodes, a first embodiment of the present invention provides a scheduling scheme obtaining method for a containerized distributed computing framework, as shown in FIG. 2, comprising:
(1) obtaining all unscheduled containerized components in the containerized distributed computing framework to be scheduled, thereby obtaining a component set to be scheduled;
(2) determining, according to the computing resources required by each containerized component in the component set to be scheduled and the available computing resources of each computing node in the cluster, the computing node that will run each containerized component in the component set to be scheduled, so that the containerized core component and as many containerized computing components as possible are scheduled to the same computing node, thereby obtaining the scheduling scheme of the containerized distributed computing framework to be scheduled;
in an optional embodiment, step (2) specifically comprises:
(21) sorting the containerized computing components in the component set to be scheduled in ascending order of the computing resources they require, to obtain an ordered component set;
(22) if the component set to be scheduled contains the containerized core component, inserting the containerized core component as the first element of the ordered component set and going to step (23); otherwise, going directly to step (23);
(23) obtaining the total computing resources required by all containerized components in the ordered component set,
$R = \sum_{i=1}^{n} F_i$
where i is the index of a containerized component, $F_i$ is the computing resources required by the i-th containerized component in the ordered component set, $1 \le i \le n$, and n is the number of containerized components in the ordered component set;
(24) if the available computing resources of every computing node are less than the total computing resources R, going to step (25); otherwise, collecting all computing nodes whose available computing resources are greater than or equal to R into a candidate node set and going to step (27);
(25) obtaining the computing node I with the most available computing resources, and determining the first m containerized components in the ordered component set that can be scheduled to computing node I, such that
$\sum_{i=1}^{m} F_i \le N_{max}$
where $N_{max}$ is the available computing resources of computing node I;
scheduling the containerized core component and the containerized computing components that consume fewer computing resources first allows the containerized core component and as many containerized computing components as possible to be scheduled to the same computing node;
(26) determining computing node I as the computing node that runs these m containerized components, updating the available computing resources of computing node I to
$N_{max} - \sum_{i=1}^{m} F_i$
removing the m containerized components from the ordered component set, and then going to step (23);
(27) obtaining the computing node I' with the least available computing resources in the candidate node set, and determining computing node I' as the computing node that runs every containerized component in the ordered component set;
selecting the computing node with the least available computing resources increases the likelihood that containerized components belonging to the same computing framework can be scheduled to the same node in subsequent scheduling.
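As an illustrative trace of steps (21) to (27), consider hypothetical values that do not come from the patent: a core component requiring 2 units, computing components requiring 1, 3 and 4 units, and three nodes with 5, 6 and 4 available units. The trace assumes that m in step (25) is the largest value satisfying the inequality.

```latex
\begin{align*}
&\text{Ordered set: } (F_1, F_2, F_3, F_4) = (2, 1, 3, 4), \qquad R = \sum_{i=1}^{4} F_i = 10 \\
&\text{No node has } N_j \ge 10 \;\Rightarrow\; \text{step (25): } I = \text{node 2}, \; N_{max} = 6 \\
&F_1 + F_2 + F_3 = 6 \le N_{max}, \quad F_1 + F_2 + F_3 + F_4 = 10 > N_{max} \;\Rightarrow\; m = 3 \\
&\text{Step (26): node 2 now has } N_{max} - \sum_{i=1}^{3} F_i = 0 \text{ units available} \\
&\text{Remaining set: } (4), \quad R = 4, \quad \text{candidates: node 1 (5 units), node 3 (4 units)} \\
&\text{Step (27): } I' = \text{node 3, the candidate with the least available resources}
\end{align*}
```

In this trace the core component shares node 2 with two of its three computing components, and node 1 keeps all 5 of its units free for a later framework.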
The invention schedules the containerized core component of a computing framework and as many of its containerized computing components as possible to the same computing node in the cluster; FIG. 3 shows the result of scheduling a computing framework according to a scheduling scheme obtained by the scheduling scheme obtaining method provided by this embodiment. This raises the communication rate between the containerized core component and the containerized computing components, shortens the time the containerized components need to synchronize data with each other, reduces the overall time the containerized distributed computing framework takes to execute tasks, and thus improves its performance.
In a second embodiment, the present invention further provides a scheduling method for a containerized distributed computing framework, as shown in FIG. 4, comprising:
for a containerized distributed computing framework Fr that a user has newly created and that needs to be scheduled in the cluster, obtaining its scheduling scheme S using the scheduling scheme obtaining method for a containerized distributed computing framework described above;
and scheduling each containerized component of computing framework Fr to its corresponding computing node according to scheduling scheme S, thereby completing the scheduling of computing framework Fr.
With this scheduling method, the containerized core component of a containerized distributed computing framework newly created by a user is scheduled to the same computing node as as many of its containerized computing components as possible, which reduces the communication overhead between components and improves the performance of the computing framework.
In a third embodiment, the present invention further provides a scheduling method for a containerized distributed computing framework, as shown in FIG. 5, comprising:
(1) determining whether the cluster currently contains a containerized distributed computing framework Fr that a user has newly created and that needs to be scheduled; if so, going to step (6); if not, going to step (2);
(2) obtaining the difference $\Delta t$ between the cluster's current timestamp $t_p$ and the timestamp $t_l$ at which the cluster last executed the rescheduling process; if $\Delta t > T$, going to step (3); otherwise, going to step (1);
where T is a preset time interval threshold; the value of T can be set according to the actual cluster environment and application characteristics, because a value that is too large may mean the rescheduling process has still not started when tasks finish, while a value that is too small makes the computation overhead excessive; in this embodiment, T is set to 10 seconds;
(3) taking one or more containerized distributed computing frameworks currently running in the cluster as rescheduling objects, and obtaining a new scheduling scheme for each rescheduling object using the scheduling scheme obtaining method for a containerized distributed computing framework described above;
in an alternative embodiment, as shown in fig. 6, step (3) specifically includes:
(31) obtaining a computing frame M with the largest communication overhead of the network in the cluster;
(32) taking the scale B of the calculation frame M as a threshold value, and screening out all calculation frames with scales smaller than the threshold value B in the cluster so as to obtain a calculation frame set H;
(33) taking the computing frame M and the computing frame in the computing frame set H as rescheduling objects, and when each rescheduling object is not scheduled to a computing node, calculating available resources N of each computing node in the clusterj′=Nj+FM,j+FH,jThus obtaining the node p with the most available computing resources before the rescheduled object is scheduled;
rescheduling the computing frame with the largest network communication overhead can reduce the total network communication overhead of all the computing frames in the cluster with the largest probability; the calculation resources needed by the calculation frame with smaller scale are often smaller, and the calculation frame with smaller scale is rescheduled, so that the resource fragments can be effectively utilized;
(34) for each calculation frame H epsilon H, obtaining all containerization components which run on the node p and serve as a new calculation frame H', and obtaining a calculation frame set K consisting of all new calculation frames;
(35) sequencing the calculation frames in the calculation frame set K according to the sequence from large scale to small scale to obtain a calculation frame queue Q, and inserting the calculation frame M into the head of the calculation frame queue Q;
the larger the scale of the containerized distributed computing framework is, the larger the communication overhead is, the scheduling scheme of the computing framework with the larger scale is preferentially obtained, and the overall performance of the cluster can be effectively improved;
(36) updating the available computing resources of the computing node p to $W_j' = N_j + F_{M,j} + F_{H,j}$ and the available computing resources of the other computing nodes to $W_j' = N_j + F_{M,j}$, and then obtaining the scheduling scheme of each computing frame in the computing frame queue Q in turn by using the scheduling scheme obtaining method of the containerized distributed computing frame;
wherein the scale of a computing frame is the number of containerized components it contains, j is the number of the computing node, $N_j$ is the available computing resources of the j-th computing node in the cluster before rescheduling, $F_{M,j}$ is the total amount of computing resources consumed by the containerized components of the computing frame M that run on the j-th computing node, and $F_{H,j}$ is the total amount of computing resources consumed by the containerized components of the computing frame set H that run on the j-th computing node;
(4) calculating the total network communication overheads V and V' of all computing frames in the cluster before and after scheduling according to the new scheduling scheme, respectively; if V' < V, rescheduling all or part of the containerized components of each rescheduling object to the corresponding computing nodes according to the new scheduling scheme to complete the rescheduling, and going to step (5) after the rescheduling is completed; otherwise, performing no rescheduling and going to step (1);
when the core component and a computing component of a computing framework are scheduled to different nodes, network communication overhead exists between these containerized components; for the l-th containerized distributed computing framework, the network communication overhead between the containerized core component and the i-th containerized component is as follows:
[formula image BDA0001977554160000121, not reproduced in this text]
in the formula, $k_l$ is a constant expressing that the network communication overhead between a computing component and the core component is positively correlated with the resources the computing component consumes, and different types of computing frameworks have different values of $k_l$; $G_i$ is the computing resources required by the i-th containerized component; the larger $G_i$ is, the more resources the containerized computing component consumes within the computing framework, the more data it has to exchange with the containerized core component, and the larger the network communication overhead between that component and the core component;
before rescheduling, the total network communication overhead of all computing frames in the cluster is as follows:
[formula images BDA0001977554160000122 and BDA0001977554160000123, not reproduced in this text]
where l is the number of a running containerized distributed computing framework in the cluster, $C_l$ represents the network communication overhead of the l-th containerized distributed computing framework, and r represents the total number of running containerized computing frameworks in the cluster;
after rescheduling, the total network communication overhead V' of all the computing frames in the cluster is calculated in the same way as V, so the calculation is not described again here;
when no newly-built computing frame needs to be scheduled, rescheduling the containerized distributed computing frames that are running in the cluster reduces the total network communication overhead of all the computing frames in the cluster and thereby further improves the performance of the containerized distributed computing frame;
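The comparison of V and V' in step (4) can be illustrated with a minimal Python sketch. Because the formula images are not reproduced in this text, the per-component overhead used here is an assumption: it is taken to be $k_l \cdot G_i$ for a computing component that runs on a different node than the core component and zero for a co-located one, which matches the surrounding description but may differ from the original formulas. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    resources: float  # G_i: computing resources the component consumes
    node: str         # node the component runs on


@dataclass
class Framework:
    k: float                     # k_l: constant of this framework type
    core_node: str               # node hosting the containerized core component
    components: list[Component]  # containerized computing components


def framework_overhead(fw: Framework) -> float:
    """C_l under the assumed model: k_l * G_i for every computing
    component that is not co-located with the core component."""
    return sum(fw.k * c.resources for c in fw.components if c.node != fw.core_node)


def total_overhead(frameworks: list[Framework]) -> float:
    """V: sum of C_l over all running frameworks in the cluster."""
    return sum(framework_overhead(fw) for fw in frameworks)


def should_reschedule(before: list[Framework], after: list[Framework]) -> bool:
    """Step (4): apply the new scheme only if V' < V."""
    return total_overhead(after) < total_overhead(before)


before = [Framework(2.0, "n1", [Component(1.0, "n1"), Component(3.0, "n2")])]
after = [Framework(2.0, "n1", [Component(1.0, "n1"), Component(3.0, "n1")])]
print(total_overhead(before), total_overhead(after), should_reschedule(before, after))
```

In this example only the component placed on `n2` contributes to V, so moving it next to the core component brings the assumed overhead to zero and the new scheme is accepted.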
(5) updating the timestamp $t_l$ to the timestamp $t_p$, and going to step (1);
(6) the scheduling scheme S of the computing frame Fr is obtained by the scheduling scheme obtaining method of the containerized distributed computing frame, and each containerized component of the computing frame Fr is scheduled to the corresponding computing node according to the scheduling scheme S, so that the scheduling of the computing frame Fr is completed;
scheduling the newly-built computing frame Fr according to the scheduling scheme S ensures that the containerized core component of the computing frame Fr and as many of its containerized computing components as possible are scheduled to the same computing node, which raises the communication rate among the components and improves the performance of the containerized distributed computing frame;
(7) after the scheduling is finished, going to step (1).
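Steps (31) to (36) above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Framework` class, the `plan_rescheduling` function, and the dictionary-based node model are hypothetical, and dropping frameworks of H that have no component on node p is an assumption that the text does not address.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    name: str
    overhead: float                      # current network communication overhead
    components: list[tuple[str, float]]  # (node, resources consumed) per component

    @property
    def scale(self) -> int:
        # Scale = number of containerized components in the framework.
        return len(self.components)


def used_on(frameworks, node):
    """Resources consumed on `node` by the components of `frameworks`."""
    return sum(r for fw in frameworks for n, r in fw.components if n == node)


def plan_rescheduling(frameworks, available):
    m = max(frameworks, key=lambda fw: fw.overhead)            # (31)
    h_set = [fw for fw in frameworks if fw.scale < m.scale]    # (32)
    # (33) N_j' = N_j + F_{M,j} + F_{H,j}
    freed = {j: n + used_on([m], j) + used_on(h_set, j) for j, n in available.items()}
    p = max(freed, key=freed.get)
    # (34) components of each h in H that run on node p form a new framework h'
    k_set = [Framework(fw.name + "'", 0.0, [c for c in fw.components if c[0] == p])
             for fw in h_set]
    k_set = [fw for fw in k_set if fw.components]  # assumption: skip empty h'
    # (35) queue Q: M first, then K in descending order of scale
    queue = [m] + sorted(k_set, key=lambda fw: fw.scale, reverse=True)
    # (36) node p gets N_j + F_{M,j} + F_{H,j}; other nodes get N_j + F_{M,j}
    updated = {j: freed[j] if j == p else n + used_on([m], j)
               for j, n in available.items()}
    return p, queue, updated


frameworks = [
    Framework("A", 10.0, [("n1", 2.0), ("n2", 2.0), ("n2", 1.0)]),
    Framework("C", 1.0, [("n2", 3.0)]),
    Framework("B", 4.0, [("n2", 1.0), ("n2", 1.0)]),
]
p, queue, updated = plan_rescheduling(frameworks, {"n1": 1.0, "n2": 0.0})
print(p, [fw.name for fw in queue], updated)
```

The scheduling scheme obtaining method would then be applied to each entry of `queue` in order, using `updated` as the available resources of the nodes.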
According to the method, the communication overhead between the containerized core component and the containerized computing components belonging to the same containerized distributed computing frame is fully considered: the containerized core component and as many containerized computing components as possible are scheduled to the same computing node. This effectively reduces the communication overhead between the containerized components within the computing frame, which shortens the overall time the containerized distributed computing frame takes to execute tasks and improves its performance.
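The co-location idea behind the scheduling scheme obtaining method, as laid out in steps (21) to (27), can be sketched in Python as follows. The function name and data layout are hypothetical. Two points are assumptions because the corresponding formula images are not reproduced in this text: the first m components are taken to be the longest prefix of the ordered set whose total demand fits into the largest node, and that node's available resources are then reduced by the prefix total. The error raised when no component fits anywhere is also an addition that the text does not cover.

```python
def obtain_scheme(core, computing, available):
    """core: (name, resources) or None; computing: list of (name, resources);
    available: dict mapping node -> available computing resources.
    Returns a dict mapping component name -> node."""
    available = dict(available)                       # do not mutate the caller's dict
    ordered = sorted(computing, key=lambda c: c[1])   # (21) ascending by demand
    if core is not None:                              # (22) core component goes first
        ordered.insert(0, core)
    scheme = {}
    while ordered:
        total = sum(r for _, r in ordered)            # (23) total demand R
        candidates = [n for n, free in available.items() if free >= total]  # (24)
        if candidates:
            # (27) the candidate with the least available resources takes everything
            node = min(candidates, key=available.get)
            for name, _ in ordered:
                scheme[name] = node
            break
        node = max(available, key=available.get)      # (25) node with most resources
        m, used = 0, 0.0
        while m < len(ordered) and used + ordered[m][1] <= available[node]:
            used += ordered[m][1]
            m += 1
        if m == 0:
            # Assumption: the text does not say what happens in this case.
            raise RuntimeError("no node can host the next component")
        for name, _ in ordered[:m]:                   # (26) place the first m components
            scheme[name] = node
        available[node] -= used
        ordered = ordered[m:]
    return scheme


split = obtain_scheme(("core", 2), [("c1", 3), ("c2", 1), ("c3", 2)], {"n1": 5, "n2": 4})
together = obtain_scheme(("core", 2), [("c1", 3), ("c2", 1), ("c3", 2)], {"n1": 10, "n2": 8})
print(split)
print(together)
```

In the first call no node can hold all four components, so the core component and the two smallest computing components share `n1` and the largest one goes to `n2`; in the second call everything fits on `n2`, the candidate with the least spare capacity.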
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. A scheduling scheme acquisition method for a containerized distributed computing framework is characterized by comprising the following steps:
(1) obtaining all unscheduled containerized components in a containerized distributed computing framework to be scheduled, thereby obtaining a component set to be scheduled;
(2) determining a computing node for operating each containerization component in the component set to be scheduled according to computing resources required to be consumed by each containerization component in the component set to be scheduled and available computing resources of each computing node in a cluster, so that a containerization core component and as many containerization computing components as possible can be scheduled to the same computing node, and thus a scheduling scheme of the containerization distributed computing framework to be scheduled is obtained;
the step (2) comprises the following steps:
(21) sorting the containerized computing components in the component set to be scheduled in ascending order of the computing resources they need to consume, to obtain an ordered component set;
(22) if the component set to be scheduled comprises a containerized core component, inserting the containerized core component as the first element of the ordered component set, and proceeding to step (23); otherwise, directly switching to the step (23);
(23) obtaining a total computational resource R required to be consumed by all containerized components in the ordered component set;
(24) if the available computing resources of all the computing nodes are smaller than the total computing resource R, turning to the step (25); otherwise, obtaining all the computing nodes with available computing resources greater than or equal to the total computing resources R to form a candidate node set, and proceeding to step (27);
(25) obtaining the computing node I with the largest available computing resources, and determining the first m containerized components in the ordered component set that can be scheduled to the computing node I, so that
[formula image FDA0002739171360000011, not reproduced in this text]
where $N_{max}$ is the available computing resources of the computing node I;
(26) determining the computing node I as the computing node for running the m containerized components, updating the available computing resources of the computing node I to
[formula image FDA0002739171360000021, not reproduced in this text]
removing the m containerized components from the ordered component set, and then going to step (23);
(27) obtaining a computing node I 'with the minimum available computing resource in the candidate node set, and determining the computing node I' as a computing node for running each containerized component in the ordered component set;
wherein i is the number of the containerized component, and $F_i$ is the computing resources required to be consumed by the i-th containerized component in the ordered component set.
2. A scheduling method for a containerized distributed computing framework, comprising:
for a containerized distributed computing frame Fr which is newly built by a user and needs to be scheduled in a cluster, obtaining a scheduling scheme S of the containerized distributed computing frame Fr by using the scheduling scheme obtaining method of the containerized distributed computing frame of claim 1;
and scheduling each containerized component of the computing frame Fr to a corresponding computing node according to the scheduling scheme S, thereby completing the scheduling of the computing frame Fr.
3. A scheduling method for a containerized distributed computing framework, comprising:
(1) judging whether a containerized distributed computing frame Fr which is newly built by a user and needs to be scheduled exists in the current cluster, if so, turning to the step (6); if not, the step (2) is carried out;
(2) obtaining the difference ΔT between the current timestamp $t_p$ of the cluster and the timestamp $t_l$ at which the cluster last executed the rescheduling process; if the difference ΔT is larger than T, going to step (3); otherwise, going to step (1);
(3) taking one or more containerized distributed computing frames currently running in a cluster as rescheduling objects, and obtaining the scheduling scheme of each rescheduling object again by using the scheduling scheme obtaining method of the containerized distributed computing frame in claim 1;
(4) respectively calculating the total network communication expenses V and V 'of all calculation frames in the cluster before and after scheduling according to a new scheduling scheme, if V' is less than V, rescheduling all or part of containerized components of each rescheduled object to the corresponding calculation nodes according to the new scheduling scheme to complete rescheduling, and turning to the step (5) after the rescheduling is completed; otherwise, the rescheduling is not carried out, and the step (1) is carried out;
(5) updating the timestamp $t_l$ to said timestamp $t_p$, and going to step (1);
(6) the scheduling scheme obtaining method of the containerized distributed computing frame of claim 1 is utilized to obtain the scheduling scheme S of the computing frame Fr, and each containerized component of the computing frame Fr is scheduled to a corresponding computing node according to the scheduling scheme S, so that the scheduling of the computing frame Fr is completed;
(7) after the scheduling is finished, the step (1) is carried out;
wherein T is a preset time interval threshold.
4. The method for scheduling a containerized distributed computing framework of claim 3 wherein said step (3) comprises:
(31) obtaining the computing frame M with the largest network communication overhead in the cluster;
(32) taking the scale B of the computing frame M as a threshold, and screening out all computing frames in the cluster whose scale is smaller than the threshold B, thereby obtaining a computing frame set H;
(33) taking the computing frame M and the computing frames in the computing frame set H as rescheduling objects, and calculating the available resources of each computing node in the cluster for the case in which none of the rescheduling objects is scheduled to a computing node, $N_j' = N_j + F_{M,j} + F_{H,j}$, thereby obtaining the node p with the most available computing resources before the rescheduling objects are scheduled;
(34) for each computing frame h ∈ H, taking all of its containerized components that run on the node p as a new computing frame h', thereby obtaining a computing frame set K consisting of all the new computing frames;
(35) sorting the computing frames in the computing frame set K in descending order of scale to obtain a computing frame queue Q, and inserting the computing frame M at the head of the computing frame queue Q;
(36) updating the available computing resources of the node p to $W_j' = N_j + F_{M,j} + F_{H,j}$ and the available computing resources of the other computing nodes to $W_j' = N_j + F_{M,j}$, and sequentially obtaining the scheduling schemes of the computing frames in the computing frame queue Q by using the method for obtaining the scheduling schemes of the containerized distributed computing frames according to claim 1;
wherein the scale of a computing frame is the number of containerized components it contains, j is the number of the computing node, $N_j$ is the available computing resources of the j-th computing node in the cluster before rescheduling, $F_{M,j}$ is the total amount of computing resources consumed by the containerized components of the computing frame M that run on the j-th computing node, and $F_{H,j}$ is the total amount of computing resources consumed by the containerized components of the computing frame set H that run on the j-th computing node.
CN201910137847.7A 2019-02-25 2019-02-25 Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework Active CN109976873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910137847.7A CN109976873B (en) 2019-02-25 2019-02-25 Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework

Publications (2)

Publication Number Publication Date
CN109976873A CN109976873A (en) 2019-07-05
CN109976873B true CN109976873B (en) 2020-12-18

Family

ID=67077367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910137847.7A Active CN109976873B (en) 2019-02-25 2019-02-25 Scheduling scheme obtaining method and scheduling method of containerized distributed computing framework

Country Status (1)

Country Link
CN (1) CN109976873B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110636120B (en) * 2019-09-09 2022-02-08 广西东信易联科技有限公司 Distributed resource coordination system and method based on service request
CN110764887A (en) * 2019-09-10 2020-02-07 浙江大华技术股份有限公司 Task rescheduling method and system, and related equipment and device
CN110704135B (en) * 2019-09-26 2020-12-08 北京智能工场科技有限公司 Competition data processing system and method based on virtual environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103414767A (en) * 2013-07-30 2013-11-27 华南师范大学 Method and device for deploying application software on cloud computing platform
CN105786619A (en) * 2016-02-24 2016-07-20 中国联合网络通信集团有限公司 Virtual machine distribution method and device
CN109039686A (en) * 2017-06-12 2018-12-18 中兴通讯股份有限公司 A kind of method and device of mix of traffic layout

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10346189B2 (en) * 2016-12-05 2019-07-09 Red Hat, Inc. Co-locating containers based on source to improve compute density

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DLTAP:A Network-efficient Scheduling Method for Distributed Deep Learning Workload in Containerized Cluster Environment;Wei Qiao,Ying Li,Zhong-Hai Wu;《ITM Web of Conferences,EDP Sciences》;20170131;1-5 *
Hadoop YARN大数据计算框架及其资源调度机制研究;董春涛,李文婷,沈晴霓,吴中海;《信息通信技术》;20150215;77-84 *

Also Published As

Publication number Publication date
CN109976873A (en) 2019-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant