CN117331668A - Job scheduling method, device, equipment and storage medium - Google Patents

Job scheduling method, device, equipment and storage medium

Info

Publication number
CN117331668A
Authority
CN
China
Prior art keywords
job
virtual machine
target
cluster
state information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311385991.5A
Other languages
Chinese (zh)
Inventor
何玉林
莫沛恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority to CN202311385991.5A priority Critical patent/CN117331668A/en
Publication of CN117331668A publication Critical patent/CN117331668A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/505 Clust
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a job scheduling method, device, equipment and storage medium, the method comprising the following steps: acquiring state information of a cluster environment; screening a target action out of an action space according to the state information by adopting a specified algorithm; determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduled running; and obtaining a reward for the target action, and adjusting the specified algorithm according to the reward. The target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action. Because the state information is acquired at the latest moment, the job can be scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, guaranteeing the accuracy of overall job scheduling.

Description

Job scheduling method, device, equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a job scheduling method, apparatus, device, and storage medium.
Background
Spark is a general-purpose system for fast, large-scale data processing. Current mainstream Spark job scheduling methods fall mainly into two categories: heuristic-based Spark job scheduling and Spark job scheduling based on deep reinforcement learning (Deep Reinforcement Learning, DRL).
However, current heuristic-based models rely heavily on past data, which can become outdated as the cluster environment changes, and heuristic methods are also difficult to adjust or modify to incorporate workload and cluster changes. DRL-based methods, for their part, do not attend well to QoS requirements. Moreover, a real Spark cluster environment is usually complex: the algorithms used by these methods are dated, have difficulty modeling a complex Spark cluster environment, and cannot adapt well to a changing Spark cluster environment.
Disclosure of Invention
The invention provides a job scheduling method, a device, equipment and a storage medium, which are used for realizing accurate scheduling of jobs.
According to an aspect of the present invention, there is provided a job scheduling method including: acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
screening out target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and executing actions of each virtual machine;
determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation;
and obtaining rewards aiming at the target actions, and adjusting the designated algorithm according to the rewards.
According to another aspect of the present invention, there is provided a job scheduling apparatus including:
the state information acquisition module is used for acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
the target action screening module is used for screening target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and actions executed by each virtual machine;
the job scheduling operation module is used for determining a target virtual machine according to the target action and distributing the current job to the target virtual machine for scheduling operation;
and the reward acquisition module is used for acquiring the reward aiming at the target action and adjusting the designated algorithm according to the reward.
According to another aspect of the present invention, there is provided a computer apparatus, characterized in that the apparatus comprises:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the methods described in any of the embodiments of the invention.
According to another aspect of the invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the embodiments of the invention.
According to the technical scheme, the target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action. Because the state information is acquired at the latest moment, the job is scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, which ensures the accuracy of overall job scheduling.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a job scheduling method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an application framework for job scheduling according to a first embodiment of the present invention;
FIG. 3 is a flow chart of a job scheduling method according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a job scheduling device according to a third embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, "comprises," "comprising," and "having" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Example 1
Fig. 1 is a flowchart of a job scheduling method according to an embodiment of the present invention, where the method may be performed by a job scheduling device. As shown in fig. 1, the method includes:
step S101, acquiring state information of a cluster environment.
Optionally, before acquiring the state information of the cluster environment, the method further includes: receiving a job submitted by a user through a cluster, wherein the job is configured with resource requirements including a job identifier, a required number of CPUs, a required memory size, an expected use cost, a latest completion time and a required number of execution units; and storing the jobs submitted by the user in a distributed mode in each virtual machine, wherein a cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
Specifically, fig. 2 is a schematic diagram of the application framework for job scheduling provided in the present application. In this embodiment, a cluster includes a master virtual machine configured with a deep reinforcement learning agent unit (DRL Agent) and a plurality of slave virtual machines, so the application framework consists mainly of the DRL Agent and the cluster environment: the DRL Agent places the execution units (Executors) of submitted jobs by observing the state of the environment, and after each placement the environment feeds back to the Agent the state of the next time step and the reward of the current time step. In the cluster, a user may submit one or more jobs, and the user configures a resource requirement for each submitted job, where the resource requirement includes a job identifier, a required number of CPUs, a required memory size, an expected usage cost, a latest completion time, and a required number of execution units; this is, of course, only illustrative and not limiting. After the jobs submitted by the user are received through the cluster, they are stored distributed across the virtual machines. The cluster environment of this embodiment is composed mainly of the slave virtual machines and the jobs submitted by users. The DRL Agent located in the master virtual machine mainly acts as the scheduler responsible for scheduling the jobs in the cluster: it acts by observing the state of the environment, and obtains from the environment a reward and the next observable state based on that action. In this context, the Agent observes the state of each slave virtual machine in the Spark cluster environment, as well as the information of arriving jobs, to schedule jobs and make better decisions based on the environment's feedback.
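As a minimal sketch of this observe-act-feedback loop (the environment and agent interfaces below are hypothetical stand-ins for illustration, not interfaces defined by the patent), one episode might look like the following:

    # Hypothetical sketch of the Agent-environment interaction described above.
    # `env` and `agent` are illustrative stand-ins, not part of the patent.
    def run_episode(env, agent):
        state = env.reset()        # episode begins when the first job arrives
        done = False
        total_reward = 0.0
        while not done:
            # action 0 = wait; action i (1..N) = place an Executor on slave VM i
            action = agent.select_action(state)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state, done)
            total_reward += reward
            state = next_state
        return total_reward        # cumulative reward over the episode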
Optionally, acquiring the status information of the cluster environment includes: determining a current job to be executed from the jobs submitted by the user, and determining the number of CPUs (central processing units) required by each execution unit of the current job and the memory size required by each execution unit; taking the job identifier of the current job, the number of CPUs required by each execution unit, the memory size required by each execution unit, the number of required execution units, the expected usage cost and the latest completion time as the associated information of the current job; taking the number of CPUs and the memory size configured on each slave virtual machine in the cluster as the resource information of each virtual machine; and combining the resource information of each virtual machine and the associated information of the current job in the form of a one-dimensional vector to generate the state information of the cluster environment, and transmitting the state information of the cluster environment to the master virtual machine.
The state information comprises resource information of each virtual machine in the cluster environment and associated information of the current job. Since a resource requirement is configured in each job submitted by the user, in this embodiment the current job to be executed is determined from the jobs submitted by the user. For example, at the first scheduling, the job w1 submitted by the user is taken as the current job, and the number of CPUs and the memory size required by each execution unit of the current job are determined from the resource requirement configured for it: if the job identifier jobid of the current job is 1, the number of required execution units enum is 2, the number of required CPUs is 10, the required memory size is 2G, the expected usage cost jobtar is 5, and the latest completion time jobdl is 10, then the number of CPUs ecpu required by each execution unit is 5 and the memory size emem required by each execution unit is 1G, so the obtained jobid, ecpu, emem, enum, jobtar and jobdl are taken as the associated information of the current job. In addition, when the number of slave virtual machines in the cluster is determined to be N, the number of CPUs and the memory size configured on each slave virtual machine are also obtained as the resource information of each virtual machine; for example, the number of CPUs configured on the slave virtual machine 1 is vm1cpu, and its configured memory size is vm1mem. The obtained resource information of each virtual machine and the associated information of the current job are combined in the form of a one-dimensional vector to generate the state information of the cluster environment, for example: [vm1cpu, vm1mem, …, vmNcpu, vmNmem, jobid, ecpu, emem, enum, jobtar, jobdl], and the state information of the cluster environment is transmitted to the master virtual machine. Of course, this embodiment is merely illustrative and does not limit the specific content or format of the state information of the cluster environment; any embodiment that can accurately describe the cluster environment falls within the scope of the present application.
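As an illustrative sketch (the variable names follow the example above; the helper function itself is hypothetical and not part of the patent), the one-dimensional state vector might be assembled as follows:

    import numpy as np

    # Hypothetical sketch: assemble the one-dimensional state vector
    # [vm1cpu, vm1mem, ..., vmNcpu, vmNmem, jobid, ecpu, emem, enum, jobtar, jobdl]
    def build_state(vms, job):
        vm_part = []
        for vm in vms:                        # resource info of each slave VM
            vm_part += [vm["cpu"], vm["mem"]]
        ecpu = job["cpu"] / job["enum"]       # CPUs needed per execution unit
        emem = job["mem"] / job["enum"]       # memory needed per execution unit
        job_part = [job["id"], ecpu, emem, job["enum"], job["tar"], job["dl"]]
        return np.array(vm_part + job_part, dtype=np.float32)

    # With the example from the text, job w1 needs 10 CPUs, 2 GB of memory and
    # 2 execution units, so ecpu = 5 and emem = 1 GB per execution unit.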
Step S102, a specified algorithm is adopted to screen out target actions from the action space according to the state information.
Optionally, selecting the target action from the action space by adopting a specified algorithm according to the state information includes: determining a type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm; traversing in an action space by adopting a designated algorithm through a deep reinforcement learning agent unit in the main virtual machine; and taking the action selected by the specified algorithm as a target action.
The action space includes a waiting action and an execute action for each virtual machine. For example, Action 0 indicates that the Agent does not immediately create an execution unit for an arriving job but waits for some specific virtual machine to become idle before scheduling, and Actions 1-N indicate that the Agent selects one of the current virtual machines on which to create an execution unit for the job. When screening the target action, the type of the specified algorithm is first determined; the specified algorithm adopted by the Agent may specifically be a Q-Learning algorithm such as DQN, or a policy-gradient algorithm such as PPO+GAE. These two algorithms are chosen because they are applicable to DRL environments with discrete state and action spaces. In addition, the two algorithms work differently: DQN optimizes state-action values, while PPO updates the policy directly, so similar but distinct results can be obtained from their different mechanisms. From the perspective of the Spark job scheduling process, the deep reinforcement learning environment provides job specification information similar to trace runs of real workloads, and virtual machine resource availability is also used and updated as part of the state space. Since the specific working principles of the two algorithms are not the focus of the present application, a detailed description is omitted in this embodiment; one of the two algorithms may be selected as the specified algorithm for subsequent job scheduling according to the actual scheduling requirement. In this embodiment, the DRL Agent in the master virtual machine traverses the action space using the specified algorithm and takes the action selected by the specified algorithm as the target action, for example Action 1. This is, of course, merely illustrative and does not limit the specific content of the selected target action or the specific principle of selection; any scheme that can select a target action according to the cluster environment state information falls within the scope of protection of the present application.
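For concreteness, an epsilon-greedy selection step in the DQN case might look like the following sketch (PPO would instead sample from a policy network); the network and tensor handling here are assumptions for illustration, not the patent's implementation:

    import random
    import torch

    # Hedged sketch of DQN-style action selection over the discrete action
    # space: index 0 is the waiting action, indices 1..N place an Executor on
    # the corresponding slave VM. `q_net` is assumed to map a state vector to
    # N+1 action values; epsilon controls exploration.
    def select_action(q_net, state, n_actions, epsilon=0.1):
        if random.random() < epsilon:                 # explore
            return random.randrange(n_actions)
        with torch.no_grad():                         # exploit: greedy action
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
        return int(q_values.argmax(dim=1).item())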
And step S103, determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation.
Optionally, determining the target virtual machine, and distributing the current job to the target virtual machine for scheduling operation, including: determining a virtual machine corresponding to the target action, and taking the corresponding virtual machine as a target virtual machine; creating execution units on the target virtual machine according to the number of the required execution units; and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
Specifically, in this embodiment, the target virtual machine is determined according to the target action. For example, when the target action is determined to be Action 1, the target virtual machine is determined to be the slave virtual machine 1. When the current job is determined to be the job w1 first submitted by the user, an execution unit (Executor) is created on the slave virtual machine 1 according to the number of execution units required by the job w1, the current job w1 is distributed to the execution unit created on the slave virtual machine 1, and the current job w1 is scheduled to run by that execution unit.
It should be noted that, when the number of required execution units is more than one, the DRL Agent performs the screening again for each execution unit to be created, so as to re-determine the target virtual machine. For example, when the number of execution units required by the current job w1 is 2, if the DRL Agent determines through the first screening that the target virtual machine is the slave virtual machine 1, then after the current job w1 runs on the slave virtual machine 1, the state information in the cluster environment changes and is perceived by the DRL Agent. At this time, the DRL Agent re-screens a new target action from the action space using the specified algorithm based on the newly perceived state information and determines a new target virtual machine according to that action, for example the slave virtual machine 2; a second execution unit (Executor) is created on the slave virtual machine 2, the current job w1 is distributed to the execution unit created on the slave virtual machine 2, and scheduling of the current job w1 continues through that execution unit. The present embodiment therefore covers the case where a plurality of slave virtual machines execute the same job. Of course, this is merely illustrative, and the number of target virtual machines corresponding to each job is not limited; any scheme that can schedule the execution of the current job falls within the scope of protection of the present application.
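The per-executor re-scheduling just described might be sketched as follows (all interfaces here are illustrative assumptions, not the patent's API):

    # Hypothetical sketch: the Agent re-observes the cluster state after each
    # Executor placement, so different executors of one job may land on
    # different slave VMs.
    def place_executors(env, agent, job):
        placed_vms = []
        while len(placed_vms) < job["enum"]:      # one decision per executor
            state = env.observe(job)              # fresh state after last action
            action = agent.select_action(state)
            if action == 0:                       # waiting action: let a VM free up
                env.wait()
                continue
            env.create_executor(vm_index=action, job=job)
            placed_vms.append(action)
        return placed_vms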
Step S104, obtaining rewards aiming at the target action, and adjusting the designated algorithm according to the rewards.
Optionally, obtaining the reward for the target action includes: acquiring a constant reward when it is determined that the current job is not the last job submitted by the user, and taking the constant reward as the reward of the target action; and when the current job is the last job submitted by the user, acquiring interaction parameters of the cluster and calculating the reward of the target action from the interaction parameters, wherein the interaction parameters include the total usage cost of the cluster, the maximum usage cost of the cluster, the minimum usage cost of the cluster, the total average response time of the jobs, the shortest average response time of the jobs, and the maximum average response time of the jobs.
It should be noted that, in this embodiment, one complete interaction between the Agent and the environment may be referred to as an episode: it begins when the first job arrives at the Spark cluster for scheduling, ends after all jobs have been scheduled, and includes the reward information acquired by the Agent over this process. In addition, an episode may terminate prematurely when the Agent performs a harmful operation. The environment sends state information to the Agent only when a job is submitted. After the Agent receives the environment information, it executes an Executor placement action or a waiting action, and after the Agent acts, the environment sends the updated state to the Agent.
Once the Agent takes an action, it immediately receives a reward, and the Agent optimizes its policy according to that reward. The Agent receives either a positive or a negative reward after performing an operation. Positive rewards encourage the Agent to take good actions and to maximize the cumulative reward over the whole episode; conversely, negative rewards teach the Agent to avoid harmful behavior. Within an episode, to maximize the cumulative reward, the Agent must consider both the current and future rewards. The current reward is the reward the environment gives the Agent when an Executor placement succeeds or fails: when the Agent successfully places an Executor, it is given a small positive reward; when the placement fails, a large negative reward is given to the Agent and the episode is ended. In addition, to prevent the Agent from always performing the waiting action merely to avoid failure, a small negative reward is given to the Agent after it performs a waiting action. When an episode ends successfully, the environment gives the Agent an episode-level reward.
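The per-step reward scheme described above might be sketched as follows; the numeric magnitudes are placeholders (the patent fixes only their signs and relative sizes):

    # Hedged sketch of the per-step reward: small positive on a successful
    # placement, large negative (with early episode termination) on a failed
    # placement, small negative for waiting. Values are illustrative only.
    R_PLACE_OK = 1.0
    R_PLACE_FAIL = -100.0
    R_WAIT = -0.1

    def step_reward(action, placement_succeeded):
        if action == 0:                      # waiting action
            return R_WAIT, False             # (reward, episode_done)
        if placement_succeeded:
            return R_PLACE_OK, False
        return R_PLACE_FAIL, True            # failure ends the episode early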
In this embodiment, the target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action, so the job is scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, which ensures the accuracy of overall job scheduling.
Example two
Fig. 3 is a flowchart of a job scheduling method according to a second embodiment of the present invention, and on the basis of the foregoing embodiment, a specific description will be given of obtaining rewards for target actions. As shown in fig. 3, the method includes:
step S201, status information of the cluster environment is acquired.
Optionally, before acquiring the state information of the cluster environment, the method further includes: receiving a job submitted by a user through a cluster, wherein the job is configured with resource requirements including a job identifier, a required number of CPUs, a required memory size, an expected use cost, a latest completion time and a required number of execution units; and storing the jobs submitted by the user in a distributed mode in each virtual machine, wherein a cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
Step S202, a specified algorithm is adopted to screen out target actions from the action space according to the state information.
Optionally, selecting the target action from the action space by adopting a specified algorithm according to the state information includes: determining a type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm; traversing in an action space by adopting a designated algorithm through a deep reinforcement learning agent unit in the main virtual machine; and taking the action selected by the specified algorithm as a target action.
Step S203, determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation.
Optionally, determining the target virtual machine, and distributing the current job to the target virtual machine for scheduling operation, including: determining a virtual machine corresponding to the target action, and taking the corresponding virtual machine as a target virtual machine; creating execution units on the target virtual machine according to the number of the required execution units; and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
Step S204, judging whether the current job is the last job submitted by the user, if yes, executing step S206, otherwise, executing step S205.
Step S205, obtaining constant rewards, taking the constant rewards as rewards of target actions, and adjusting a specified algorithm according to the rewards.
The constant reward may be denoted R_fixed. It is a fixed constant within each episode, but it varies as the Agent's behavior changes from episode to episode in search of the maximized objective function. Thus, when it is determined that the current job is not the last job submitted by the user, the interaction of the current episode is not yet complete, so the constant reward R_fixed is taken as the reward of the target action. The specific way in which the specified algorithm is adjusted according to the reward is not limited in this embodiment, as long as the adjustment makes the specified algorithm better fit the cluster environment.
Step S206, the interaction parameters of the clusters are obtained, rewards of target actions are obtained through calculation according to the interaction parameters, and the designated algorithm is adjusted according to the rewards.
Specifically, the interaction parameters include the total usage cost of the cluster, the maximum usage cost of the cluster, the minimum usage cost of the cluster, the total average response time of the jobs, the shortest average response time of the jobs, and the maximum average response time of the jobs, wherein the following formula (1) is the obtained total usage cost of the cluster:

Cost = Σ_{i=1..M} c_i × t_i    (1)

where c_i represents the cost of the i-th slave virtual machine, t_i indicates the usage time of the i-th slave virtual machine, and M is the number of slave virtual machines in the cluster.
The following formula (2) is the obtained total average response time of the jobs:

ART = (1/J) × Σ_{j=1..J} (t_j^end - t_j^start)    (2)

where t_j^end indicates the end time of the j-th job, t_j^start indicates the start time of the j-th job, and J is the number of submitted jobs.
The following formula (3) is the obtained maximum usage cost of the cluster:

Cost_max = Σ_{j=1..J} t_j^max × max_{i=1..M} c_i    (3)

where t_j^max indicates the maximum execution time of the j-th job and c_i represents the cost of the i-th slave virtual machine.
The following formula (4) is the obtained minimum usage cost of the cluster:

Cost_min = Σ_{j=1..J} t_j^min × min_{i=1..M} c_i    (4)

where t_j^min represents the minimum execution time of the j-th job and c_i represents the cost of the i-th slave virtual machine.
The following formula (5) is the obtained shortest average response time of the jobs:

ART_min = (1/(J × M)) × Σ_{j=1..J} t_j^min    (5)

where t_j^min represents the minimum execution time of the j-th job and M represents the number of slave virtual machines in the cluster.
The following formula (6) is the obtained maximum average response time of the jobs:

ART_max = (M/J) × Σ_{j=1..J} t_j^max    (6)

where t_j^max represents the maximum execution time of the j-th job and M represents the number of slave virtual machines in the cluster.
Optionally, calculating to obtain the reward of the target action according to the interaction parameter includes: carrying out standardization processing on the total use cost of the cluster according to the maximum use cost and the minimum use cost of the cluster, and obtaining the normalized total use cost of the cluster; carrying out standardization processing on the total average response time of the job according to the shortest average response time of the job and the maximum average response time of the job, and obtaining the normalized total average response time of the job; and calculating to obtain the rewards of the target actions according to the constant rewards, the total using cost of the normalized cluster and the total average response time of the normalized job.
The normalized total usage cost of the cluster is obtained as shown in the following formula (7):

Cost_norm = (Cost - Cost_min) / (Cost_max - Cost_min)    (7)
the total average response time of the normalized job is obtained as shown in the following formula (8):
After the constant reward R_fixed, the normalized total usage cost of the cluster Cost_norm, and the normalized total average response time of the jobs ART_norm have been obtained, the following formula (9) may be used to calculate the reward for the target action:

R_epi = R_fixed × [(1 - Cost_norm) × α + (1 - ART_norm) × (1 - α)]    (9)

where α takes a value in [0,1]. By adjusting the value of α, the importance the Agent attaches to the different optimization objectives can be controlled. If α is set to 0, the Agent is trained to shorten only the average response time of the jobs; conversely, if α is set to 1, the Agent is trained to reduce only the overall resource usage cost of the cluster.
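Putting formulas (7)-(9) together, the episode-end reward might be computed as in the following sketch (the bound terms are taken as precomputed inputs; the function itself is an illustrative assumption):

    # Sketch of the episode-end reward of formula (9), with the min-max
    # normalizations of formulas (7) and (8).
    def episode_reward(r_fixed, cost, cost_min, cost_max,
                       art, art_min, art_max, alpha=0.5):
        cost_norm = (cost - cost_min) / (cost_max - cost_min)   # formula (7)
        art_norm = (art - art_min) / (art_max - art_min)        # formula (8)
        # formula (9): alpha near 1 stresses cluster cost, alpha near 0
        # stresses average job response time
        return r_fixed * ((1 - cost_norm) * alpha + (1 - art_norm) * (1 - alpha))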
In this embodiment, the target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action, so the job is scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, which ensures the accuracy of overall job scheduling.
Example III
Fig. 4 is a schematic structural diagram of a job scheduling device according to a third embodiment of the present invention. As shown in fig. 4, the apparatus includes: a status information acquisition module 310, a target action screening module 320, a job scheduling operation module 330, and a reward acquisition module 340.
A state information obtaining module 310, configured to obtain state information of a cluster environment, where the state information includes resource information of each virtual machine in the cluster environment and associated information of a current job;
the target action screening module 320 is configured to screen target actions from the action space by adopting a specified algorithm according to the state information, where the action space includes waiting actions and actions executed by each virtual machine;
the job scheduling operation module 330 is configured to determine a target virtual machine according to the target action, and allocate the current job to the target virtual machine for scheduling operation;
and the reward obtaining module 340 is configured to obtain a reward for the target action, and adjust the specified algorithm according to the reward.
Optionally, the apparatus further includes a job receiving module configured to receive a job submitted by a user through the cluster, where the job is configured with a resource requirement, and the resource requirement includes a job identifier, a required number of CPUs, a required memory size, an expected use cost, a latest completion time, and a required number of execution units;
and storing the jobs submitted by the user in a distributed mode in each virtual machine, wherein a cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
Optionally, the status information obtaining module is configured to determine a current job to be executed from the jobs submitted by the user, and determine the number of CPUs required by each execution unit and the size of the memory required by each execution unit of the current job;
taking a job identifier of the current job, the number of CPUs required by each execution unit, the memory size required by each execution unit, the number of the required execution units, the expected use cost and the latest completion time as associated information of the current job;
taking the number of CPUs configured by each slave virtual machine and the size of the configured memory in the cluster as the resource information of each virtual machine;
and combining the resource information of each virtual machine and the associated information of the current job in the form of a one-dimensional vector to generate the state information of the cluster environment, and transmitting the state information of the cluster environment to the master virtual machine.
Optionally, the target action screening module is used for determining the type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm;
traversing in an action space by adopting a designated algorithm through a deep reinforcement learning agent unit in the main virtual machine;
and taking the action selected by the specified algorithm as a target action.
Optionally, the job scheduling operation module is configured to determine a virtual machine corresponding to the target action, and take the corresponding virtual machine as the target virtual machine;
creating execution units on the target virtual machine according to the number of the required execution units;
and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
Optionally, the reward obtaining module is configured to obtain a constant reward when it is determined that the current job is not the last job submitted by the user, and take the constant reward as the reward of the target action;
when the current job is the last job submitted by the user, acquiring interaction parameters of the cluster, and calculating according to the interaction parameters to acquire rewards of target actions, wherein the interaction parameters comprise total use cost of the cluster, maximum use cost of the cluster, minimum use cost of the cluster, total average response time of the job, shortest average response time of the job and maximum average response time of the job.
Optionally, the reward acquisition module is further configured to perform standardization processing on the total usage cost of the cluster according to the maximum usage cost of the cluster and the minimum usage cost of the cluster, so as to acquire the normalized total usage cost of the cluster;
carrying out standardization processing on the total average response time of the job according to the shortest average response time of the job and the maximum average response time of the job, and obtaining the normalized total average response time of the job;
and calculating to obtain the rewards of the target actions according to the constant rewards, the total using cost of the normalized cluster and the total average response time of the normalized job.
The job scheduling device provided by the embodiment of the invention can execute the job scheduling method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as a job scheduling method.
In some embodiments, the job scheduling method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the job scheduling method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the job scheduling method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
The computer program used to implement the job scheduling method of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described herein may be implemented on a device having: a display device (e.g., a touch screen) for displaying information to the user, and keys through which the user may provide input to the device. Other kinds of devices may also be used to provide interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A job scheduling method, comprising:
acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
screening out target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and executing actions of each virtual machine;
determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation;
and obtaining rewards aiming at the target actions, and adjusting the designated algorithm according to the rewards.
2. The method of claim 1, further comprising, prior to the obtaining the status information of the clustered environment:
receiving a job submitted by a user through a cluster, wherein the job is configured with resource requirements, and the resource requirements comprise a job identifier, a required CPU number, a required memory size, an expected use cost, a latest completion time and a required number of execution units;
and storing the jobs submitted by the users in a distributed mode in each virtual machine, wherein the cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
3. The method of claim 2, wherein the obtaining status information of the clustered environment comprises:
determining a current job to be executed from jobs submitted by a user, and determining the number of CPUs required by each execution unit and the memory size required by each execution unit of the current job;
taking the job identification of the current job, the number of CPUs required by each execution unit, the memory size required by each execution unit, the number of the required execution units, the expected use cost and the latest completion time as the associated information of the current job;
taking the number of CPUs configured by each slave virtual machine and the size of the configured memory in the cluster as resource information of each virtual machine;
and combining the resource information of each virtual machine and the associated information of the current job in the form of a one-dimensional vector to generate the state information of the cluster environment, and transmitting the state information of the cluster environment to the master virtual machine.
4. The method of claim 2, wherein the screening out the target action from the action space using a specified algorithm based on the status information comprises:
determining a type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm;
traversing in the action space by adopting the specified algorithm through a deep reinforcement learning agent unit in the main virtual machine;
and taking the action selected by the specified algorithm as the target action.
5. The method of claim 3, wherein the determining the target virtual machine and assigning the current job to the target virtual machine for scheduling operations comprises:
determining a virtual machine corresponding to the target action, and taking the corresponding virtual machine as the target virtual machine;
creating execution units on the target virtual machine according to the number of the required execution units;
and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
6. The method of claim 1, wherein the obtaining a reward for the target action comprises:
acquiring a constant reward when the current job is not the last job submitted by a user, and taking the constant reward as the reward of the target action;
and when the current job is the last job submitted by the user, acquiring interaction parameters of the cluster, and calculating according to the interaction parameters to acquire rewards of the target action, wherein the interaction parameters comprise total use cost of the cluster, maximum use cost of the cluster, minimum use cost of the cluster, total average response time of the job, shortest average response time of the job and maximum average response time of the job.
7. The method of claim 6, wherein calculating the reward for obtaining the target action based on the interaction parameter comprises:
carrying out standardization processing on the total use cost of the cluster according to the maximum use cost of the cluster and the minimum use cost of the cluster, and obtaining the normalized total use cost of the cluster;
carrying out standardization processing on the total average response time of the job according to the shortest average response time of the job and the maximum average response time of the job, and obtaining the normalized total average response time of the job;
and calculating and acquiring the rewards of the target actions according to the constant rewards, the total using cost of the normalized cluster and the total average response time of the normalized job.
8. A job scheduling device, comprising:
the state information acquisition module is used for acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
the target action screening module is used for screening target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and actions executed by each virtual machine;
the job scheduling operation module is used for determining a target virtual machine according to the target action and distributing the current job to the target virtual machine for scheduling operation;
and the reward acquisition module is used for acquiring the reward aiming at the target action and adjusting the designated algorithm according to the reward.
9. A computer device, the device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any of claims 1-7.
CN202311385991.5A 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium Pending CN117331668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311385991.5A CN117331668A (en) 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311385991.5A CN117331668A (en) 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117331668A 2024-01-02

Family

ID=89293014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311385991.5A Pending CN117331668A (en) 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117331668A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117682429A (en) * 2024-02-01 2024-03-12 华芯(嘉兴)智能装备有限公司 Crown block carrying instruction scheduling method and device of material control system
CN117682429B (en) * 2024-02-01 2024-04-05 华芯(嘉兴)智能装备有限公司 Crown block carrying instruction scheduling method and device of material control system

Similar Documents

Publication Publication Date Title
Zhu et al. Scheduling stochastic multi-stage jobs to elastic hybrid cloud resources
US20090077235A1 (en) Mechanism for profiling and estimating the runtime needed to execute a job
CN110928689B (en) Self-adaptive resource management method and device for distributed reinforcement learning training
CN110297711A (en) Batch data processing method, device, computer equipment and storage medium
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
CN107003887A (en) Overloaded cpu setting and cloud computing workload schedules mechanism
EP3935503B1 (en) Capacity management in a cloud computing system using virtual machine series modeling
CN106557369A (en) A kind of management method and system of multithreading
CN113032102B (en) Resource rescheduling method, device, equipment and medium
CN117331668A (en) Job scheduling method, device, equipment and storage medium
KR101770191B1 (en) Resource allocation and apparatus
JP2016042284A (en) Parallel computer system, management device, method for controlling parallel computer system, and management device control program
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN112486642B (en) Resource scheduling method, device, electronic equipment and computer readable storage medium
CN110764915A (en) Optimization method for kubernetes main node selection
CN107203256B (en) Energy-saving distribution method and device under network function virtualization scene
WO2020206699A1 (en) Predicting virtual machine allocation failures on server node clusters
CN115357401B (en) Task scheduling and visualization method and system based on multiple data centers
CN115827179B (en) Calculation power scheduling method, device and equipment of physical machine equipment and storage medium
KR101360263B1 (en) Apparatus for controlling a computation job, method of controlling a computation job, and storage medium for storing a software controlling a computation job
CN115952054A (en) Simulation task resource management method, device, equipment and medium
Jiang et al. Resource allocation in contending virtualized environments through VM performance modeling and feedback
CN105718297A (en) Virtual machine establishing system and method
CN106878435A (en) A kind of mobile phone application and development method based on cloud storage
CN115061794A (en) Method, device, terminal and medium for scheduling task and training neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination