CN117331668A - Job scheduling method, device, equipment and storage medium - Google Patents

Job scheduling method, device, equipment and storage medium

Info

Publication number
CN117331668A
Authority
CN
China
Prior art keywords
job
virtual machine
target
cluster
state information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311385991.5A
Other languages
Chinese (zh)
Inventor
何玉林
莫沛恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Original Assignee
Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen filed Critical Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Shenzhen
Priority to CN202311385991.5A priority Critical patent/CN117331668A/en
Publication of CN117331668A publication Critical patent/CN117331668A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45591 Monitoring or debugging support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/505 Clust
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a job scheduling method, device, equipment and storage medium, the method comprising the following steps: acquiring state information of a cluster environment; screening a target action out of an action space according to the state information by adopting a specified algorithm; determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduled running; and obtaining a reward for the target action, and adjusting the specified algorithm according to the reward. The target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action. Because the state information is acquired at the latest moment, the job can be scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, guaranteeing the accuracy of overall job scheduling.

Description

Job scheduling method, device, equipment and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a job scheduling method, apparatus, device, and storage medium.
Background
Spark is a general-purpose system for fast, large-scale data processing. Current mainstream Spark job scheduling methods fall mainly into two categories: heuristic-based Spark job scheduling and Spark job scheduling based on deep reinforcement learning (Deep Reinforcement Learning, DRL).
However, current heuristic-based models rely heavily on past data, which can become outdated as the cluster environment changes, and heuristic methods are also difficult to adjust or modify to incorporate workload and cluster changes. DRL-based methods, for their part, do not attend well to QoS requirements. Moreover, a real Spark cluster environment is usually complex: the algorithms used by these methods are dated, have difficulty modeling a complex Spark cluster environment, and cannot adapt well to a changing Spark cluster environment.
Disclosure of Invention
The invention provides a job scheduling method, a device, equipment and a storage medium, which are used for realizing accurate scheduling of jobs.
According to an aspect of the present invention, there is provided a job scheduling method including: acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
screening out target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and executing actions of each virtual machine;
determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation;
and obtaining rewards aiming at the target actions, and adjusting the designated algorithm according to the rewards.
According to another aspect of the present invention, there is provided a job scheduling apparatus including:
the state information acquisition module is used for acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
the target action screening module is used for screening target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and actions executed by each virtual machine;
the job scheduling operation module is used for determining a target virtual machine according to the target action and distributing the current job to the target virtual machine for scheduling operation;
and the reward acquisition module is used for acquiring the reward aiming at the target action and adjusting the designated algorithm according to the reward.
According to another aspect of the present invention, there is provided a computer apparatus, characterized in that the apparatus comprises:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the methods described in any of the embodiments of the invention.
According to another aspect of the invention, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any of the embodiments of the invention.
According to the technical scheme, the target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action. Because the state information is acquired at the latest moment, the job is scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, which ensures the accuracy of overall job scheduling.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a job scheduling method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an application framework for job scheduling according to a first embodiment of the present invention;
FIG. 3 is a flow chart of a job scheduling method according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a job scheduling device according to a third embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to a fourth embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, "comprises," "comprising," and "having" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
Example 1
Fig. 1 is a flowchart of a job scheduling method according to an embodiment of the present invention, where the method may be performed by a job scheduling device. As shown in fig. 1, the method includes:
step S101, acquiring state information of a cluster environment.
Optionally, before acquiring the state information of the cluster environment, the method further includes: receiving a job submitted by a user through a cluster, wherein the job is configured with resource requirements including a job identifier, a required number of CPUs, a required memory size, an expected use cost, a latest completion time and a required number of execution units; and storing the jobs submitted by the user in a distributed mode in each virtual machine, wherein a cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
Specifically, fig. 2 is a schematic diagram of the application framework for job scheduling provided in the present application. In this embodiment, a cluster includes a master virtual machine configured with a deep reinforcement learning agent unit (DRL Agent) and a plurality of slave virtual machines, so the application framework consists mainly of the DRL Agent and the cluster environment: the DRL Agent places the execution units (Executors) of submitted jobs by observing the state of the environment, and after each placement the environment feeds back to the Agent the state of the next time step and the reward of the current time step. In the cluster, a user may submit one or more jobs, and the user configures a resource requirement for each submitted job, where the resource requirement includes a job identifier, a required number of CPUs, a required memory size, an expected usage cost, a latest completion time, and a required number of execution units; this is, of course, only illustrative and not limiting. After the jobs submitted by the user are received through the cluster, they are stored distributed across the virtual machines. The cluster environment of this embodiment is composed mainly of the slave virtual machines and the jobs submitted by users. The DRL Agent located in the master virtual machine mainly acts as the scheduler responsible for scheduling the jobs in the cluster: it acts by observing the state of the environment, and obtains from the environment a reward and the next observable state based on that action. In this context, the Agent observes the state of each slave virtual machine in the Spark cluster environment, as well as the information of arriving jobs, to schedule jobs and make better decisions based on the environment's feedback.
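As a minimal sketch of this observe-act-feedback loop (the environment and agent interfaces below are hypothetical stand-ins for illustration, not interfaces defined by the patent), one episode might look like the following:

    # Hypothetical sketch of the Agent-environment interaction described above.
    # `env` and `agent` are illustrative stand-ins, not part of the patent.
    def run_episode(env, agent):
        state = env.reset()        # episode begins when the first job arrives
        done = False
        total_reward = 0.0
        while not done:
            # action 0 = wait; action i (1..N) = place an Executor on slave VM i
            action = agent.select_action(state)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state, done)
            total_reward += reward
            state = next_state
        return total_reward        # cumulative reward over the episode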
Optionally, acquiring the status information of the cluster environment includes: determining a current job to be executed from the jobs submitted by the user, and determining the number of CPUs (central processing units) required by each execution unit of the current job and the memory size required by each execution unit; taking the job identifier of the current job, the number of CPUs required by each execution unit, the memory size required by each execution unit, the number of required execution units, the expected usage cost and the latest completion time as the associated information of the current job; taking the number of CPUs and the memory size configured on each slave virtual machine in the cluster as the resource information of each virtual machine; and combining the resource information of each virtual machine and the associated information of the current job in the form of a one-dimensional vector to generate the state information of the cluster environment, and transmitting the state information of the cluster environment to the master virtual machine.
The state information comprises resource information of each virtual machine in the cluster environment and associated information of the current job. Since a resource requirement is configured in each job submitted by the user, in this embodiment the current job to be executed is determined from the jobs submitted by the user. For example, at the first scheduling, the job w1 submitted by the user is taken as the current job, and the number of CPUs and the memory size required by each execution unit of the current job are determined from the resource requirement configured for it: if the job identifier jobid of the current job is 1, the number of required execution units enum is 2, the number of required CPUs is 10, the required memory size is 2G, the expected usage cost jobtar is 5, and the latest completion time jobdl is 10, then the number of CPUs ecpu required by each execution unit is 5 and the memory size emem required by each execution unit is 1G, so the obtained jobid, ecpu, emem, enum, jobtar and jobdl are taken as the associated information of the current job. In addition, when the number of slave virtual machines in the cluster is determined to be N, the number of CPUs and the memory size configured on each slave virtual machine are also obtained as the resource information of each virtual machine; for example, the number of CPUs configured on the slave virtual machine 1 is vm1cpu, and its configured memory size is vm1mem. The obtained resource information of each virtual machine and the associated information of the current job are combined in the form of a one-dimensional vector to generate the state information of the cluster environment, for example: [vm1cpu, vm1mem, …, vmNcpu, vmNmem, jobid, ecpu, emem, enum, jobtar, jobdl], and the state information of the cluster environment is transmitted to the master virtual machine. Of course, this embodiment is merely illustrative and does not limit the specific content or format of the state information of the cluster environment; any embodiment that can accurately describe the cluster environment falls within the scope of the present application.
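As an illustrative sketch (the variable names follow the example above; the helper function itself is hypothetical and not part of the patent), the one-dimensional state vector might be assembled as follows:

    import numpy as np

    # Hypothetical sketch: assemble the one-dimensional state vector
    # [vm1cpu, vm1mem, ..., vmNcpu, vmNmem, jobid, ecpu, emem, enum, jobtar, jobdl]
    def build_state(vms, job):
        vm_part = []
        for vm in vms:                        # resource info of each slave VM
            vm_part += [vm["cpu"], vm["mem"]]
        ecpu = job["cpu"] / job["enum"]       # CPUs needed per execution unit
        emem = job["mem"] / job["enum"]       # memory needed per execution unit
        job_part = [job["id"], ecpu, emem, job["enum"], job["tar"], job["dl"]]
        return np.array(vm_part + job_part, dtype=np.float32)

    # With the example from the text, job w1 needs 10 CPUs, 2 GB of memory and
    # 2 execution units, so ecpu = 5 and emem = 1 GB per execution unit.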
Step S102, a specified algorithm is adopted to screen out target actions from the action space according to the state information.
Optionally, selecting the target action from the action space by adopting a specified algorithm according to the state information includes: determining a type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm; traversing in an action space by adopting a designated algorithm through a deep reinforcement learning agent unit in the main virtual machine; and taking the action selected by the specified algorithm as a target action.
The action space includes a waiting action and an execute action for each virtual machine. For example, Action 0 indicates that the Agent does not immediately create an execution unit for an arriving job but waits for some specific virtual machine to become idle before scheduling, and Actions 1-N indicate that the Agent selects one of the current virtual machines on which to create an execution unit for the job. When screening the target action, the type of the specified algorithm is first determined; the specified algorithm adopted by the Agent may specifically be a Q-Learning algorithm such as DQN, or a policy-gradient algorithm such as PPO+GAE. These two algorithms are chosen because they are applicable to DRL environments with discrete state and action spaces. In addition, the two algorithms work differently: DQN optimizes state-action values, while PPO updates the policy directly, so similar but distinct results can be obtained from their different mechanisms. From the perspective of the Spark job scheduling process, the deep reinforcement learning environment provides job specification information similar to trace runs of real workloads, and virtual machine resource availability is also used and updated as part of the state space. Since the specific working principles of the two algorithms are not the focus of the present application, a detailed description is omitted in this embodiment; one of the two algorithms may be selected as the specified algorithm for subsequent job scheduling according to the actual scheduling requirement. In this embodiment, the DRL Agent in the master virtual machine traverses the action space using the specified algorithm and takes the action selected by the specified algorithm as the target action, for example Action 1. This is, of course, merely illustrative and does not limit the specific content of the selected target action or the specific principle of selection; any scheme that can select a target action according to the cluster environment state information falls within the scope of protection of the present application.
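For concreteness, an epsilon-greedy selection step in the DQN case might look like the following sketch (PPO would instead sample from a policy network); the network and tensor handling here are assumptions for illustration, not the patent's implementation:

    import random
    import torch

    # Hedged sketch of DQN-style action selection over the discrete action
    # space: index 0 is the waiting action, indices 1..N place an Executor on
    # the corresponding slave VM. `q_net` is assumed to map a state vector to
    # N+1 action values; epsilon controls exploration.
    def select_action(q_net, state, n_actions, epsilon=0.1):
        if random.random() < epsilon:                 # explore
            return random.randrange(n_actions)
        with torch.no_grad():                         # exploit: greedy action
            q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
        return int(q_values.argmax(dim=1).item())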
And step S103, determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation.
Optionally, determining the target virtual machine, and distributing the current job to the target virtual machine for scheduling operation, including: determining a virtual machine corresponding to the target action, and taking the corresponding virtual machine as a target virtual machine; creating execution units on the target virtual machine according to the number of the required execution units; and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
Specifically, in this embodiment, the target virtual machine is determined according to the target action. For example, when the target action is determined to be Action 1, the target virtual machine is determined to be the slave virtual machine 1. When the current job is determined to be the job w1 first submitted by the user, an execution unit (Executor) is created on the slave virtual machine 1 according to the number of execution units required by the job w1, the current job w1 is distributed to the execution unit created on the slave virtual machine 1, and the current job w1 is scheduled to run by that execution unit.
It should be noted that, when the number of required execution units is more than one, the DRL Agent performs the screening again for each execution unit to be created, so as to re-determine the target virtual machine. For example, when the number of execution units required by the current job w1 is 2, if the DRL Agent determines through the first screening that the target virtual machine is the slave virtual machine 1, then after the current job w1 runs on the slave virtual machine 1, the state information in the cluster environment changes and is perceived by the DRL Agent. At this time, the DRL Agent re-screens a new target action from the action space using the specified algorithm based on the newly perceived state information and determines a new target virtual machine according to that action, for example the slave virtual machine 2; a second execution unit (Executor) is created on the slave virtual machine 2, the current job w1 is distributed to the execution unit created on the slave virtual machine 2, and scheduling of the current job w1 continues through that execution unit. The present embodiment therefore covers the case where a plurality of slave virtual machines execute the same job. Of course, this is merely illustrative, and the number of target virtual machines corresponding to each job is not limited; any scheme that can schedule the execution of the current job falls within the scope of protection of the present application.
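The per-executor re-scheduling just described might be sketched as follows (all interfaces here are illustrative assumptions, not the patent's API):

    # Hypothetical sketch: the Agent re-observes the cluster state after each
    # Executor placement, so different executors of one job may land on
    # different slave VMs.
    def place_executors(env, agent, job):
        placed_vms = []
        while len(placed_vms) < job["enum"]:      # one decision per executor
            state = env.observe(job)              # fresh state after last action
            action = agent.select_action(state)
            if action == 0:                       # waiting action: let a VM free up
                env.wait()
                continue
            env.create_executor(vm_index=action, job=job)
            placed_vms.append(action)
        return placed_vms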
Step S104, obtaining rewards aiming at the target action, and adjusting the designated algorithm according to the rewards.
Optionally, obtaining the reward for the target action includes: acquiring a constant reward when it is determined that the current job is not the last job submitted by the user, and taking the constant reward as the reward of the target action; and when the current job is the last job submitted by the user, acquiring interaction parameters of the cluster and calculating the reward of the target action from the interaction parameters, wherein the interaction parameters include the total usage cost of the cluster, the maximum usage cost of the cluster, the minimum usage cost of the cluster, the total average response time of the jobs, the shortest average response time of the jobs, and the maximum average response time of the jobs.
It should be noted that, in this embodiment, one complete interaction between the Agent and the environment may be referred to as an episode: it begins when the first job arrives at the Spark cluster for scheduling, ends after all jobs have been scheduled, and includes the reward information acquired by the Agent over this process. In addition, an episode may terminate prematurely when the Agent performs a harmful operation. The environment sends state information to the Agent only when a job is submitted. After the Agent receives the environment information, it executes an Executor placement action or a waiting action, and after the Agent acts, the environment sends the updated state to the Agent.
Once the Agent takes an action, it immediately receives a reward, and the Agent optimizes its policy according to that reward. The Agent receives either a positive or a negative reward after performing an operation. Positive rewards encourage the Agent to take good actions and to maximize the cumulative reward over the whole episode; conversely, negative rewards teach the Agent to avoid harmful behavior. Within an episode, to maximize the cumulative reward, the Agent must consider both the current and future rewards. The current reward is the reward the environment gives the Agent when an Executor placement succeeds or fails: when the Agent successfully places an Executor, it is given a small positive reward; when the placement fails, a large negative reward is given to the Agent and the episode is ended. In addition, to prevent the Agent from always performing the waiting action merely to avoid failure, a small negative reward is given to the Agent after it performs a waiting action. When an episode ends successfully, the environment gives the Agent an episode-level reward.
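The per-step reward scheme described above might be sketched as follows; the numeric magnitudes are placeholders (the patent fixes only their signs and relative sizes):

    # Hedged sketch of the per-step reward: small positive on a successful
    # placement, large negative (with early episode termination) on a failed
    # placement, small negative for waiting. Values are illustrative only.
    R_PLACE_OK = 1.0
    R_PLACE_FAIL = -100.0
    R_WAIT = -0.1

    def step_reward(action, placement_succeeded):
        if action == 0:                      # waiting action
            return R_WAIT, False             # (reward, episode_done)
        if placement_succeeded:
            return R_PLACE_OK, False
        return R_PLACE_FAIL, True            # failure ends the episode early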
In this embodiment, the target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action, so the job is scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, which ensures the accuracy of overall job scheduling.
Example two
Fig. 3 is a flowchart of a job scheduling method according to a second embodiment of the present invention, and on the basis of the foregoing embodiment, a specific description will be given of obtaining rewards for target actions. As shown in fig. 3, the method includes:
step S201, status information of the cluster environment is acquired.
Optionally, before acquiring the state information of the cluster environment, the method further includes: receiving a job submitted by a user through a cluster, wherein the job is configured with resource requirements including a job identifier, a required number of CPUs, a required memory size, an expected use cost, a latest completion time and a required number of execution units; and storing the jobs submitted by the user in a distributed mode in each virtual machine, wherein a cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
Step S202, a specified algorithm is adopted to screen out target actions from the action space according to the state information.
Optionally, selecting the target action from the action space by adopting a specified algorithm according to the state information includes: determining a type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm; traversing in an action space by adopting a designated algorithm through a deep reinforcement learning agent unit in the main virtual machine; and taking the action selected by the specified algorithm as a target action.
Step S203, determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation.
Optionally, determining the target virtual machine, and distributing the current job to the target virtual machine for scheduling operation, including: determining a virtual machine corresponding to the target action, and taking the corresponding virtual machine as a target virtual machine; creating execution units on the target virtual machine according to the number of the required execution units; and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
Step S204, judging whether the current job is the last job submitted by the user, if yes, executing step S206, otherwise, executing step S205.
Step S205, obtaining constant rewards, taking the constant rewards as rewards of target actions, and adjusting a specified algorithm according to the rewards.
The constant reward may be denoted R_fixed. It is a fixed constant within each episode, but it varies as the Agent's behavior changes from episode to episode in search of the maximized objective function. Thus, when it is determined that the current job is not the last job submitted by the user, the interaction of the current episode is not yet complete, so the constant reward R_fixed is taken as the reward of the target action. The specific way in which the specified algorithm is adjusted according to the reward is not limited in this embodiment, as long as the adjustment makes the specified algorithm better fit the cluster environment.
Step S206, the interaction parameters of the clusters are obtained, rewards of target actions are obtained through calculation according to the interaction parameters, and the designated algorithm is adjusted according to the rewards.
Specifically, the interaction parameters include the total usage cost of the cluster, the maximum usage cost of the cluster, the minimum usage cost of the cluster, the total average response time of the jobs, the shortest average response time of the jobs, and the maximum average response time of the jobs, wherein the following formula (1) is the obtained total usage cost of the cluster:

Cost = Σ_{i=1..M} c_i × t_i    (1)

where c_i represents the cost of the i-th slave virtual machine, t_i indicates the usage time of the i-th slave virtual machine, and M is the number of slave virtual machines in the cluster.
The following formula (2) is the obtained total average response time of the jobs:

ART = (1/J) × Σ_{j=1..J} (t_j^end - t_j^start)    (2)

where t_j^end indicates the end time of the j-th job, t_j^start indicates the start time of the j-th job, and J is the number of submitted jobs.
The following formula (3) is the obtained maximum usage cost of the cluster:

Cost_max = Σ_{j=1..J} t_j^max × max_{i=1..M} c_i    (3)

where t_j^max indicates the maximum execution time of the j-th job and c_i represents the cost of the i-th slave virtual machine.
The following formula (4) is the obtained minimum usage cost of the cluster:

Cost_min = Σ_{j=1..J} t_j^min × min_{i=1..M} c_i    (4)

where t_j^min represents the minimum execution time of the j-th job and c_i represents the cost of the i-th slave virtual machine.
The following formula (5) is the obtained shortest average response time of the jobs:

ART_min = (1/(J × M)) × Σ_{j=1..J} t_j^min    (5)

where t_j^min represents the minimum execution time of the j-th job and M represents the number of slave virtual machines in the cluster.
The following formula (6) is the obtained maximum average response time of the jobs:

ART_max = (M/J) × Σ_{j=1..J} t_j^max    (6)

where t_j^max represents the maximum execution time of the j-th job and M represents the number of slave virtual machines in the cluster.
Optionally, calculating to obtain the reward of the target action according to the interaction parameter includes: carrying out standardization processing on the total use cost of the cluster according to the maximum use cost and the minimum use cost of the cluster, and obtaining the normalized total use cost of the cluster; carrying out standardization processing on the total average response time of the job according to the shortest average response time of the job and the maximum average response time of the job, and obtaining the normalized total average response time of the job; and calculating to obtain the rewards of the target actions according to the constant rewards, the total using cost of the normalized cluster and the total average response time of the normalized job.
The normalized total usage cost of the cluster is obtained as shown in the following formula (7):

Cost_norm = (Cost - Cost_min) / (Cost_max - Cost_min)    (7)
the total average response time of the normalized job is obtained as shown in the following formula (8):
After the constant reward R_fixed, the normalized total usage cost of the cluster Cost_norm, and the normalized total average response time of the jobs ART_norm have been obtained, the following formula (9) may be used to calculate the reward for the target action:

R_epi = R_fixed × [(1 - Cost_norm) × α + (1 - ART_norm) × (1 - α)]    (9)

where α takes a value in [0,1]. By adjusting the value of α, the importance the Agent attaches to the different optimization objectives can be controlled. If α is set to 0, the Agent is trained to shorten only the average response time of the jobs; conversely, if α is set to 1, the Agent is trained to reduce only the overall resource usage cost of the cluster.
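Putting formulas (7)-(9) together, the episode-end reward might be computed as in the following sketch (the bound terms are taken as precomputed inputs; the function itself is an illustrative assumption):

    # Sketch of the episode-end reward of formula (9), with the min-max
    # normalizations of formulas (7) and (8).
    def episode_reward(r_fixed, cost, cost_min, cost_max,
                       art, art_min, art_max, alpha=0.5):
        cost_norm = (cost - cost_min) / (cost_max - cost_min)   # formula (7)
        art_norm = (art - art_min) / (art_max - art_min)        # formula (8)
        # formula (9): alpha near 1 stresses cluster cost, alpha near 0
        # stresses average job response time
        return r_fixed * ((1 - cost_norm) * alpha + (1 - art_norm) * (1 - alpha))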
In this embodiment, the target action is screened out according to the acquired state information of the cluster environment, and the target virtual machine on which the current job is scheduled to run is determined according to the target action, so the job is scheduled accurately according to the selected optimal target action; and because the specified algorithm is adjusted according to the reward obtained for the target action, the algorithm is continuously optimized for the job scheduling service, which ensures the accuracy of overall job scheduling.
Example III
Fig. 4 is a schematic structural diagram of a job scheduling device according to a third embodiment of the present invention. As shown in fig. 4, the apparatus includes: a status information acquisition module 310, a target action screening module 320, a job scheduling operation module 330, and a reward acquisition module 340.
A state information obtaining module 310, configured to obtain state information of a cluster environment, where the state information includes resource information of each virtual machine in the cluster environment and associated information of a current job;
the target action screening module 320 is configured to screen target actions from the action space by adopting a specified algorithm according to the state information, where the action space includes waiting actions and actions executed by each virtual machine;
the job scheduling operation module 330 is configured to determine a target virtual machine according to the target action, and allocate the current job to the target virtual machine for scheduling operation;
and the reward obtaining module 340 is configured to obtain a reward for the target action, and adjust the specified algorithm according to the reward.
Optionally, the apparatus further includes a job receiving module configured to receive a job submitted by a user through the cluster, where the job is configured with a resource requirement, and the resource requirement includes a job identifier, a required number of CPUs, a required memory size, an expected use cost, a latest completion time, and a required number of execution units;
and storing the jobs submitted by the user in a distributed mode in each virtual machine, wherein a cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
Optionally, the status information obtaining module is configured to determine a current job to be executed from the jobs submitted by the user, and determine the number of CPUs required by each execution unit and the size of the memory required by each execution unit of the current job;
taking a job identifier of the current job, the number of CPUs required by each execution unit, the memory size required by each execution unit, the number of the required execution units, the expected use cost and the latest completion time as associated information of the current job;
taking the number of CPUs configured by each slave virtual machine and the size of the configured memory in the cluster as the resource information of each virtual machine;
and combining the resource information of each virtual machine and the associated information of the current job in the form of a one-dimensional vector to generate the state information of the cluster environment, and transmitting the state information of the cluster environment to the master virtual machine.
Optionally, the target action screening module is used for determining the type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm;
traversing in an action space by adopting a designated algorithm through a deep reinforcement learning agent unit in the main virtual machine;
and taking the action selected by the specified algorithm as a target action.
Optionally, the job scheduling operation module is configured to determine a virtual machine corresponding to the target action, and take the corresponding virtual machine as the target virtual machine;
creating execution units on the target virtual machine according to the number of the required execution units;
and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
Optionally, the reward obtaining module is configured to obtain a constant reward when it is determined that the current job is not the last job submitted by the user, and take the constant reward as the reward of the target action;
when the current job is the last job submitted by the user, acquiring interaction parameters of the cluster, and calculating according to the interaction parameters to acquire rewards of target actions, wherein the interaction parameters comprise total use cost of the cluster, maximum use cost of the cluster, minimum use cost of the cluster, total average response time of the job, shortest average response time of the job and maximum average response time of the job.
Optionally, the reward acquisition module is further configured to perform standardization processing on the total usage cost of the cluster according to the maximum usage cost of the cluster and the minimum usage cost of the cluster, so as to acquire the normalized total usage cost of the cluster;
carrying out standardization processing on the total average response time of the job according to the shortest average response time of the job and the maximum average response time of the job, and obtaining the normalized total average response time of the job;
and calculating to obtain the rewards of the target actions according to the constant rewards, the total using cost of the normalized cluster and the total average response time of the normalized job.
The job scheduling device provided by the embodiment of the invention can execute the job scheduling method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as a job scheduling method.
In some embodiments, the job scheduling method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the job scheduling method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the job scheduling method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
The computer program used to implement the job scheduling method of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. A computer program may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described herein may be implemented on a device having: a display device (e.g., a touch screen) for displaying information to the user, and keys through which the user may provide input to the device. Other kinds of devices may also be used to provide interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A job scheduling method, comprising:
acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
screening out target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and executing actions of each virtual machine;
determining a target virtual machine according to the target action, and distributing the current job to the target virtual machine for scheduling operation;
and obtaining rewards aiming at the target actions, and adjusting the designated algorithm according to the rewards.
2. The method of claim 1, further comprising, prior to the obtaining the status information of the clustered environment:
receiving a job submitted by a user through a cluster, wherein the job is configured with resource requirements, and the resource requirements comprise a job identifier, a required CPU number, a required memory size, an expected use cost, a latest completion time and a required number of execution units;
and storing the jobs submitted by the users in a distributed mode in each virtual machine, wherein the cluster comprises a master virtual machine and a plurality of slave virtual machines, wherein the master virtual machine is configured with a deep reinforcement learning agent unit.
3. The method of claim 2, wherein the obtaining status information of the clustered environment comprises:
determining a current job to be executed from jobs submitted by a user, and determining the number of CPUs required by each execution unit and the memory size required by each execution unit of the current job;
taking the job identification of the current job, the number of CPUs required by each execution unit, the memory size required by each execution unit, the number of the required execution units, the expected use cost and the latest completion time as the associated information of the current job;
taking the number of CPUs configured by each slave virtual machine and the size of the configured memory in the cluster as resource information of each virtual machine;
and combining the resource information of each virtual machine and the associated information of the current job in the form of a one-dimensional vector to generate the state information of the cluster environment, and transmitting the state information of the cluster environment to the master virtual machine.
4. The method of claim 2, wherein the screening out the target action from the action space using a specified algorithm based on the status information comprises:
determining a type of the specified algorithm, wherein the type comprises a Q-Learning algorithm or a policy-gradient-based algorithm;
traversing in the action space by adopting the specified algorithm through a deep reinforcement learning agent unit in the main virtual machine;
and taking the action selected by the specified algorithm as the target action.
5. The method of claim 3, wherein the determining the target virtual machine and assigning the current job to the target virtual machine for scheduling operations comprises:
determining a virtual machine corresponding to the target action, and taking the corresponding virtual machine as the target virtual machine;
creating execution units on the target virtual machine according to the number of the required execution units;
and distributing the current job to an execution unit created on the target virtual machine, and performing scheduling operation on the current job through the execution unit.
6. The method of claim 1, wherein the obtaining a reward for the target action comprises:
acquiring a constant reward when the current job is not the last job submitted by a user, and taking the constant reward as the reward of the target action;
and when the current job is the last job submitted by the user, acquiring interaction parameters of the cluster, and calculating according to the interaction parameters to acquire rewards of the target action, wherein the interaction parameters comprise total use cost of the cluster, maximum use cost of the cluster, minimum use cost of the cluster, total average response time of the job, shortest average response time of the job and maximum average response time of the job.
7. The method of claim 6, wherein calculating the reward for obtaining the target action based on the interaction parameter comprises:
carrying out standardization processing on the total use cost of the cluster according to the maximum use cost of the cluster and the minimum use cost of the cluster, and obtaining the normalized total use cost of the cluster;
carrying out standardization processing on the total average response time of the job according to the shortest average response time of the job and the maximum average response time of the job, and obtaining the normalized total average response time of the job;
and calculating and acquiring the rewards of the target actions according to the constant rewards, the total using cost of the normalized cluster and the total average response time of the normalized job.
8. A job scheduling device, comprising:
the state information acquisition module is used for acquiring state information of a cluster environment, wherein the state information comprises resource information of each virtual machine in the cluster environment and associated information of a current job;
the target action screening module is used for screening target actions from an action space by adopting a specified algorithm according to the state information, wherein the action space comprises waiting actions and actions executed by each virtual machine;
the job scheduling operation module is used for determining a target virtual machine according to the target action and distributing the current job to the target virtual machine for scheduling operation;
and the reward acquisition module is used for acquiring the reward aiming at the target action and adjusting the designated algorithm according to the reward.
9. A computer device, the device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method of any of claims 1-7.
CN202311385991.5A 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium Pending CN117331668A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311385991.5A CN117331668A (en) 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311385991.5A CN117331668A (en) 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117331668A 2024-01-02

Family

ID=89293014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311385991.5A Pending CN117331668A (en) 2023-10-24 2023-10-24 Job scheduling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117331668A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117682429A (en) * 2024-02-01 2024-03-12 华芯(嘉兴)智能装备有限公司 Crown block carrying instruction scheduling method and device of material control system
CN117682429B (en) * 2024-02-01 2024-04-05 华芯(嘉兴)智能装备有限公司 Crown block carrying instruction scheduling method and device of material control system

Similar Documents

Publication Publication Date Title
Zhu et al. Scheduling stochastic multi-stage jobs to elastic hybrid cloud resources
US20090077235A1 (en) Mechanism for profiling and estimating the runtime needed to execute a job
CN110928689B (en) Self-adaptive resource management method and device for distributed reinforcement learning training
CN110297711A (en) Batch data processing method, device, computer equipment and storage medium
CN110389816B (en) Method, apparatus and computer readable medium for resource scheduling
CN107003887A (en) Overloaded cpu setting and cloud computing workload schedules mechanism
EP3935503B1 (en) Capacity management in a cloud computing system using virtual machine series modeling
CN106557369A (en) A kind of management method and system of multithreading
CN113032102B (en) Resource rescheduling method, device, equipment and medium
CN117331668A (en) Job scheduling method, device, equipment and storage medium
KR101770191B1 (en) Resource allocation and apparatus
JP2016042284A (en) Parallel computer system, management device, method for controlling parallel computer system, and management device control program
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN112486642B (en) Resource scheduling method, device, electronic equipment and computer readable storage medium
CN110764915A (en) Optimization method for kubernetes main node selection
CN107203256B (en) Energy-saving distribution method and device under network function virtualization scene
WO2020206699A1 (en) Predicting virtual machine allocation failures on server node clusters
CN115357401B (en) Task scheduling and visualization method and system based on multiple data centers
CN115827179B (en) Calculation power scheduling method, device and equipment of physical machine equipment and storage medium
KR101360263B1 (en) Apparatus for controlling a computation job, method of controlling a computation job, and storage medium for storing a software controlling a computation job
CN115952054A (en) Simulation task resource management method, device, equipment and medium
Jiang et al. Resource allocation in contending virtualized environments through VM performance modeling and feedback
CN105718297A (en) Virtual machine establishing system and method
CN106878435A (en) A kind of mobile phone application and development method based on cloud storage
CN115061794A (en) Method, device, terminal and medium for scheduling task and training neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination