WO2008040563A1 - Method, system and computer program for distributing execution of independent jobs - Google Patents

Method, system and computer program for distributing execution of independent jobs

Info

Publication number: WO2008040563A1
Authority: WIPO (PCT)
Prior art keywords: cumulative, sequence, execution, ranges, random number
Application number: PCT/EP2007/052719
Other languages: French (fr)
Inventors: Tullio Tancredi, Fabio Benedetti, Paolo Deidda, Alan John Bivens
Original Assignee: International Business Machines Corporation; Compagnie IBM France
Application filed by International Business Machines Corporation and Compagnie IBM France
Publication of WO2008040563A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Definitions

  • each computer may have another structure or may include similar elements (such as cache memories temporarily storing the programs or parts thereof to reduce the accesses to the mass memory during execution); in any case, it is possible to replace the computer with any code execution entity (such as a PDA, a mobile phone, and the like).
  • the term job as used herein may include other work units (such as interactive tasks), or more generally any activities to be distributed for their execution on multiple computers.
  • the execution servers may be ordered in a different way, and the cumulative ranges may be calculated with other formulas (with sizes that need not depend linearly on the corresponding target frequencies).
  • the proposed algorithm for selecting the first execution server is not to be interpreted in a limitative manner. For example, nothing prevents assigning the cumulative ranges to the execution servers in a different way (for example, calculating them starting from the end of the sequence).
  • the random numbers may be generated with equivalent functions (for example, with random numbers that span the current global range of the cumulative ranges directly); more generally, any other generator of (substantially) random numbers may be used.
  • the workload manager (or any equivalent component) may provide different performance information (such as weights that are already normalized), or the selection may always be performed on all the execution servers for every job.
  • the proposed technique has equal applicability to equivalent schedulers; the same solution may also be implemented in any other data processing system (such as in a server farm).
  • the program may take any form suitable to be used by or in connection with any data processing system, such as external or resident software, firmware, or microcode (either in object code or in source code). The medium can be any element suitable to contain, store, communicate, propagate, or transfer the program; for example, the medium may be of the electronic, magnetic, optical, electromagnetic, infrared, or semiconductor type (examples of such a medium are fixed disks).
  • the solution according to the present invention lends itself to be implemented with a hardware structure (for example, integrated in a chip of semiconductor material), or with a combination of software and hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

A solution (500) is proposed for distributing execution of jobs in a scheduler. The jobs must be submitted for execution (515) individually on a set of execution servers with selection target frequencies corresponding to their performance weights (for example, provided by a workload manager). For this purpose, the execution servers are ordered (521) in a sequence; the corresponding sequence of target frequencies is then converted (527) into a cumulative form - by assigning a selection cumulative range to each execution server (with a size equal to the corresponding target frequency). A random number (from 0 to 1) is used (530-535) to select the cumulative range including it. The job is then submitted (536) on the execution server corresponding to the selected cumulative range. The same operations may also be reiterated (543-560) to obtain a list of execution servers in decreasing order of preference for executing the job.

Description

METHOD, SYSTEM AND COMPUTER PROGRAM FOR DISTRIBUTING EXECUTION
OF INDEPENDENT JOBS
Field of the Invention
The present invention relates to the information technology field. More specifically, the invention relates to the distribution of the execution of work units in a data processing system.
Background Art
Data processing systems based on multiple computers are commonplace in several environments (such as in server farms). This architecture allows the implementation of systems that are very powerful and easily scalable; moreover, the same structure ensures high reliability (since operation of the system can be recovered even in case of failure of one or more computers).
A problem of the above-described systems is that of distributing the required activities correctly among the different computers.
For this purpose, load-balancing applications (or load-balancers) are available. The load-balancers submit the activities for their execution on the computers according to predefined algorithms, which are aimed at optimizing the overall performance of the system. A conventional approach consists of assigning the activities to the computers according to a round-robin policy; more sophisticated techniques select the computers for processing the activities according to their measured workloads.
However, the load-balancing solutions known in the art are very difficult to apply in specific environments, wherein the activities must be submitted for execution individually (each one independently of the others) without knowledge of the computers to which previous activities have been submitted. A typical example is that of a scheduling application (or scheduler) , which is commonly used to control the execution of different work units - such as batch jobs - according to a specific plan (for example, based on constraints defined by the completion of other jobs).
The scheduler may interface with a workload manager to get recommendations of ways to distribute the execution of the jobs on the available computers. For this purpose, the workload manager periodically provides information about the workload of each computer to the scheduler. In this way, the scheduler can select the computer with the lowest workload for executing each job.
In cases where the selection of the computer for each job is done with knowledge of all other selections, the scheduler can easily achieve any distribution suggested by the workload manager. However, the selection is generally performed individually for each job; therefore, there is the risk of overloading the same computer (since the effects of the execution of the other jobs are not apparent until the next collection of the workload information from the load-balancer). Indeed, in this context each job must be executed as soon as the plan allows doing so, and it is not possible to wait for a significant number of jobs ready to be executed (which should then be distributed among the computers according to the corresponding workloads).
A further problem is caused by the fact that each job has specific requirements, which limit the computers on which the job can be executed. Therefore, the scheduler can select the computer for executing each job only among the ones satisfying the corresponding requirements. As a result, the distribution of the frequency of execution of the jobs on the computers cannot, in any case, reflect the corresponding workloads.
All of the above prevents the correct distribution of the execution of the jobs among the different computers. This has a detrimental effect on the overall performance of the system.
Summary of the Invention
In its general terms, the present invention suggests the use of a probabilistic method for distributing the execution of the work units.
Particularly, the present invention provides a solution as set out in the independent claims. Advantageous embodiments of the invention are described in the dependent claims.
More specifically, an aspect of the invention provides a method for distributing execution of work units on a plurality of data processing entities; each entity is associated with a corresponding target frequency (for the distribution of the work units on the entity). For each work unit the method includes the following steps. At first, a sequence of cumulative ranges associated with the entities is created; the cumulative range associated with each entity has a size corresponding to the target frequency of the entity. The method continues by generating a substantially random number. One of the cumulative ranges of the sequence is selected according to the random number. The entity associated with the selected cumulative range of the sequence is then elected for executing the work unit.
In a preferred embodiment of the invention, the cumulative range of each entity is obtained by summing its target frequency with the preceding ones.
As a further enhancement, the same operations are reiterated on the remaining entities for their election as secondary choices for executing the work unit. For this purpose, it is preferable to reduce the sequence of cumulative ranges by shifting forwards the cumulative ranges preceding the selected one, or by shifting backwards the cumulative ranges following it (and updating them accordingly).
In an advantageous implementation, the cumulative ranges are shifted forwards or backwards when the selected cumulative range is in the first half or in the second half, respectively, of the sequence.
During each iteration of the process (following the first one), the random number - from 0 to 1 - is corrected according to a global range of the cumulative ranges.
In an embodiment of the invention, each random number is uniformly distributed.
Preferably, the target frequencies are calculated by normalizing corresponding performance weights of the entities.
For example, the proposed solution finds application in a scheduler.
Another aspect of the invention proposes a computer program for performing the method.
A further aspect of the invention proposes a corresponding system.
Reference to the drawings
The invention itself, as well as further features and the advantages thereof, will be best understood with reference to the following detailed description, given purely by way of a non-restrictive indication, to be read in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic block diagram of a data processing system in which the solution according to an embodiment of the invention is applicable;
Figure 2 is a collaboration diagram representing the roles of different software modules implementing the solution according to an embodiment of the invention;
Figures 3a-3g illustrate an exemplary application of the solution according to an embodiment of the invention;
Figure 4 provides experimental results of an exemplary application of the solution according to an embodiment of the invention; and
Figures 5a-5b show a diagram describing the flow of activities relating to an implementation of the solution according to an embodiment of the invention.
Detailed Description
With reference in particular to Figure 1, a data processing system 100 with distributed architecture is illustrated. The system 100 includes a central scheduling server 105, which is used to automate, monitor and control the execution of work units in the system 100. Typically, the work units consist of non-interactive tasks (for example, payroll programs, cost analysis applications, and the like), which are to be executed on a set of execution servers 110. For this purpose, the scheduling server 105 and the execution servers 110 communicate through a network 115 - such as a Local Area Network (LAN).
More specifically, the scheduling server 105 is formed by several units that are connected in parallel to a system bus 120. In detail, multiple microprocessors (μP) 125 control operation of the scheduling server 105; a RAM 130 is directly used as a working memory by the microprocessors 125, and a ROM 135 stores basic code for a bootstrap of the scheduling server 105. Several peripheral units are clustered around a local bus 140 (by means of respective interfaces). Particularly, a mass storage consists of one or more hard-disks 145 and drives 150 for reading CD-ROMs 155. Moreover, the scheduling server 105 includes input units 160 (for example, a keyboard and a mouse), and output units 165 (for example, a monitor and a printer). An adapter 170 is used to connect the scheduling server 105 to the network 115. A bridge unit 175 interfaces the system bus 120 with the local bus 140. Each microprocessor 125 and the bridge unit 175 can operate as master agents requesting an access to the system bus 120 for transmitting information. An arbiter 180 manages the granting of the access with mutual exclusion to the system bus 120.
Moving to Figure 2, the main software modules that run on the scheduling server are denoted as a whole with the reference 200. The information (programs and data) is typically stored on the hard-disk and loaded (at least partially) into the working memory of the scheduling server when the programs are running. The programs are initially installed onto the hard disk, for example, from CD-ROM. Particularly, the figure describes the static structure of the system (by means of the corresponding modules) and its dynamic behavior (by means of a series of exchanged messages, each one representing a corresponding action denoted with sequence numbers preceded by the symbol "A").
More in detail, the scheduling server runs a scheduler 205, for example consisting of the "IBM Tivoli Workload Scheduler (TWS)" by IBM Corporation. The scheduler 205 includes a controller 210 (such as the "Composer" program in the case of the "TWS"), which is used to maintain a workload database 215 (action "A1.Maintain").
The workload database 215 contains the definition of the whole scheduling environment. Particularly, the workload database 215 stores a representation of the topology of the system (i.e., the execution servers with their connections) and of the hardware/software resources that are available for executing the jobs. The workload database 215 also includes a descriptor of each job, which defines rules controlling its execution (written in a suitable control language, for example, XML-based). More specifically, the job descriptor specifies the programs to be invoked, their arguments and environmental variables. The execution of the job is typically conditioned by a set of dependencies (which must be satisfied before the job can be submitted); exemplary dependencies are time constraints (such as its run-cycle - like every day, week or month), sequence constraints (such as the successful completion of other jobs), or enabling constraints (such as the entering of a response to a prompt by an operator). The job descriptor also specifies the resources that are required by the job; those resources can be seen as further dependencies, which condition the execution of the job to their availability. Finally, the job descriptor includes statistics information relating to the job; for example, the statistics information provides a log of the actual duration of previous executions of the job, from which an estimated duration for its next executions may be inferred.
A planner 220 (such as the "Master Domain Manager" of the "TWS") creates a workload plan; the plan consists of a batch of jobs - together with their dependencies - scheduled for execution on a specific production period (typically, one day). A new plan is generally created automatically before every production period. For this purpose, the planner 220 processes the information available in the workload database 215 so as to select the jobs to be run and to arrange them in the desired sequence (according to their dependencies and expected duration).
The planner 220 creates the plan by adding the jobs to be executed (for the next production period) and by removing the preexisting jobs (of the previous production period) that have been completed; in addition, the jobs of the previous production period that did not complete successfully or that are still running or waiting to be run can be maintained in the plan (for their execution during the next production period). The plan so obtained is then stored into a corresponding control file 225 - such as the "Symphony" of the "TWS" (action "A2.Create").
A handler 230 (such as the "Batchman" process of the "TWS") starts the plan at the beginning of every production period (action "A3.Start"). The handler 230 interfaces with a workload manager 235 - for example, consisting of the "Enterprise Workload Manager (EWLM)" by IBM Corporation; the "EWLM" allows defining business-oriented performance goals for an entire domain of execution servers, so as to provide an end-to-end view of actual performance relating to those goals. The workload manager 235 provides information about the workload of each execution server to the scheduler 205. Typically, this workload information consists of a numerical weight for each execution server. The weight measures the performance of the execution server - for example, ranging from 0 (when the execution server is completely busy) to 100 (when the execution server is substantially idle). This performance weight is commonly obtained by combining key metrics of the execution server (such as its processing power usage, memory usage, network activity, amount of input/output operations, job execution details, and the like). The handler 230 saves the performance weights of all the execution servers into a corresponding table 240 (action "A4.Weights").
The handler 230 submits the jobs of the plan for execution as soon as possible - according to their dependencies. As described in detail in the following, for each job to be submitted an analyzer 245 selects one or more of the execution servers that are eligible to execute the job (i.e., satisfying its resource dependencies); preferably, the analyzer 245 creates a list that orders the (eligible) execution servers (or a part of them) in decreasing order of preference. The creation of this preference list for each job is described in detail further in the text. The preference list so obtained is then provided to the handler 230 (action "A5.List").
The handler 230 selects the first execution server of the preference list that is available for submitting the execution of the job (action "A6.Submit"). The actual execution of the job is managed by a corresponding executor module 250 (such as the "Jobman" process of the "TWS"). The executor 250 directly launches and tracks the job, by interfacing with an agent (not shown in the Figure) running on the selected execution server. The executor 250 returns feedback information about the execution of the job to the handler 230 (for example, whether the job has been completed successfully, its actual duration, and the like); the handler 230 enters this feedback information into the control file 225, so as to have a real-time picture of the current state of all the jobs of the plan (action "A7.Feedback").
At the end of the production period, the planner 220 extracts the feedback information of the completed jobs from the control file 225. The planner 220 revises the statistics information relating to the completed jobs accordingly, and updates it in the workload database 215 (action "A8.Statistics").
As shown in Figure 3a, a very simplified system that may be used to explain the solution according to an embodiment of the present invention includes 7 execution servers (denoted with Si, with i=1...7); for each execution server Si the corresponding performance weight (denoted with Wi) is provided. As can be seen, the performance weights Wi are not normalized (i.e., their sum is different from 100); this is due to the fact that the execution servers are generally only a part of the computers controlled by the workload manager and/or to an intrinsic limitation of the workload manager.
Let us assume now that - for a generic job to be submitted for execution - the corresponding eligible execution servers (satisfying its resource dependencies) are S1, S2, S4, S5 and S7. A sequence of the corresponding performance weights W1, W2, W4, W5 and W7 - obtained by removing the information relating to the other execution servers S3 and S6 - is illustrated in Figure 3b. Naturally, when one or more execution servers Si are discarded, the performance weights Wi can never be normalized (even if they were so originally).
Moving now to Figure 3c, for each (eligible) execution server Sj (with j=1,2,4,5,7) a selection target frequency Fj is calculated by normalizing its performance weight Wj:
Fj = Wj / Σ Wh,
where the sum runs over all the eligible execution servers Sh.
In other words, the target frequencies Fj indicate how the jobs should be distributed for execution on the execution servers Sj in the ideal situation (so as to distribute the workload according to their performance) .
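By way of illustration only, this normalization step can be sketched in Java (the helper name and array layout are assumptions, not taken from the patent):

    // Illustrative sketch: normalizes the performance weights Wj of the
    // eligible execution servers into target frequencies Fj that sum to 1.
    static double[] toTargetFrequencies(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;                            // sum of all eligible weights
        }
        double[] frequencies = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            frequencies[j] = weights[j] / total;   // Fj = Wj / sum(Wh)
        }
        return frequencies;
    }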
As shown in Figure 3d, the sequence of target frequencies Fj so obtained is then converted into a cumulative form - defined by a sequence of disjoint ranges of values adjacent to one another. Particularly, for each execution server Sj a selection cumulative range Cj is calculated. The cumulative range Cj has an upper limit HCj equal to the sum of its target frequency Fj plus the target frequencies of the execution servers preceding it in the sequence:
HCj = Σ Fh for all h ≤ j.
In practice, this result is achieved more efficiently by applying the following recursive formula:
HCj = Fj for j=0,
HCj = Fj + HCj-1 for j>0.
Moreover, the cumulative range Cj has a lower limit LCj higher than the upper limit HCj-1 of the preceding execution server Sj-1 in the sequence (with the lower limit LC0 of the first execution server S0 being LC0=0). As a result, the cumulative range Cj of each execution server Sj has a size (from the lower limit LCj to the upper limit HCj) that is equal to its target frequency Fj. Moreover, the cumulative ranges Cj will span a global range GR from 0 to 1.
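Under the same illustrative assumptions as the previous fragment, the recursive construction of the upper limits can be sketched as:

    // Illustrative sketch: builds the cumulative upper limits HCj with the
    // recursive formula HCj = Fj + HCj-1; the lower limit LCj is implicitly
    // the upper limit of the preceding range (or 0 for the first one).
    static double[] toCumulativeLimits(double[] frequencies) {
        double[] limits = new double[frequencies.length];
        double running = 0.0;
        for (int j = 0; j < frequencies.length; j++) {
            running += frequencies[j];             // HCj = Fj + HCj-1
            limits[j] = running;                   // the last limit is (about) 1
        }
        return limits;
    }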
In order to select one of the execution servers Sj for executing the job, a uniformly distributed random number RN is generated; the random number RN spans the same global range GR of the cumulative ranges Cj (i.e., from 0 to 1). This random number RN selects the cumulative range Cj including it (differentiated with a prime notation, i.e., Cj'); for example, this result may be achieved by applying a binary search on the sequence of cumulative ranges Cj. The execution server corresponding to the selected cumulative range Cj' (differentiated with the same prime notation, i.e., Sj') is then selected for executing the job.
In this way, the probability of selecting each cumulative range Cj is equal to HCj-LCj=Fj; therefore, the probability of executing the jobs on each execution server Sj is equal to its target frequency Fj.
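For example, a hedged sketch of this selection by binary search (reusing the hypothetical limits[] array built above):

    import java.util.Random;

    // Illustrative sketch: returns the index of the cumulative range that
    // contains a uniformly distributed random number, via binary search
    // over the upper limits HCj.
    static int selectServer(double[] limits, Random rnd) {
        double rn = rnd.nextDouble();              // uniform in [0, 1)
        int low = 0, high = limits.length - 1;
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (rn < limits[mid]) {
                high = mid;                        // rn falls in Cmid or earlier
            } else {
                low = mid + 1;                     // rn falls in a later range
            }
        }
        return low;                                // index j selected with probability Fj
    }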
As a result, it is possible to obtain a distribution of the frequency of execution of the jobs on the execution servers Si that statistically reflects the corresponding performance weights Wi.
This result is achieved in spite of the fact that the jobs are submitted for execution individually (each one independently of the others), without knowledge of the execution servers to which the previous jobs have been submitted.
Moreover, the proposed solution allows preventing any risk of overloading single execution servers Si.
The devised technique is also less sensitive to the fact that the selection of the execution servers Si on which each job must be submitted is limited to a part of them only (i.e., to the corresponding eligible execution servers Sj).
In this way, it is possible to obtain a correct distribution of the execution of the jobs among the different execution servers. This has a beneficial effect on the overall performance of the system.
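This statistical behavior is easy to verify empirically; the following hedged Java sketch (with invented weights, reusing the hypothetical helpers above) runs a toy version of the experiment reported with Figure 4 further below:

    import java.util.Random;

    // Toy check: elect a first-choice server many times and compare the
    // observed selection frequencies with the target frequencies Fj.
    public static void main(String[] args) {
        double[] weights = {36, 24, 9, 12, 18};    // invented performance weights
        double[] frequencies = toTargetFrequencies(weights);
        double[] limits = toCumulativeLimits(frequencies);
        int[] counts = new int[weights.length];
        Random rnd = new Random();
        int trials = 100000;
        for (int t = 0; t < trials; t++) {
            counts[selectServer(limits, rnd)]++;
        }
        for (int j = 0; j < counts.length; j++) {
            System.out.printf("server %d: observed %.3f, target %.3f%n",
                    j, (double) counts[j] / trials, frequencies[j]);
        }
    }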
The next execution servers - on which the job may be executed in decreasing order of preference - are selected by reiterating the same operations described above on a reduced sequence of execution servers - obtained by removing the information relating to the selected execution server Sj'.
However, the same result is achieved more efficiently by applying a reduction algorithm that avoids recreating the whole sequence of cumulative ranges Cj from scratch. For this purpose, all the cumulative ranges Cj preceding the selected cumulative range Cj' in the sequence are shifted forwards, with the first position in the sequence being invalidated - so as to overwrite the selected cumulative range Cj', which then disappears from the sequence. For example, as shown in Figure 3e, when the random number RN=0.122 selects the cumulative range Cj'=C2 (from LC2=0.090 to HC2=0.294), the preceding cumulative range C1 is shifted forwards.
Moving to Figure 3f, the remaining cumulative ranges Cj (with j=1,4,5,7) are then corrected to compensate for the removal of the selected cumulative range Cj'. For this purpose, each shifted cumulative range Cj is increased by the target frequency of the selected execution server Sj' (differentiated by the same prime notation, i.e., Fj'):
Cj = Cj + Fj' for j<j'.
In this way, the cumulative range Cj of each execution server Sj again has a size (from the lower limit LCj to the upper limit HCj) that is equal to its target frequency Fj. However, the cumulative ranges Cj will span a different global range (differentiated by adding an apex ^, i.e., GR^); particularly, the global range GR^ is now from the lower limit of the first cumulative range Cj in the sequence - denoted with LCfirst - to 1.
In order to select the next execution server Sj for executing the job, a further random number (again from 0 to 1) is generated - differentiated by adding the same apex ^, i.e., RN^; the random number RN^ is then updated so as to span the global range GR^ of the cumulative ranges Cj (now from LCfirst to 1):
RN^ = LCfirst + RN^ * GR^.
As above, this random number RN^ selects one of the cumulative ranges Cj including it; the execution server corresponding to the selected cumulative range is then selected as a second choice for executing the job.
Alternatively, it is possible to shift backwards all the cumulative ranges Cj following the selected cumulative range Cj' in the sequence, with the last position in the sequence being invalidated - so as to overwrite the selected cumulative range Cj', which then disappears from the sequence. For example, as shown in Figure 3g, when the random number RN=0.325 selects the cumulative range Cj'=C4 (from LC4=0.294 to HC4=0.371), the following cumulative ranges C5 and C7 are shifted backwards. The remaining cumulative ranges Cj (with j=1,2,5,7) are then corrected (to compensate for the removal of the selected cumulative range Cj') by decreasing each shifted cumulative range Cj by the target frequency Fj' of the selected execution server Sj':
Cj = Cj - Fj' for j>j'
(so that the cumulative range Cj of each execution server Sj again has a size that is equal to its target frequency Fj). The global range GR^ of the cumulative ranges Cj is now from 0 to the upper limit HCj of the last cumulative range Cj in the sequence - denoted with HClast. The same random number RN^, updated as above so as to span this global range GR^ (i.e., RN^ = LCfirst + RN^ * GR^ = RN^ * GR^, since LCfirst=0), is then used to select the cumulative range including it, and then the corresponding execution server (as a second choice for executing the job).
In both cases (i.e., when the cumulative ranges Cj are shifted either forwards or backwards), the desired result is achieved without having to create any new sequence. This reduces the memory overhead of the scheduling server, with a corresponding increase of its performance.
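A hedged Java sketch of this in-place reduction follows (the class name and data layout are assumptions; choosing the shorter side to shift anticipates the combined variant described in the next paragraph):

    import java.util.Random;

    // Illustrative sketch (not the patent's literal code): keeps the cumulative
    // ranges as an array of upper limits over a valid window [first, last];
    // the selected range is removed by shifting whichever side is shorter.
    class CumulativeSelector {
        private final int[] serverIds;   // server id held at each position
        private final double[] upper;    // upper limit HCj at each position
        private final double[] freq;     // target frequency Fj at each position
        private int first, last;         // valid window of the sequence
        private double lcFirst = 0.0;    // lower limit LCfirst of the first range

        CumulativeSelector(int[] ids, double[] frequencies) {
            serverIds = ids.clone();
            freq = frequencies.clone();
            upper = new double[ids.length];
            double running = 0.0;
            for (int j = 0; j < ids.length; j++) {
                running += frequencies[j];           // HCj = Fj + HCj-1
                upper[j] = running;
            }
            first = 0;
            last = ids.length - 1;
        }

        boolean isEmpty() {
            return first > last;
        }

        // Elects one server according to the current cumulative ranges,
        // then removes its range from the sequence.
        int selectAndRemove(Random rnd) {
            // RN^ = LCfirst + RN^ * GR^, so the number spans the global range.
            double rn = lcFirst + rnd.nextDouble() * (upper[last] - lcFirst);
            int j = first, high = last;              // binary search for the
            while (j < high) {                       // range containing rn
                int mid = (j + high) >>> 1;
                if (rn < upper[mid]) high = mid; else j = mid + 1;
            }
            int selected = serverIds[j];
            double f = freq[j];
            if (j - first <= last - j) {             // first half: shift forwards
                for (int k = j; k > first; k--) {    // the preceding ranges and
                    upper[k] = upper[k - 1] + f;     // increase them by Fj'
                    serverIds[k] = serverIds[k - 1];
                    freq[k] = freq[k - 1];
                }
                first++;                             // first position invalidated
                lcFirst += f;                        // global range starts higher
            } else {                                 // second half: shift backwards
                for (int k = j; k < last; k++) {     // the following ranges and
                    upper[k] = upper[k + 1] - f;     // decrease them by Fj'
                    serverIds[k] = serverIds[k + 1];
                    freq[k] = freq[k + 1];
                }
                last--;                              // last position invalidated
            }
            return selected;
        }
    }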
Preferably, the two reduction algorithms described above are combined together. Particularly, the cumulative ranges Cj are shifted forwards when the selected cumulative range Cj' is in the first half of the sequence, whereas they are shifted backwards when the selected cumulative range Cj' is in the second half of the sequence (the choice being indifferent when the selected cumulative range Cj' is exactly in the middle of the sequence). In this case, the global range GR^ will be from the first lower limit LCfirst to the last upper limit HClast. As a result, it is possible to minimize the shifting operations and the corresponding corrections of the shifted cumulative ranges Cj.
Considering now Figure 4, the proposed solution was tested by simulating the submission of the same job for execution several thousand times. The number of times each execution server Sj was selected (as the first choice for executing the job) was recorded. As can be seen, the obtained values show a close correlation with the corresponding target frequencies Fj. This confirms that the jobs are actually distributed for execution among the execution servers Sj according to the corresponding performance weights Wj.
In practical implementations, the random numbers (used to select the cumulative ranges) are obtained from a pseudo-random generator. For example, it is possible to use the "Random" object of the Java language, which exposes the method "nextDouble()" returning all 2^53 possible values of the form m*2^-53 - where m is a positive integer less than 2^53 - with approximately equal probability. In this case, if a single "Random" object is used for all the jobs, the values of the random numbers generated for a particular choice in the preference lists of different jobs may exhibit a lack of correlation to the given weights; this is because the values of the random numbers already generated are no longer available for the other jobs (making the random number generator no longer uniform). This might cause the creation of preference lists that do not reflect the distribution of the desired target frequencies accurately. Care should then be taken to ensure that a new random number generator is available for each choice iteration, so as to preserve the uniformity of the random numbers generated.
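For instance, a minimal fragment (only java.util.Random and its nextDouble() method are taken from the text; everything else is illustrative):

    import java.util.Random;

    // As advised above, a new generator can be created for each choice
    // iteration; nextDouble() returns a uniformly distributed value in [0, 1).
    Random generator = new Random();
    double rn = generator.nextDouble();            // selects a cumulative range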
With reference now to Figures 5a-5b, the logic flow of an exemplary process that can be implemented in the above-described system (to control the execution of the jobs of a specific plan) is represented with a method 500.
The method begins at the black start circle 503 in the swim-lane of the load-balancer. Whenever a predefined time-out expires at block 506 (for example, every 20-40 s), the workload manager at block 509 transmits the performance weights of all the execution servers to the scheduler. In response thereto, the scheduler at block 512 stores these performance weights in the corresponding table (replacing their previous values).
The desired plan is started at the beginning of every production period; every job of the plan is submitted for execution at block 515 as soon as possible (according to its dependencies). The method passes to block 518, wherein the eligible execution servers for the job are selected (among all the execution servers, by discarding the ones that do not satisfy its resource dependencies). Continuing to block 521, the execution servers are arranged in a sequence (for example, according to their names). The target frequency of each execution server is then calculated at block 524 (by normalizing its performance weight). The flow of activity passes to block 527, wherein the target frequencies are converted into the corresponding cumulative ranges.
The method then enters a loop for building the preference list (of the execution servers) for the job. Particularly, the loop begins at block 530 with the generation of a new random number (from 0 to 1). The random number is then updated at block 533 so as to span the current global range of the cumulative ranges (when it is necessary, i.e., starting from the second iteration of the loop - for the selection of the execution servers following the first choice). Proceeding to block 535, the cumulative range including the random number is identified. The execution server corresponding to this cumulative range is then selected and added to the preference list (starting from the beginning) at block 536.
The method branches at block 539 according to the configuration of the scheduler. If a single execution server must be selected, the flow of activity descends into block 542 (described in the following). Otherwise, a test is made at block 543 to verify whether the desired preference list is complete (for example, when all the execution servers have been added or a predefined maximum length has been reached). If so, the method likewise descends into block 542. Otherwise, the flow of activity branches again at block 545 according to the position of the previously selected cumulative range in the corresponding sequence. Particularly, if the position falls in the first half of the sequence the blocks 548-551 are executed, whereas if it falls in the second half the blocks 554-557 are executed; in both cases, the method merges at block 560.
Considering now block 548 (first half), the cumulative ranges preceding the selected one in the sequence are shifted forwards. The shifted cumulative ranges are then corrected at block 551 by adding the target frequency of the selected execution server (so as to compensate for the removal of the selected cumulative range). In a dual manner, at block 554 (second half) the cumulative ranges following the selected one in the sequence are shifted backwards. The shifted cumulative ranges are then corrected at block 557 by subtracting the target frequency of the selected execution server. In both cases, the global range is updated accordingly at block 560, so as to span from the first lower limit to the last upper limit of the cumulative ranges. The method then returns to block 530 to repeat the operations described above (to select the next execution servers, representing secondary choices for executing the job).
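Purely by way of example, the reduction of blocks 545-560 may be sketched in Java as follows; the Range helper class and the method name are hypothetical and not part of the embodiment.

    import java.util.List;

    public class RangeReduction {
        static class Range {
            double low, high;   // lower and upper limits of the cumulative range
        }

        // Remove the selected cumulative range and close the gap by shifting
        // whichever side of the sequence is shorter, correcting the shifted
        // limits by the target frequency of the selected server.
        static void removeAndShift(List<Range> ranges, int sel) {
            double f = ranges.get(sel).high - ranges.get(sel).low;
            if (sel < ranges.size() / 2) {
                // first half: shift the preceding ranges forwards (blocks 548-551)
                for (int j = 0; j < sel; j++) {
                    ranges.get(j).low += f;
                    ranges.get(j).high += f;
                }
            } else {
                // second half: shift the following ranges backwards (blocks 554-557)
                for (int j = sel + 1; j < ranges.size(); j++) {
                    ranges.get(j).low -= f;
                    ranges.get(j).high -= f;
                }
            }
            ranges.remove(sel);
            // block 560: the global range now spans from ranges.get(0).low
            // to ranges.get(ranges.size() - 1).high
        }
    }

Shifting only the smaller side keeps the number of corrected limits at most half the sequence length, which is the rationale for the first-half/second-half test of block 545.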
With reference now to block 542, as soon as the preference list has been completed - either because a single execution server has been selected (block 539) or because all the desired execution servers have been selected (block 543) - the job is submitted for execution. For this purpose, the first execution server in the preference list is extracted at block 566. A test is made at block 569 to verify whether the (current) execution server is available for executing the job. If so, the job is launched on this execution server at block 572, and the flow of activity then passes to block 575 (described in the following). Conversely, the method verifies the residual content of the preference list at block 578. If the preference list is not empty, a further execution server is extracted therefrom at block 581, and the method returns to block 569 to reiterate the same operations. When the job cannot be executed on any one of the execution servers of the preference list, an error condition is entered at block 584, and the method then descends into block 575. In any case, the feedback information about the execution of the job is saved into the control file at block 575. The flow of activity then returns to block 515 to process the next job to be submitted for execution.
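A simple sketch of the submission loop of blocks 566-584 follows; the JobRunner interface is an assumption introduced for illustration, standing in for whatever component actually probes and launches the execution servers.

    import java.util.List;

    public class JobSubmission {
        interface JobRunner {
            boolean isAvailable(String server);   // block 569
            void launch(String server);           // block 572
        }

        // Try the execution servers of the preference list in order and
        // report an error when none is available.
        static boolean submit(List<String> preferenceList, JobRunner runner) {
            for (String server : preferenceList) {
                if (runner.isAvailable(server)) {
                    runner.launch(server);        // job launched on this server
                    return true;
                }
            }
            return false;                         // block 584: error condition
        }
    }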
At the end of the production period (block 590), the statistics relating to the completed jobs are updated in the workload database at block 593 according to the corresponding feedback information. The method then ends at the concentric white/black stop circles 596.
Naturally, in order to satisfy local and specific requirements, a person skilled in the art may apply many modifications and alterations to the solution described above. Particularly, although the present invention has been described with a certain degree of particularity with reference to preferred embodiment(s) thereof, it should be understood that various omissions, substitutions and changes in the form and details, as well as other embodiments, are possible; moreover, it is expressly intended that specific elements and/or method steps described in connection with any disclosed embodiment of the invention may be incorporated in any other embodiment as a general matter of design choice.
For example, similar considerations apply if the system has a different architecture or includes equivalent units. Moreover, each computer may have another structure or may include similar elements (such as cache memories temporarily storing the programs, or parts thereof, to reduce the accesses to the mass memory during execution); in any case, it is possible to replace the computer with any code execution entity (such as a PDA, a mobile phone, and the like). Likewise, it should be readily apparent that the term job as used herein may include other work units (such as interactive tasks) or, more generally, whatever activities are to be distributed for execution on multiple computers.
Alternatively, the execution servers may be ordered in a different way, the cumulative ranges may be calculated with other formulas (their sizes may even depend non-linearly on the corresponding target frequencies), or the random numbers may be generated with equivalent functions.
The proposed algorithm for selecting the first execution server is not to be interpreted in a limitative manner. For example, nothing prevents assigning the cumulative ranges to the execution servers in a different way (for example, calculating them starting from the end of the sequence) .
Similar considerations apply to the algorithm for selecting the next execution servers.
In any case, nothing prevents recreating the sequence of the cumulative ranges at each iteration.
As indicated above, the possibility of implementing the reduction algorithm (for removing the selected cumulative range) by shifting the other cumulative ranges always forwards or always backwards is not excluded.
In this case as well, the random numbers may be generated with equivalent functions (for example, random numbers that directly span the current global range of the cumulative ranges).
In any case, any other generator of (substantially) random numbers may be used.
Alternatively, the workload manager (or any equivalent component) may provide different performance information (such as values that are already normalized), or the selection may always be performed on all the execution servers for every job.
The proposed technique has equal applicability to equivalent schedulers. In any case, the same solution may also be implemented in any other data processing system (such as in a server farm) .
Similar considerations apply if the program (which may be used to implement each embodiment of the invention) is structured in a different way, or if additional modules or functions are provided; likewise, the memory structures may be of other types, or may be replaced with equivalent entities (not necessarily consisting of physical storage media). Moreover, the proposed solution lends itself to be implemented with an equivalent method (using similar steps, removing some non-essential steps, or adding further optional steps, even in a different order). In any case, the program may take any form suitable to be used by or in connection with any data processing system, such as external or resident software, firmware, or microcode (either in object code or in source code). Moreover, it is possible to provide the program on any computer-usable medium; the medium can be any element suitable to contain, store, communicate, propagate, or transfer the program. For example, the medium may be of the electronic, magnetic, optical, electromagnetic, infrared, or semiconductor type; examples of such a medium are fixed disks (where the program can be pre-loaded), removable disks, tapes, cards, wires, fibers, wireless connections, networks, broadcast waves, and the like.
In any case, the solution according to the present invention lends itself to be implemented with a hardware structure (for example, integrated in a chip of semiconductor material), or with a combination of software and hardware.

Claims
1. A method (500) for distributing execution of work units on a plurality of data processing entities (110), each one associated with a corresponding target frequency for the distribution of the work units on the entity, for each work unit the method including the steps of:
creating (521-527) a sequence of cumulative ranges associated with the entities, the cumulative range associated with each entity having a size corresponding to the target frequency of the entity,
generating (530-533) a substantially random number,
selecting (535) one of the cumulative ranges of the sequence according to the random number, and
electing (536) the entity associated with the selected cumulative range of the sequence for executing the work unit.
2. The method (500) according to claim 1, wherein the step of creating a sequence of cumulative ranges (521-527) includes setting each cumulative range between an upper limit, equal to the sum of the target frequency of the associated entity and the target frequency of each entity associated with a preceding cumulative range in the sequence, and a lower limit, equal to zero for a first cumulative range of the sequence or to the upper limit of the preceding cumulative range otherwise, wherein the step of generating a substantially random number (530-533) includes generating the random number ranging from 0 to 1, and wherein the step of selecting (535) one of the cumulative ranges includes selecting the cumulative range including the random number.
3. The method (500) according to claim 1 or 2, further including at least one iteration of the steps of:
creating (545-560) a further sequence of cumulative ranges associated with the entities not being selected,
generating (530-533) a further substantially random number,
selecting (535) one of the cumulative ranges of the further sequence according to the further random number, and
electing (536) the entity associated with the selected cumulative range of the further sequence as a secondary choice for executing the work unit after each previously selected entity.
4. The method (500) according to claim 3, wherein the step of creating a further sequence of cumulative ranges (545-560) includes:
shifting forwards (548) the cumulative ranges preceding, or shifting backwards (554) the cumulative ranges following, the previously selected cumulative range in the corresponding sequence, and
increasing (551) each cumulative range being shifted forwards, or decreasing (557) each cumulative range being shifted backwards, by the target frequency of the entity associated with the previously selected cumulative range.
5. The method (500) according to claim 4, wherein the step of creating a further sequence of cumulative ranges (545-560) further includes:
determining (545) an inclusion of the previously selected cumulative range in a first half or in a second half of the corresponding sequence, the cumulative ranges being shifted forwards (548) in response to the inclusion in the first half and being shifted backwards (554) in response to the inclusion in the second half.
6. The method (500) according to claim 4 or 5, wherein the step of generating (530-533) a further substantially random number includes:
generating (530) the further random number ranging from 0 to 1, and
correcting (533) the further random number according to a global range of the cumulative ranges of the corresponding sequence.
7. The method according to any claim from 1 to 6, wherein each random number is uniformly distributed.
8. The method according to any claim from 1 to 7, wherein a performance weight is associated with each entity, for each work unit the method further including the steps of:
reducing (518) the plurality of entities by discarding each invalid entity not satisfying predefined requirements of the work unit, and
calculating (524) the target frequency of each valid entity by dividing the performance weight of the valid entity by the sum of the performance weights of all the valid entities.
9. The method (500) according to any claim from 1 to 8, further including the step of: scheduling (515) the work units for execution in succession according to a predefined plan.
10. A computer program (200) for performing the method (500) of any claim from 1 to 9 when the computer program is executed on a data processing system (105).
11. A system (105) including means (200) for performing the steps of the method (500) according to any claim from 1 to 9.
PCT/EP2007/052719 2006-10-03 2007-03-22 Method, system and computer program for distributing execution of independent jobs WO2008040563A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP06121689.1 2006-10-03
EP06121689 2006-10-03

Publications (1)

Publication Number Publication Date
WO2008040563A1 true WO2008040563A1 (en) 2008-04-10

Family

ID=38117060

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2007/052719 WO2008040563A1 (en) 2006-10-03 2007-03-22 Method, system and computer program for distributing execution of independent jobs

Country Status (1)

Country Link
WO (1) WO2008040563A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8185902B2 (en) 2007-10-31 2012-05-22 International Business Machines Corporation Method, system and computer program for distributing a plurality of jobs to a plurality of computers

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5951694A (en) * 1995-06-07 1999-09-14 Microsoft Corporation Method of redirecting a client service session to a second application server without interrupting the session by forwarding service-specific information to the second server
US6070191A (en) * 1997-10-17 2000-05-30 Lucent Technologies Inc. Data distribution techniques for load-balanced fault-tolerant web access
WO2000045263A2 (en) * 1999-02-01 2000-08-03 Mpath Interactive, Inc. Adaptive thread manager
WO2001063403A2 (en) * 2000-02-25 2001-08-30 Sun Microsystems, Inc. Method and apparatus for distributing load in a computer environment
US6859834B1 (en) * 1999-08-13 2005-02-22 Sun Microsystems, Inc. System and method for enabling application server request failover




Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 07727195
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 07727195
Country of ref document: EP
Kind code of ref document: A1