CN112631693B - Dynamic expansion method for computing resources in runtime - Google Patents

Publication number
CN112631693B
Authority
CN
China
Prior art keywords
node
new
original
computing resource
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910903881.0A
Other languages
Chinese (zh)
Other versions
CN112631693A (en)
Inventor
何王全
董恩铭
于康
宋长明
方燕飞
漆锋滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-24
Filing date
2019-09-24
Publication date
2022-10-04
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910903881.0A
Publication of CN112631693A
Application granted
Publication of CN112631693B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4482 Procedural
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Abstract

The invention discloses a method for dynamically expanding computing resources at runtime. A resource expansion upgrade command is sent to the running original application. On receiving the upgrade signal, each original node cleans up its communication environment and waits for the newly added computing resources, so that the communication environment can be initialized a second time and rebuilt together with them. Each new node is started by job management and enters the dynamic task partitioning area; once it determines that it is a node added by the resource upgrade, it cleans up its own communication environment. The original nodes and the new nodes then jointly rebuild the communication environment. The new nodes partition themselves automatically according to the dynamic task partitioning rule, a partition master node is selected, and the partition master node requests tasks from the global master node. After receiving the task requests from the master nodes of the original and new partitions, the global master node distributes tasks evenly to the original and new computing resource nodes, and the application continues to run normally. The invention solves the problem of rebuilding the communication environment after a resource adjustment, assigns idle computing resources to a running application without interrupting it, and dynamically distributes the uncompleted tasks, thereby maximizing utilization of the computing resources.

Description

Dynamic expansion method for computing resources in runtime
Technical Field
The invention belongs to the field of runtime systems, and in particular relates to a method for dynamically expanding computing resources at runtime.
Background
A high-performance computing system makes it possible to solve massively parallel applications in many fields. Its computing resources are generally shared by many applications whose run scale and run time differ, so some applications finish and leave computing resources idle while others still need to run for a long time. The technical problem to be solved is how to use the idle computing resources to accelerate the applications that are still running.
A task-parallel application distributes the tasks in a task pool to multiple computing resources to be completed in parallel. If the tasks depend on one another, dynamically adding new computing resources can break the mapping between tasks and resources and disrupt normal operation of the program. If the tasks are independent, the application is insensitive to the scale and shape of the computing resources, so the remaining tasks can be re-divided dynamically and new idle resources can be used to finish the application sooner.
Many large-scale task-parallel applications need a large amount of computing resources and a long run time. When new computing resources become available, the usual approach is to wait for the checkpoint file to be updated, interrupt the running job, and resubmit it with the combined resources. This burdens the user and also leaves the new computing resources idle for a period (while waiting for the checkpoint file to be updated).
Disclosure of Invention
The invention aims to provide a method for dynamically expanding computing resources at runtime that solves the problem of rebuilding the communication environment after a resource adjustment, assigns idle computing resources to a running application without interrupting it, dynamically distributes the uncompleted tasks, and maximizes utilization of the computing resources while ensuring that the results of the run are correct and complete.
To achieve this purpose, the invention adopts the following technical scheme: a method for dynamically expanding computing resources at runtime, comprising the following steps:
S1. Send a resource expansion upgrade command to the original application, which is running normally.
S2. All original computing resource nodes of the original application and all new computing resource nodes receive the resource expansion upgrade command at the same time and then respectively perform the following steps:
a. After receiving the resource expansion upgrade command, every original computing resource node of the original application performs the following steps:
a1. Clean up the communication environment and release the related environment variables.
a2. Wait for the newly added computing resources, then perform the second initialization of the communication environment together with them and rebuild the communication environment.
a3. After the communication environment has been initialized, update the dynamic task partitioning information and add the newly arrived computing resource nodes to the dynamic task partitioning area according to the dynamic task partitioning rule.
b. Each new computing resource node receives the resource expansion upgrade command, is started by job management, and performs the following steps:
b1. Start the program and enter the dynamic task partitioning area.
b2. Clean up the communication environment.
b3. Perform the second initialization of the communication environment together with the original computing resource nodes and rebuild it.
b4. Partition automatically according to the dynamic task partitioning rule, so that the new computing resource nodes form a new process area of the dynamic task partitioning; select one new computing resource node as the master node of the new partition; that partition master node then requests tasks from the global master node, which is among the original computing resource nodes.
S3. After the global master node of the original computing resources receives the task requests from the original partition master node and the new partition master node, it distributes tasks evenly to the original and new computing resource nodes, and the application continues to run normally.
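Steps S1 to S3 can be pictured with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: the `Node` class, the `expand` function, the use of a set of ranks to stand in for the communication environment, the choice of the first node of each group as partition master, and the round-robin dealing of tasks are all illustrative choices that the source does not prescribe.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one computing resource node."""
    rank: int
    # Ranks reachable through the node's current communication environment.
    comm: Optional[FrozenSet[int]] = None
    tasks: List[int] = field(default_factory=list)


def expand(original, new, pending):
    """Simulate one expansion; return the master rank of each partition."""
    everyone = original + new
    # a1 / b2: every node cleans up its existing communication environment.
    for node in everyone:
        node.comm = None
    # a2 / b3: old and new nodes rebuild one shared environment together.
    world = frozenset(node.rank for node in everyone)
    for node in everyone:
        node.comm = world
    # a3 / b4: the new nodes form a new partition. Taking the first node of
    # each group as its partition master is an assumption of this sketch.
    masters = {"original": original[0].rank, "new": new[0].rank}
    # S3: the global master deals the uncompleted tasks evenly to all nodes.
    for index, task in enumerate(pending):
        everyone[index % len(everyone)].tasks.append(task)
    return masters


old_nodes = [Node(0), Node(1)]
new_nodes = [Node(2), Node(3)]
print(expand(old_nodes, new_nodes, list(range(8))))
print([node.tasks for node in old_nodes + new_nodes])
```

In a real deployment each node would be a separate process and the shared environment would be a communicator of the message library rather than a Python set.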
Further improvements to the above technical scheme are as follows:
1. In the above scheme, the dynamic task partitioning can be used in a message library that implements a cross-language message-passing standard.
2. In the above scheme, the message library of the cross-language message-passing standard is MPICH or Open MPI.
By applying the above technical scheme, the invention has the following advantages over the prior art:
1) The method for dynamically expanding computing resources at runtime solves the problem of rebuilding the communication environment after a resource adjustment, assigns idle computing resources to a running application without interrupting it, dynamically distributes the uncompleted tasks, and maximizes utilization of the computing resources while ensuring that the results of the run are correct and complete.
2) The method is transparent to the user: the user only needs to enter a resource upgrade command to tell the runtime system that the running job's resources can be dynamically expanded, and the runtime system completes all other processing automatically.
3) The method makes effective use of newly idle computing resources and speeds up the solution of the running application.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the following embodiment:
Embodiment: as shown in FIG. 1, a method for dynamically expanding computing resources at runtime comprises the following steps:
S1. Send a resource expansion upgrade command to the original application, which is running normally.
S2. All original computing resource nodes of the original application and all new computing resource nodes receive the resource expansion upgrade command at the same time and then respectively perform the following steps:
a. After receiving the resource expansion upgrade command, every original computing resource node of the original application performs the following steps:
a1. Clean up the communication environment and release the related environment variables.
a2. Wait for the newly added computing resources, then perform the second initialization of the communication environment together with them and rebuild the communication environment.
a3. After the communication environment has been initialized, update the dynamic task partitioning information and add the newly arrived computing resource nodes to the dynamic task partitioning area according to the dynamic task partitioning rule.
b. Each new computing resource node receives the resource expansion upgrade command, is started by job management, and performs the following steps:
b1. Start the program and enter the dynamic task partitioning area.
b2. Clean up the communication environment.
b3. Perform the second initialization of the communication environment together with the original computing resource nodes and rebuild it.
b4. Partition automatically according to the dynamic task partitioning rule, so that the new computing resource nodes form a new process area of the dynamic task partitioning; select one new computing resource node as the master node of the new partition; that partition master node then requests tasks from the global master node, which is among the original computing resource nodes.
S3. After the global master node of the original computing resources receives the task requests from the original partition master node and the new partition master node, it distributes tasks evenly to the original and new computing resource nodes, and the application continues to run normally.
The dynamic task partitioning can be used in a message library that implements a cross-language message-passing standard.
The message library of the cross-language message-passing standard is MPICH or Open MPI.
The embodiment is further explained below:
In the invention, dynamic expansion of computing resources is defined as follows: for a task-parallel application whose tasks are independent of one another, idle computing resources are added to the running application without changing its running state; once the expansion is complete, the uncompleted tasks are dynamically distributed across all the computing resources, and the new and old resources finish the remaining tasks together, so that computing resources are used sensibly. The method requires a task-parallel application with independent tasks. Such an application places no requirement on the scale or shape of the computing resources at any time during the run, so computing resources can be added while it runs to improve solution efficiency.
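The benefit of adding resources to a pool of independent tasks can be illustrated with a toy model. The sketch below assumes, purely for illustration, that every task takes one "round" and that each worker completes one task per round; the function name `run_rounds` and its parameters are hypothetical and do not appear in the source.

```python
from collections import deque


def run_rounds(num_tasks, workers, extra_workers=0, expand_at_round=None):
    """Count the rounds needed to drain a pool of independent tasks.

    If expand_at_round is given, extra_workers join at the start of that
    round and share the uncompleted tasks with the original workers.
    """
    pool = deque(range(num_tasks))
    done = []
    rounds = 0
    while pool:
        if expand_at_round is not None and rounds == expand_at_round:
            workers += extra_workers  # idle resources join the running job
        # Tasks are independent, so any worker may take any remaining task.
        for _ in range(min(workers, len(pool))):
            done.append(pool.popleft())
        rounds += 1
    return rounds, done


print(run_rounds(12, 2)[0])                                      # 6
print(run_rounds(12, 2, extra_workers=2, expand_at_round=2)[0])  # 4
```

With 12 tasks and 2 workers the pool drains in 6 rounds; if 2 more workers join at round 2, the remaining 8 tasks finish in 2 further rounds, for 4 in total, and every task is still completed exactly once.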
The dynamic expansion upgrade of computing resources proceeds as follows. When computing resources are idle, the user only needs to issue a resource expansion upgrade command signal to the running original application. The command is implemented by job management and consists of two parts: (1) the new computing resources start running the same object code as the existing computing resources; and (2) a signal is sent to both the original and the new computing resources.
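The two parts of the command can be written down as an ordered action list. This is a hypothetical sketch: the function name `expansion_command`, the `("launch", rank)` and `("signal", rank)` tuples, the numbering of new ranks after the highest original rank, and the launch-before-signal ordering are all assumptions made for illustration; the source only states that job management performs both parts.

```python
def expansion_command(original_ranks, new_node_count):
    """Return the actions a job manager might take for one upgrade command."""
    first_new = max(original_ranks) + 1
    new_ranks = list(range(first_new, first_new + new_node_count))
    # Part 1: start the same object code on each new computing resource.
    actions = [("launch", rank) for rank in new_ranks]
    # Part 2: signal both the original and the new computing resources.
    actions += [("signal", rank) for rank in list(original_ranks) + new_ranks]
    return actions


for action in expansion_command([0, 1], 2):
    print(action)
```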
After receiving the resource expansion upgrade command signal, every original computing resource node of the original application performs the following operations: (1) it cleans up the communication environment and releases the related environment variables; (2) it waits for the new computing resources, then performs the second initialization of the communication environment together with them and rebuilds the communication environment; (3) after the communication environment has been initialized, it updates the dynamic task partitioning information and adds the new computing resource nodes to the dynamic task partitioning area according to the dynamic task partitioning rule.
Each new computing resource node is started by job management and enters the dynamic task partitioning area. Once the node determines from an environment variable that it is a new computing resource node, it cleans up its communication environment and then rebuilds the communication environment together with the original computing resource nodes. After the rebuild, the original and new computing resource nodes can communicate normally. The new computing resource nodes are then partitioned automatically, according to the dynamic task partitioning rule, into a new area of the dynamic task partitioning; one new computing resource node (generally the first node of the partition) is selected as the master node of the new partition, and that partition master node requests tasks from the global master node, which is among the original computing resource nodes. After receiving the task requests from the original partition master node and the new partition master node, the global master node of the original computing resources distributes tasks evenly to the original and new computing resource nodes, and the application continues to run normally.
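The partitioning and task-request step might look like the following sketch. The source does not spell out the dynamic task partitioning rule, so the fixed `partition_size`, the function names, and the proportional chunking used by the global master are assumptions; only the choice of the first node of a partition as its master follows the text.

```python
def partition_new_nodes(new_ranks, partition_size):
    """Group new nodes into partitions; the first node of each is its master."""
    partitions = []
    for start in range(0, len(new_ranks), partition_size):
        members = new_ranks[start:start + partition_size]
        partitions.append({"master": members[0], "members": members})
    return partitions


def grant_tasks(pending, requests):
    """Global master: split pending tasks among the requesting masters.

    requests maps a partition master's rank to the number of nodes in its
    partition; each partition receives a share proportional to its size, so
    every node ends up with an (almost) equal number of tasks.
    """
    total_nodes = sum(requests.values())
    grants, seen = {}, 0
    for master, node_count in requests.items():
        begin = len(pending) * seen // total_nodes
        seen += node_count
        end = len(pending) * seen // total_nodes
        grants[master] = pending[begin:end]
    return grants


new_partitions = partition_new_nodes([4, 5, 6, 7], partition_size=2)
# Rank 0 is taken here as master of the original 4-node partition.
requests = {0: 4}
requests.update({p["master"]: len(p["members"]) for p in new_partitions})
print(grant_tasks(list(range(16)), requests))
```

Using cumulative integer boundaries guarantees that every pending task is granted exactly once even when the task count does not divide evenly by the node count.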
The steps performed by the original computing resource nodes after receiving the resource expansion upgrade command signal and the steps performed by the new computing resource nodes after receiving it are carried out concurrently, not one after the other.
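The concurrency of the two step sequences can be sketched with threads that meet at a barrier. This is an illustration only: threads stand in for nodes, `threading.Barrier` stands in for the joint second initialization, and the names `run_expansion`, `original_node` and `new_node` are hypothetical.

```python
import threading


def run_expansion(n_original, n_new):
    """Run both step sequences concurrently and return the event log."""
    barrier = threading.Barrier(n_original + n_new)
    log, lock = [], threading.Lock()

    def record(rank, step):
        with lock:
            log.append((rank, step))

    def original_node(rank):
        record(rank, "cleanup")       # a1: clean up the environment
        barrier.wait(timeout=5)       # a2: wait for everyone, then rebuild
        record(rank, "rebuilt")

    def new_node(rank):
        record(rank, "started")       # b1: started by job management
        record(rank, "cleanup")       # b2: clean up the environment
        barrier.wait(timeout=5)       # b3: rebuild together with the others
        record(rank, "rebuilt")

    threads = [threading.Thread(target=original_node, args=(rank,))
               for rank in range(n_original)]
    threads += [threading.Thread(target=new_node, args=(rank,))
                for rank in range(n_original, n_original + n_new)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


events = run_expansion(n_original=2, n_new=3)
print(sum(1 for _, step in events if step == "rebuilt"))  # 5
```

The interleaving of the cleanup events varies from run to run, but no node records "rebuilt" until every node, old or new, has finished its cleanup, which is the only ordering the two sequences need to share.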
In the invention, the communication environments of the original and new computing resource nodes must be rebuilt so that they are consistent with each other. After receiving the upgrade command initiated by the user, each original computing resource node releases its established communication domain and some related global information, and then waits for the new computing resource nodes. After the user initiates the upgrade command, all new computing resource nodes start running the application; when a new computing resource node enters the dynamic task partitioning function, the function determines that the node is a new computing resource node, and the node then releases its established communication domain and some related global information, just as the original computing resource nodes do. The new and original computing resource nodes then rebuild the communication domain and update the global information together. After the rebuild, the nodes are partitioned automatically according to the dynamic task partitioning rule and the partition master node requests tasks from the global master node; on receiving the requests, the global master node distributes tasks evenly to the new and original computing resource nodes, and the original application continues to run normally. The dynamic expansion technique can therefore make effective use of newly available resources without interrupting the original application, greatly improving the solution efficiency of the application.
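The release-and-rebuild of the communication domain can be modelled as follows. This is a sketch under assumptions: the environment variable name `EXPANSION_NEW_NODE`, the `CommDomain` class and its `generation` counter are hypothetical, and a real implementation would rebuild communicators of the message library rather than Python objects.

```python
from dataclasses import dataclass
from typing import Tuple

ROLE_ENV_VAR = "EXPANSION_NEW_NODE"  # hypothetical variable name


@dataclass(frozen=True)
class CommDomain:
    """Stand-in for a communication domain plus its global information."""
    generation: int
    ranks: Tuple[int, ...]


def node_role(env):
    """Decide from an environment mapping whether this node is new."""
    return "new" if env.get(ROLE_ENV_VAR) == "1" else "original"


def rebuild_domains(original_domains, new_domains):
    """Release every existing domain and give all nodes one shared domain.

    Both arguments map rank -> CommDomain. The old domains are simply
    dropped (released); the result maps every rank to the same new domain,
    so the environments of old and new nodes are consistent.
    """
    existing = list(original_domains.values()) + list(new_domains.values())
    generation = max(domain.generation for domain in existing) + 1
    all_ranks = tuple(sorted(set(original_domains) | set(new_domains)))
    rebuilt = CommDomain(generation, all_ranks)
    return {rank: rebuilt for rank in all_ranks}


old = {rank: CommDomain(1, (0, 1)) for rank in (0, 1)}
new = {rank: CommDomain(1, (2, 3)) for rank in (2, 3)}
print(node_role({ROLE_ENV_VAR: "1"}), node_role({}))
print(rebuild_domains(old, new)[3])
```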
The invention can be implemented on top of dynamic task partitioning in a message library (MPICH, Open MPI) of the cross-language message-passing standard MPI (Message Passing Interface).
The method for dynamically expanding computing resources at runtime solves the problem of whether idle computing resources can be used to accelerate a running application. The technique is transparent to the user: the user only needs to enter a resource upgrade command to tell the runtime system that the running job's resources can be dynamically expanded, and the runtime system completes all other processing automatically. Newly idle computing resources are used effectively, and the solution of the running application is accelerated.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (3)

1. A method for dynamically expanding computing resources at runtime, characterized by comprising the following steps:
S1. Send a resource expansion upgrade command to the original application, which is running normally.
S2. All original computing resource nodes of the original application and all new computing resource nodes receive the resource expansion upgrade command at the same time and then respectively perform the following steps:
a. After receiving the resource expansion upgrade command, every original computing resource node of the original application performs the following steps:
a1. Clean up the communication environment and release the related environment variables.
a2. Wait for the newly added computing resources, then perform the second initialization of the communication environment together with them and rebuild the communication environment.
a3. After the communication environment has been initialized, update the dynamic task partitioning information and add the newly arrived computing resource nodes to the dynamic task partitioning area according to the dynamic task partitioning rule.
b. Each new computing resource node receives the resource expansion upgrade command, is started by job management, and performs the following steps:
b1. Start the program and enter the dynamic task partitioning area.
b2. Clean up the communication environment.
b3. Perform the second initialization of the communication environment together with the original computing resource nodes and rebuild it.
b4. Partition automatically according to the dynamic task partitioning rule, so that the new computing resource nodes form a new process area of the dynamic task partitioning; select one new computing resource node as the master node of the new partition; that partition master node then requests tasks from the global master node, which is among the original computing resource nodes.
S3. After the global master node of the original computing resources receives the task requests from the original partition master node and the new partition master node, it distributes tasks evenly to the original and new computing resource nodes, and the application continues to run normally.
2. The method for dynamically expanding computing resources at runtime of claim 1, wherein: the dynamic task partitioning can be used in a message library that implements a cross-language message-passing standard.
3. The method for dynamically expanding computing resources at runtime of claim 2, wherein: the message library of the cross-language message-passing standard is MPICH or Open MPI.
CN201910903881.0A 2019-09-24 2019-09-24 Dynamic expansion method for computing resources in runtime Active CN112631693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910903881.0A CN112631693B (en) 2019-09-24 2019-09-24 Dynamic expansion method for computing resources in runtime

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910903881.0A CN112631693B (en) 2019-09-24 2019-09-24 Dynamic expansion method for computing resources in runtime

Publications (2)

Publication Number Publication Date
CN112631693A CN112631693A (en) 2021-04-09
CN112631693B CN112631693B (en) 2022-10-04

Family

ID=75282967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910903881.0A Active CN112631693B (en) 2019-09-24 2019-09-24 Dynamic expansion method for computing resources in runtime

Country Status (1)

Country Link
CN (1) CN112631693B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050235289A1 (en) * 2004-03-31 2005-10-20 Fabio Barillari Method for allocating resources in a hierarchical data processing system
CN104615500A (en) * 2015-02-25 2015-05-13 浪潮电子信息产业股份有限公司 Dynamic server computing resource allocation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
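"Improving Dynamic Memory Allocation on Many-Core Embedded Systems With Distributed Shared Memory"; Ioannis Koutras et al.; IEEE Embedded Systems Letters; 2016-09-30; full text *
"Elastic computing resource management mechanism in a high-energy physics cloud platform" (《高能物理云平台中的弹性计算资源管理机制》); Cheng Zhenjing (程振京) et al.; Computer Engineering and Applications (《计算机工程与应用》); 2017-08-31; full text *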

Also Published As

Publication number Publication date
CN112631693A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
Malik et al. An optimistic parallel simulation protocol for cloud computing environments
CN102810184A (en) Method and device for dynamically executing workflow and enterprise system
CN107943592B (en) GPU cluster environment-oriented method for avoiding GPU resource contention
Bellettini et al. Mardigras: Simplified building of reachability graphs on large clusters
CN113672240A (en) Container-based multi-machine-room batch automatic deployment application method and system
WO2023124543A1 (en) Data processing method and data processing apparatus for big data
Konas et al. Parallel discrete event simulation on shared-memory multiprocessors
CN112631693B (en) Dynamic expansion method for computing resources in runtime
Vasilchikov On the recursive-parallel programming for the. NET framework
EP1563379A1 (en) Concurrent operation of a state machine family
CN111611089A (en) Asynchronous declaration type micro-service scheduling method
Matrone et al. LINDA and PVM: A comparison between two environments for parallel programming
Liu et al. BSPCloud: A hybrid distributed-memory and shared-memory programming model
Xu et al. MANA-2.0: A future-proof design for transparent checkpointing of MPI at scale
CN111104320A (en) Test method, device, equipment and medium
Ebner et al. Transformation of functional programs into data flow graphs implemented with PVM
Springer PVM support for clusters
CN112486576B (en) Large-scale dynamic expansion control method for large-scale parallel operation
CN113504956B (en) Method, device, equipment and medium for calling public function under micro service platform
Zou et al. Structural finite element method based on cloud computing
JP4668367B2 (en) Computer, parallel distributed system, and function call method
Tran et al. Parallel program model for distributed systems
Bejanyan et al. VM BASED EVALUATION OF THE SCALABLE PARALLEL MINIMUM SPANNING TREE ALGORITHM FOR PGAS MODEL
CN116909547A (en) Service function online development method, implementation method and equipment based on AXI frame
Nimbalkar et al. Mobile agent: Load balanced process migration in Linux environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant