CN110543362A - Graphics processor management method and device and server - Google Patents

Graphics processor management method and device and server

Info

Publication number
CN110543362A
CN110543362A (application CN201910703720.7A)
Authority
CN
China
Prior art keywords
target
processors
graphics
computing node
graphic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910703720.7A
Other languages
Chinese (zh)
Other versions
CN110543362B (en)
Inventor
Zhang Lei (张磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910703720.7A priority Critical patent/CN110543362B/en
Publication of CN110543362A publication Critical patent/CN110543362A/en
Application granted granted Critical
Publication of CN110543362B publication Critical patent/CN110543362B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The embodiments of the present application provide a graphics processor management method, apparatus, and server. The method comprises the following steps: receiving a computation request, the computation request comprising a target number of graphics processors; finding a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors, wherein the compute nodes in each compute node linked list all have the same number of idle graphics processors; and finding the target number of target graphics processors from all idle graphics processors of the target compute node. The method improves the overall GPU allocation efficiency of a GPU cluster and the utilization of the GPUs in the cluster.

Description

Graphics processor management method and device and server
Technical Field
The present application relates to the field of computers, and in particular, to a graphics processor management method, apparatus, and server.
Background
Currently, when a Graphics Processing Unit (GPU) cluster is used to perform large-scale computation such as neural network training and parallel acceleration, GPUs in the GPU cluster are generally allocated as follows: for each computation request, both the compute node and the GPUs within that compute node are chosen at random.
However, due to the randomness of the number of GPUs requested by each computation, and the differences in data-transfer performance among the connection modes between GPUs within a compute node that has multiple GPUs, allocating GPUs in the GPU cluster this way can cause the following problem: fragmentation of GPU allocation, which results in low utilization of the GPUs in the GPU cluster.
Disclosure of Invention
The embodiments of the present application provide a graphics processor management method, apparatus, and server.
In a first aspect, an embodiment of the present application provides a graphics processor management method, including: receiving a computation request, the computation request comprising: a target number of graphics processors; finding a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors, wherein the compute nodes in each compute node linked list all have the same number of idle graphics processors; and finding the target number of target graphics processors from all idle graphics processors of the target compute node.
In a second aspect, an embodiment of the present application provides a graphics processor management apparatus, including: a computation request receiving unit configured to receive a computation request comprising a target number of graphics processors; a target compute node lookup unit configured to find a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors, wherein the compute nodes in each compute node linked list all have the same number of idle graphics processors; and a target graphics processor lookup unit configured to find the target number of target graphics processors from all idle graphics processors of the target compute node.
The graphics processor management method and apparatus provided by the embodiments of the present application have at least the following advantages:
All idle compute nodes in the GPU cluster are managed through compute node linked lists. Whenever a compute node must be allocated after a computation request is received, a target compute node whose number of idle GPUs equals or is close to the target number of GPUs is found among the compute nodes to complete the computation. This avoids the fragmentation of GPU allocation caused by randomly choosing compute nodes, and improves both the overall GPU allocation efficiency of the GPU cluster and the utilization of GPUs in the cluster.
Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating one embodiment of a graphics processor management method provided by an embodiment of the present application;
FIG. 2 illustrates a flow diagram for finding a target compute node from a linked list of compute nodes;
FIG. 3 shows a flow diagram of finding a target graphics processor from a target compute node;
Fig. 4 is a schematic structural diagram illustrating a graphics processor management apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not restrict it. It should be noted that, for convenience of description, only the portions related to the relevant invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, a flowchart of a graphics processor management method according to an embodiment of the present application is shown. The method comprises the following steps:
Step 101, receiving a calculation request.
In the present application, when computation is performed using a GPU cluster, a computation request is first received; a computation request may be received each time a computation is to be performed on the cluster. The computation request includes: a target number of graphics processors, which may be referred to as the target number of GPUs. The target number of GPUs is the number of GPUs needed to complete the computation.
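As a rough illustration of the request structure (the field names below are illustrative assumptions, not taken from the embodiments), the computation request can be modeled as a small record:

    # Hypothetical sketch of the computation request described above; the
    # patent only names the contents, so the field names are illustrative.
    from dataclasses import dataclass

    @dataclass
    class ComputeRequest:
        gpu_count: int      # target number of GPUs needed for the computation
        cpu_count: int = 0  # required number of CPUs (optional, see below)
        memory_mb: int = 0  # required amount of memory (optional, see below)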
Step 102, finding a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors.
In the present application, within each compute node linked list, every compute node has the same number of idle GPUs. In other words, a compute node linked list consists of compute nodes that contain the same number of idle GPUs. For any one compute node linked list, the number of idle GPUs of its compute nodes differs from the number of idle GPUs of the compute nodes in every other compute node linked list.
For example, the plurality of compute node linked lists are compute node linked list 1, compute node linked list 2, ..., compute node linked list N. The number of idle GPUs of each compute node in compute node linked list 1 is 1, the number of idle GPUs of each compute node in compute node linked list 2 is 2, and the number of idle GPUs of each compute node in compute node linked list N is N.
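This organization can be sketched as follows (a minimal illustration in Python; node identifiers and names are assumptions): one list per idle-GPU count, so list k holds exactly the compute nodes that currently have k idle GPUs.

    from collections import defaultdict

    # compute node linked list k -> nodes that currently have k idle GPUs
    node_lists = defaultdict(list)

    def insert_node(node_lists, node_id, idle_gpus):
        node_lists[idle_gpus].append(node_id)

    # Example mirroring the text: two nodes with 2 idle GPUs, one with 4.
    insert_node(node_lists, "A", 2)
    insert_node(node_lists, "B", 2)
    insert_node(node_lists, "C", 4)
    assert node_lists[2] == ["A", "B"] and node_lists[4] == ["C"]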
In the present application, the target compute node may be found from the plurality of compute node linked lists based on the target number of GPUs. The target compute node may be selected from all compute nodes in a compute node linked list whose compute nodes have a number of idle GPUs greater than or equal to the target number of GPUs.
In the present application, the computation request may further include the number of CPUs required for the computation and the amount of memory required for the computation. The target compute node may then be found from the plurality of compute node linked lists based on the target number of GPUs, the required number of CPUs, and the required amount of memory.
In the present application, a certain number of CPUs and a certain amount of memory may be allocated to each compute node linked list in advance, and for each compute node linked list, its number of idle CPUs and its amount of idle memory are recorded.
When a target compute node is found from the plurality of compute node linked lists based on the target number of GPUs, the number of CPUs required for computation, and the amount of memory required for computation, a compute node linked list is first determined whose number of idle CPUs is greater than the number of CPUs required for computation, whose amount of idle memory is greater than the amount of memory required for computation, and whose compute nodes have a number of idle GPUs greater than or equal to the target number of GPUs. The required number of CPUs is then allocated from the idle CPUs of the determined compute node linked list, and the required amount of memory is allocated from its idle memory. At the same time, the compute node to serve as the target compute node is selected from the determined compute node linked list. In this way, the target compute node is found from the plurality of compute node linked lists based on the target number of GPUs, the number of CPUs required for computation, and the amount of memory required for computation.
In some embodiments, when a compute node is looked up from the plurality of compute node linked lists based on the target number of GPUs, the target compute node may be looked up preferentially from the compute node linked list whose compute nodes have a number of idle GPUs equal to the target number of GPUs. This linked list may be referred to as the preferred compute node linked list.
When the preferred compute node linked list is not empty, any compute node may be selected from it as the target compute node.
When the preferred compute node linked list is empty, the search for the target compute node continues in the superior compute node linked lists of the preferred compute node linked list. The compute nodes in a superior compute node linked list have more idle GPUs than the compute nodes in the preferred compute node linked list, and there may be several such superior linked lists. They can be accessed in increasing order of the number of idle GPUs of their compute nodes until the target compute node is found.
For example, the plurality of compute node linked lists are compute node linked list 1, compute node linked list 2, ..., compute node linked list N, where the number of idle GPUs of each compute node in compute node linked list 1 is 1, in compute node linked list 2 is 2, and in compute node linked list N is N. If the preferred compute node linked list is compute node linked list 2, its superior compute node linked lists are accessed in the order compute node linked list 3, compute node linked list 4, and so on.
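The lookup order just described can be sketched as follows (a simplification that ignores the CPU and memory conditions introduced below; node_lists is the mapping from the earlier sketch):

    def find_target_node(node_lists, target_gpus, max_gpus_per_node):
        # Try the preferred list first (idle count == target), then walk
        # upward through the superior lists in increasing idle-GPU count.
        for k in range(target_gpus, max_gpus_per_node + 1):
            nodes = node_lists.get(k)
            if nodes:                # first non-empty list wins
                return nodes.pop(0)  # any node in that list will do
        return None                  # no list can satisfy the request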
In some embodiments, the computation request further comprises: a central processing unit (CPU) requirement and a memory requirement. The CPU requirement may be referred to as the required number of CPUs. Finding a target compute node from the plurality of compute node linked lists based at least on the target number of GPUs comprises: finding, based on the target number of graphics processors, the required number of central processing units, and the required amount of memory, a target compute node satisfying an allocation condition from the plurality of compute node linked lists. The allocation condition includes: the number of idle CPUs is greater than or equal to the required number of CPUs in the computation request, and the amount of idle memory is greater than or equal to the required amount of memory in the computation request.
When target compute nodes satisfying the allocation condition are found from the plurality of compute node linked lists based on the target number of graphics processors, the required number of central processing units, and the required amount of memory, it may first be determined whether a compute node satisfying the allocation condition exists in the preferred compute node linked list, i.e. the linked list whose compute nodes have a number of idle GPUs equal to the target number of GPUs. If such a compute node exists, it is taken as the target compute node. If not, a compute node satisfying the allocation condition is sought in the superior compute node linked lists of the preferred compute node linked list, and the compute node found is taken as the target compute node. The search among the superior linked lists starts from the immediate superior of the preferred compute node linked list, whose compute nodes have one more idle GPU than the target number of GPUs. Whenever the compute node linked list currently searched is empty, the search continues in its superior compute node linked list.
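A sketch of the same walk with the allocation condition added (per-list CPU and memory pools are modeled as plain integers; all names are illustrative assumptions):

    from dataclasses import dataclass, field

    @dataclass
    class NodeList:
        idle_cpus: int     # idle CPUs pre-allocated to this linked list
        idle_mem_mb: int   # idle memory pre-allocated to this linked list
        nodes: list = field(default_factory=list)

    def find_node(lists_by_k, target_gpus, req_cpus, req_mem_mb, max_k):
        # Start at the preferred list (k == target_gpus), then its superiors.
        for k in range(target_gpus, max_k + 1):
            nl = lists_by_k.get(k)
            if (nl and nl.nodes
                    and nl.idle_cpus >= req_cpus
                    and nl.idle_mem_mb >= req_mem_mb):
                nl.idle_cpus -= req_cpus      # allocate CPUs from the list
                nl.idle_mem_mb -= req_mem_mb  # allocate memory from the list
                return nl.nodes.pop(0)
        return None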
Referring to FIG. 2, a flow diagram of finding a target compute node from the compute node linked lists is shown.
In the present application, the system running on a server that finds a target compute node from the plurality of compute node linked lists may be referred to as a first-level buddy system. The target compute node may be located by the first-level buddy system from the plurality of compute node linked lists.
When a target compute node is to be found from the plurality of compute node linked lists, the preferred compute node linked list is accessed first according to the target number of GPUs; the number of idle GPUs of the compute nodes in the preferred compute node linked list is the same as the target number of GPUs. The preferred compute node linked list is taken as the currently accessed compute node linked list.
Next, it is determined whether the MAX CPU and Mem corresponding to the currently accessed compute node linked list satisfy the requested amounts. The MAX CPU of a compute node linked list denotes the number of idle CPUs of the compute node in that list with the most idle CPUs, and Mem denotes that compute node's amount of idle memory. Determining whether the MAX CPU and Mem of the currently accessed compute node linked list satisfy the requested amounts may be: determining whether the MAX CPU is greater than or equal to the required number of CPUs in the computation request and the Mem is greater than or equal to the required amount of memory in the computation request.
If the MAX CPU and Mem corresponding to the currently accessed compute node linked list satisfy the requested amounts, this linked list contains at least one compute node that can serve as the target compute node. The currently accessed linked list is then traversed to find the compute node that best matches the required number of CPUs and the required amount of memory, and that compute node is taken as the target compute node. The remaining free GPU, CPU, and memory amounts of the node taken as the target compute node are then recalculated, and the node is reinserted into the compute node linked lists. If the MAX CPU and Mem of the currently accessed linked list do not satisfy the requested amounts, the superior compute node linked list of the currently accessed linked list, whose compute nodes have one more idle GPU than those of the currently accessed linked list, becomes the new currently accessed linked list, and it is again determined whether its MAX CPU and Mem satisfy the requested amounts. Compute node linked lists are accessed in this manner until the target compute node is found.
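The recalculate-and-reinsert step can be sketched as follows; note that filing the node back under the list that matches its new idle-GPU count is an assumption based on the description, not an explicit statement of the embodiments:

    from dataclasses import dataclass

    @dataclass
    class Node:
        node_id: str
        idle_gpus: int

    def reinsert(node_lists, node, allocated_gpus):
        # node_lists is the defaultdict(list) from the earlier sketch.
        # Recalculate the node's remaining idle GPUs after allocation, then
        # file it back under the matching linked list (assumed behavior).
        node.idle_gpus -= allocated_gpus
        if node.idle_gpus > 0:       # a fully used node joins no free list
            node_lists[node.idle_gpus].append(node)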
Step 103, finding the target number of target graphics processors from all idle graphics processors of the target compute node.
In the present application, when the target compute node is a compute node in the preferred compute node linked list, its number of idle GPUs equals the target number of GPUs, and all of its idle GPUs can be taken as target GPUs; in this way, the target number of target GPUs is found from all idle GPUs of the target compute node. The found target GPUs are then used to complete the computation operation described in the computation request.
When the target compute node is a compute node in a superior compute node linked list of the preferred compute node linked list, its number of idle GPUs is greater than the target number of GPUs, and the target number of target GPUs can be selected from all of its idle GPUs; in this way, the target number of target GPUs is found from all idle GPUs of the target compute node. The found target GPUs are then used to complete the computation operation described in the computation request.
In some embodiments, the target compute node is a compute node in a superior compute node linked list of the preferred compute node linked list, so its number of idle GPUs is greater than the target number of GPUs. When the target number of target GPUs is searched from all GPUs of the target compute node, the target GPUs can be found from a plurality of GPU linked lists of the target compute node.
For each GPU linked list of the target compute node, the connection performance information of every GPU group in that linked list is the same. The connection performance information of a GPU group indicates the data-transfer performance of the connection mode between the GPUs in the group. The connection modes between GPUs in a GPU group may include: NVLINK, PIX (connection through a single PCIe switch), PXB (connection through multiple PCIe switches), PHB (connection through the PCIe host bridge), and SOC (connection across CPU sockets). When the GPUs in a GPU group are connected through NVLINK, bandwidth is highest and latency lowest; when they are connected through SOC, bandwidth is lowest and latency highest. The lower the latency between the GPUs in a group, the higher the connection performance. From high to low, the performance of these connection modes can be ordered as: NVLINK > PIX > PXB > PHB > SOC.
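The ranking can be expressed directly (an illustrative encoding of the ordering stated above):

    # Link types ordered best-first; a lower index means higher bandwidth and
    # lower latency between the GPUs in a group.
    LINK_RANK = ("NVLINK", "PIX", "PXB", "PHB", "SOC")

    def better_link(a, b):
        # True if link type a outperforms b: NVLINK > PIX > PXB > PHB > SOC.
        return LINK_RANK.index(a) < LINK_RANK.index(b)

    assert better_link("NVLINK", "SOC") and not better_link("PHB", "PXB")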
When the target number of target GPUs is searched from the plurality of GPU linked lists of the target compute node, the GPU linked lists can be accessed in order of connection performance from high to low, taking target GPUs from each accessed linked list until the target number of target GPUs has been found. The connection performance of a GPU linked list is the connection performance of the GPU groups included in that linked list.
For example, the target compute node includes five GPU linked lists, each of which includes at least one GPU group. The GPUs in each GPU group of the first linked list are connected by NVLINK; of the second linked list, by PIX; of the third linked list, by PXB; of the fourth linked list, by PHB; and of the fifth linked list, by SOC.
Target GPUs are first taken from the first GPU linked list. When the target number of GPUs is greater than the number of idle GPUs in the first linked list, all idle GPUs in the first GPU linked list are taken as target GPUs and the search continues in the second GPU linked list. When the sum of the idle GPUs in the first and second GPU linked lists is still smaller than the target number of GPUs, the search continues from the third GPU linked list, and so on, until the target number of target GPUs has been found.
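This cascade can be sketched as follows (gpu_lists is assumed to be ordered from the best-connected linked list to the worst; GPU identifiers are illustrative):

    def pick_gpus(gpu_lists, target):
        # Drain the best-connected list first, spilling into the next list
        # until the target count is reached.
        picked = []
        for gpus in gpu_lists:
            while gpus and len(picked) < target:
                picked.append(gpus.pop(0))
            if len(picked) == target:
                return picked
        return None  # the node does not hold enough idle GPUs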
In this method, all GPUs in a compute node are managed according to the data-transfer performance of the connection modes between them. After a target compute node has been assigned and the GPUs within it must be further allocated, each computation request is served first from the GPU groups with the highest data-transfer performance. This improves the data-transfer efficiency between the allocated target GPUs, and thus the efficiency of the computation.
In some embodiments, each GPU group in the target compute node contains two GPUs. For example, the target compute node includes five GPU linked lists, each of which includes at least one GPU group. The two idle GPUs in each GPU group of the first linked list are connected by NVLINK; of the second linked list, by PIX; of the third linked list, by PXB; of the fourth linked list, by PHB; and of the fifth linked list, by SOC.
In some embodiments, when the target number of GPUs is even, the GPU linked lists may be accessed in order of connection performance from high to low, taking target GPUs from each accessed linked list, until the target number of target GPUs has been found. The connection performance of a GPU linked list is the connection performance of the GPU groups in that linked list.
When the target number of GPUs is odd, the GPU linked lists may first be accessed in order of connection performance from low to high, searching the accessed linked lists until one target GPU is found; the GPU linked lists are then accessed in order of connection performance from high to low, searching the accessed linked lists until the remaining number of target GPUs has been found. The remaining number is the target number of GPUs minus 1.
For example, each GPU group contains two GPUs. The target compute node includes five GPU linked lists, each of which includes at least one GPU group. The GPUs in each GPU group of the first GPU linked list are connected by NVLINK; of the second, by PIX; of the third, by PXB; of the fourth, by PHB; and of the fifth, by SOC.
When the target number of GPUs is even, target GPUs are searched from the first GPU linked list until the target number has been found. When the target number of GPUs is greater than the number of idle GPUs in the first linked list, all idle GPUs in all GPU groups of the first linked list are taken as target GPUs and the search continues in the second GPU linked list. When the sum of the idle GPUs in the first and second linked lists is still smaller than the target number of GPUs, the search continues from the third linked list, and so on until the target number of target GPUs has been found. Because each group holds two GPUs, searching with an even target number amounts to allocating target number / 2 GPU groups, starting from the first GPU linked list, and taking the idle GPUs in those allocated groups as the target GPUs.
When the target number of GPUs is odd, the search starts from the fifth GPU linked list until one target GPU is found: if the fifth GPU linked list is not empty, one idle GPU is selected from it as a target GPU; if it is empty, the search continues from the fourth GPU linked list, and whenever a linked list is found to be empty the search moves to the linked list above it, until one target GPU is found. Apart from this one target GPU, the remaining target number minus 1 target GPUs are then searched from the first GPU linked list onward until they have all been found.
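The odd-count rule can be sketched as a small extension (reusing pick_gpus from the previous sketch; this is an illustration under assumed names, not the patented implementation itself):

    def pick_gpus_odd(gpu_lists, target):
        # Take one GPU from the worst-connected non-empty list, then the
        # remaining target - 1 GPUs from the best lists downward.
        for gpus in reversed(gpu_lists):   # start at the SOC end
            if gpus:
                single = gpus.pop(0)
                rest = pick_gpus(gpu_lists, target - 1)
                return [single] + rest if rest is not None else None
        return None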
Referring to FIG. 3, a flow diagram of a lookup of a target graphics processor from a target compute node is shown.
In the present application, the target number of target GPUs may be found from all idle GPUs of the target compute node by a second-level buddy system. In other words, the second-level buddy system allocates the target number of idle GPUs as the target GPUs for the computation.
The second-level buddy system may establish a plurality of GPU linked lists, GPUsAffinity1 through GPUsAffinityN. The smaller a list's index, the higher the connection performance of the GPU groups in it: the GPU groups in the GPUsAffinity1 linked list have the highest connection performance, and those in the GPUsAffinityN linked list the lowest.
When the target number of GPUs is even, the search starts from the GPUsAffinity1 linked list. If GPUsAffinity1 is not empty and its number of idle GPUs is greater than the target number of GPUs, the target number of GPUs is selected from its idle GPUs as target GPUs. If it is not empty and its number of idle GPUs equals the target number, all of its idle GPUs are taken as target GPUs. If it is not empty but its number of idle GPUs is less than the target number, all of its idle GPUs are taken as target GPUs and the search continues from the next-level linked list, GPUsAffinity2. If GPUsAffinity1 is empty, the search likewise continues from GPUsAffinity2.
Each time a linked list has been accessed and all of its idle GPUs taken as target GPUs, if the total number of target GPUs found has not yet reached the target number, the search continues from the next-level linked list, and so on until the target number of target GPUs has been found. The found target GPUs are then added to a resource allocation set, and the GPUsAffinity linked lists whose idle GPUs were taken are updated. The resource allocation set is returned to the scheduling system, which schedules the GPU resources of the compute nodes: from the resource allocation set it can determine how many formerly idle GPUs have been allocated on each node, and therefore the remaining GPU resources of each node. For example, when the scheduling system determines that many of a compute node's GPUs have been allocated, it may assign unallocated GPU resources in the GPU cluster to that node.
When the target number of GPUs is odd, one target GPU is first searched from the GPUsAffinityN linked list: if GPUsAffinityN is not empty, one idle GPU is selected from it as a target GPU; if it is empty, the search moves to the linked list above it, GPUsAffinityN-1, and whenever a linked list is found to be empty the search moves up again, until one target GPU is found. The remaining even number of target GPUs, i.e. the target number minus 1, are then searched from GPUsAffinity1 onward until all target GPUs have been found. The found target GPUs are added to the resource allocation set, the GPUsAffinity linked lists whose idle GPUs were taken are updated, and the resource allocation set is returned to the scheduling system.
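The hand-off to the scheduling system can be sketched as follows (all names are hypothetical; the patent specifies only that the resource allocation set records the allocated GPUs and is returned to the scheduler):

    def record_allocation(allocation_set, node_id, gpu_ids):
        # Record which GPUs were taken on which node; the scheduler uses this
        # set to track the remaining GPU resources of each compute node.
        allocation_set.setdefault(node_id, []).extend(gpu_ids)
        return allocation_set

    # Usage: the scheduler learns that gpu0 and gpu1 on node-7 are now in use.
    alloc = record_allocation({}, "node-7", ["gpu0", "gpu1"])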
Referring to fig. 4, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus, which corresponds to the embodiment of the method shown in fig. 1. The specific implementation of the corresponding operations that the respective units in the apparatus are configured to perform may refer to the specific implementation of the corresponding operations described in the method embodiments.
As shown in fig. 4, the apparatus of this embodiment includes: a computation request receiving unit 401, a target compute node lookup unit 402, and a target graphics processor lookup unit 403. The computation request receiving unit 401 is configured to receive a computation request comprising a target number of graphics processors; the target compute node lookup unit 402 is configured to find a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors, wherein the compute nodes in each compute node linked list all have the same number of idle graphics processors; and the target graphics processor lookup unit 403 is configured to find the target number of target graphics processors from all idle graphics processors of the target compute node.
In some embodiments, the computation request further comprises: a required number of central processing units and a required amount of memory. The target compute node lookup unit 402 is further configured to: find, based on the target number of graphics processors, the required number of central processing units, and the required amount of memory, a target compute node satisfying an allocation condition from the plurality of compute node linked lists, wherein the allocation condition comprises: the number of idle central processing units is greater than or equal to the required number of central processing units, and the amount of idle memory is greater than or equal to the required amount of memory.
In some embodiments, the target compute node lookup unit 402 is further configured to: determine whether the preferred compute node linked list includes compute nodes, the number of idle graphics processors of the compute nodes in the preferred compute node linked list being equal to the target number of graphics processors; if so, select one compute node from the preferred compute node linked list as the target compute node; if not, find a compute node to serve as the target compute node from a superior compute node linked list of the preferred compute node linked list.
In some embodiments, the target graphics processor lookup unit 403 is further configured to: when the target compute node is a compute node found from a superior compute node linked list of the preferred compute node linked list, find the target graphics processors from a plurality of graphics processor linked lists of the target compute node, wherein the connection performance of every graphics processor group in a given graphics processor linked list is the same, each graphics processor group comprises at least one idle graphics processor, and the connection performance of a graphics processor group is the performance of data transfer through the connection mode between the graphics processors in that group.
In some embodiments, the target graphics processor lookup unit 403 is further configured to: when the target number of graphics processors is even, access the graphics processor linked lists in order of connection performance from high to low and search the accessed linked lists for target graphics processors until the target number of target graphics processors has been found; and when the target number of graphics processors is odd, access the graphics processor linked lists in order of connection performance from low to high and search the accessed linked lists until one target graphics processor is found, then access the graphics processor linked lists in order of connection performance from high to low and search the accessed linked lists until the remaining number of target graphics processors has been found, the remaining number being the target number of graphics processors minus 1.
The present application also provides a server, which may be configured with one or more processors and a memory for storing one or more programs. The one or more programs may include instructions for performing the operations described in the above embodiments and, when executed by the one or more processors, cause the one or more processors to perform those operations.
The present application also provides a computer-readable medium, which may be included in a server or may exist separately without being assembled into the server. The computer-readable medium carries one or more programs which, when executed by the server, cause the server to perform the operations described in the above embodiments.
It should be noted that the computer-readable medium described herein may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in contrast, may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic or optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber-optic cable, RF, and the like, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The above description is only a preferred embodiment of the present application and an illustration of the technical principles employed. It will be understood by those skilled in the art that the scope of the invention referred to herein is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (10)

1. A graphics processor management method, the method comprising:
receiving a computation request, the computation request comprising: a target number of graphics processors;
finding a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors, wherein the compute nodes in each compute node linked list all have the same number of idle graphics processors;
finding the target number of target graphics processors from all idle graphics processors of the target compute node.
2. The method of claim 1, wherein the computation request further comprises: a required number of central processing units and a required amount of memory;
The finding a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors comprises:
finding, based on the target number of graphics processors, the required number of central processing units, and the required amount of memory, a target compute node satisfying an allocation condition from the plurality of compute node linked lists, wherein the allocation condition comprises: the number of idle central processing units is greater than or equal to the required number of central processing units and the amount of idle memory is greater than or equal to the required amount of memory.
3. The method of claim 1, wherein the finding a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors comprises:
determining whether a preferred compute node linked list includes compute nodes, the number of idle graphics processors of the compute nodes in the preferred compute node linked list being equal to the target number of graphics processors;
if so, selecting one compute node from the preferred compute node linked list as the target compute node;
if not, finding a compute node to serve as the target compute node from a superior compute node linked list of the preferred compute node linked list.
4. The method of claim 3, wherein the target compute node is a compute node found from a superior compute node linked list of the preferred compute node linked list;
the finding the target number of target graphics processors from all idle graphics processors of the target compute node comprises:
finding the target number of target graphics processors from a plurality of graphics processor linked lists of the target compute node, wherein the connection performance of every graphics processor group in a given graphics processor linked list is the same, each graphics processor group comprises a plurality of idle graphics processors, and the connection performance of a graphics processor group is the performance of data transfer through the connection mode between the graphics processors in that group.
5. The method of claim 4, wherein the finding the target number of target graphics processors from a plurality of graphics processor linked lists of the target compute node comprises:
when the target number of graphics processors is even, accessing the graphics processor linked lists in order of connection performance from high to low and searching the accessed linked lists for target graphics processors until the target number of target graphics processors has been found;
when the target number of graphics processors is odd, accessing the graphics processor linked lists in order of connection performance from low to high and searching the accessed linked lists until one target graphics processor is found, then accessing the graphics processor linked lists in order of connection performance from high to low and searching the accessed linked lists until a remaining number of target graphics processors has been found, wherein the remaining number is the target number of graphics processors minus 1.
6. A graphics processor management apparatus, the apparatus comprising:
a computation request receiving unit configured to receive a computation request, the computation request comprising: a target number of graphics processors;
a target compute node lookup unit configured to find a target compute node from a plurality of compute node linked lists based at least on the target number of graphics processors, wherein the compute nodes in each compute node linked list all have the same number of idle graphics processors;
a target graphics processor lookup unit configured to find the target number of target graphics processors from all idle graphics processors of the target compute node.
7. The apparatus of claim 6, wherein the target graphics processor lookup unit is further configured to: when the target compute node is a compute node found from a superior compute node linked list of the preferred compute node linked list, find the target graphics processors from a plurality of graphics processor linked lists of the target compute node, wherein the connection performance of every graphics processor group in a given graphics processor linked list is the same, each graphics processor group comprises a plurality of idle graphics processors, and the connection performance of a graphics processor group is the performance of data transfer through the connection mode between the graphics processors in that group.
8. The apparatus of claim 7, wherein the target graphics processor lookup unit is further configured to: when the target number of graphics processors is even, access the graphics processor linked lists in order of connection performance from high to low and search the accessed linked lists for target graphics processors until the target number of target graphics processors has been found; when the target number of graphics processors is odd, access the graphics processor linked lists in order of connection performance from low to high and search the accessed linked lists until one target graphics processor is found, then access the graphics processor linked lists in order of connection performance from high to low and search the accessed linked lists until a remaining number of target graphics processors has been found, wherein the remaining number is the target number of graphics processors minus 1.
9. A server, comprising:
One or more processors;
A memory for storing one or more programs,
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN201910703720.7A 2019-07-31 2019-07-31 Graphics processor management method and device and server Active CN110543362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910703720.7A CN110543362B (en) 2019-07-31 2019-07-31 Graphics processor management method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910703720.7A CN110543362B (en) 2019-07-31 2019-07-31 Graphics processor management method and device and server

Publications (2)

Publication Number Publication Date
CN110543362A (en) 2019-12-06
CN110543362B CN110543362B (en) 2022-10-21

Family

ID=68710052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910703720.7A Active CN110543362B (en) 2019-07-31 2019-07-31 Graphics processor management method and device and server

Country Status (1)

Country Link
CN (1) CN110543362B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140181806A1 (en) * 2012-12-20 2014-06-26 Vmware, Inc. Managing a data structure for allocating graphics processing unit resources to virtual machines
US20140282504A1 (en) * 2013-03-13 2014-09-18 Oracle America, Inc. Method and system for specifying the layout of computer system resources
US20180217868A1 (en) * 2017-01-31 2018-08-02 Samsung Electronics, Co. Ltd. Flexible in-order and out-of-order resource allocation
US20180253816A1 (en) * 2017-03-03 2018-09-06 International Business Machines Corporation Deep learning via dynamic root solvers
CN107247629A (en) * 2017-07-04 2017-10-13 北京百度网讯科技有限公司 Cloud computing system and cloud computing method and device for controlling server
US10325343B1 (en) * 2017-08-04 2019-06-18 EMC IP Holding Company LLC Topology aware grouping and provisioning of GPU resources in GPU-as-a-Service platform
CN109033001A (en) * 2018-07-17 2018-12-18 北京百度网讯科技有限公司 Method and apparatus for distributing GPU
CN109933430A (en) * 2019-03-08 2019-06-25 北京百度网讯科技有限公司 The method and apparatus for distributing graphics processor
CN109947565A (en) * 2019-03-08 2019-06-28 北京百度网讯科技有限公司 Method and apparatus for distributing calculating task

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙震宇等 (Sun Zhenyu et al.): "大型高能物理计算集群资源管理方法的评测" [Evaluation of resource management methods for a large high-energy physics computing cluster], 《计算机科学》 (Computer Science) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022001086A1 (en) * 2020-06-29 2022-01-06 苏州浪潮智能科技有限公司 Efficient gpu resource allocation optimization method and system

Also Published As

Publication number Publication date
CN110543362B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
US8984085B2 (en) Apparatus and method for controlling distributed memory cluster
US10394782B2 (en) Chord distributed hash table-based map-reduce system and method
CN102446139B (en) Method and device for data storage
CN109976907B (en) Task allocation method and system, electronic device and computer readable medium
CN104102693A (en) Object processing method and device
CN112905342B (en) Resource scheduling method, device, equipment and computer readable storage medium
CN111597148B (en) Distributed metadata management method for distributed file system
CN114020470A (en) Resource allocation method, device, readable medium and electronic equipment
CN110543362B (en) Graphics processor management method and device and server
US11093291B2 (en) Resource assignment using CDA protocol in distributed processing environment based on task bid and resource cost
CN110309229A (en) The data processing method and distributed system of distributed system
CN109783002B (en) Data reading and writing method, management equipment, client and storage system
CN112650449B (en) Method and system for releasing cache space, electronic device and storage medium
CN113347238A (en) Message partitioning method, system, device and storage medium based on block chain
CN112099728B (en) Method and device for executing write operation and read operation
CN113204421A (en) Serverless co-distribution of functions and storage pools
CN109407970B (en) Read-write request processing method and device and electronic equipment
CN116737370A (en) Multi-resource scheduling method, system, storage medium and terminal
CN115878333A (en) Method, device and equipment for judging consistency between process groups
US10635336B1 (en) Cache-based partition allocation
CN115658295A (en) Resource scheduling method and device, electronic equipment and storage medium
US20210149746A1 (en) Method, System, Computer Readable Medium, and Device for Scheduling Computational Operation Based on Graph Data
US11461284B2 (en) Method, device and computer program product for storage management
CN112948330A (en) Data merging method, device, electronic equipment, storage medium and program product
CN115129709A (en) Data processing method, server and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant