CN105677479B - Method and device for running GPU compute programs in parallel - Google Patents

Method and device for running GPU compute programs in parallel

Info

Publication number
CN105677479B
CN105677479B (application CN201511024848.9A)
Authority
CN
China
Prior art keywords
gpu
container
compute program
container image
run
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201511024848.9A
Other languages
Chinese (zh)
Other versions
CN105677479A (en)
Inventor
潘昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201511024848.9A
Publication of CN105677479A
Application granted
Publication of CN105677479B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3818Decoding for concurrent execution
    • G06F9/3822Parallel decoding, e.g. parallel decode units

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

An embodiment of the invention provides a method and device for running GPU compute programs in parallel on a single physical host. The method includes: obtaining at least one GPU compute program to be run; obtaining, for each program to be run, an applicable container image; creating a corresponding container for each program from its applicable container image; configuring non-GPU hardware resources for each container; binding GPU hardware resources to each container at a preset binding granularity; submitting each GPU compute program to its corresponding container; and having each container run its GPU compute program using its configured non-GPU hardware resources and its bound GPU hardware resources. With embodiments of the invention, different GPU compute programs run in parallel on one physical host, improving the degree to which virtualization technology supports GPUs.

Description

Method and device for running GPU compute programs in parallel
Technical field
The present invention relates to the field of GPU compute program management, and in particular to a method and device for running different GPU compute programs on a single physical host.
Background technique
Virtualization technology is now widely used in computer architecture, operating systems, compilers, and programming languages. By providing a logical abstraction and unified representation of resources, it offers outstanding advantages in server consolidation, networking, and other areas: it reduces management complexity, improves resource utilization and operational efficiency, and controls cost effectively. Current virtualization technology supports computing resources such as the CPU, memory, and network interface cards well, but its support for certain specific devices, such as multimedia devices like graphics cards and sound cards, is still imperfect.
The emergence of the GPU in recent years has given an immeasurable push to the development of high-performance computing. Compared with the CPU, the GPU has an advantage in the number of computing units in its hardware architecture, but the degree to which GPU drivers support virtualization lags far behind mature CPU technology. For example, the GPU hardware of a current physical host can only process serially, i.e., can only run one GPU compute program in a given period. The GPU hardware on one physical host cannot be virtualized into multiple GPU resources capable of running GPU compute programs in isolation; that is, different GPU compute programs cannot be run in parallel on a single physical machine.
Summary of the invention
The purpose of embodiments of the present invention is to provide a method and device for running GPU compute programs in parallel on a single physical host, so as to improve the degree to which virtualization technology supports GPUs.
To achieve the above objective, an embodiment of the invention discloses a method for running different GPU compute programs on a single physical machine. The method includes:
Obtaining at least one GPU compute program to be run;
Obtaining, for each GPU compute program to be run, an applicable container image; the container image is constructed in advance according to information about the GPUs installed in the physical host, and the GPU information includes at least: the hardware model of each installed GPU, the driver for each GPU model, and the compute framework for each GPU model;
Creating, from its applicable container image, a corresponding container for each GPU compute program to be run;
Configuring non-GPU hardware resources for each container;
Binding GPU hardware resources to each container at a preset binding granularity;
Submitting each GPU compute program to its corresponding container;
Each container running its GPU compute program using its configured non-GPU hardware resources and its bound GPU hardware resources.
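The steps above can be sketched as a minimal runnable program. This is a sketch only: the data structures, field names, and helper names below are invented for illustration and are not taken from the patent or any actual implementation.

```python
# Hypothetical sketch of the claimed method; all names are assumptions.

def find_image(prog, images):
    """Step 2: pick the container image whose GPU info matches the program."""
    return next(i for i in images if i["gpu_model"] == prog["gpu_model"])

def run_programs_in_parallel(programs, images):
    containers = []
    for prog in programs:                       # step 1: programs to be run
        image = find_image(prog, images)        # step 2: applicable image
        ctr = {"image": image["name"]}          # step 3: create container
        ctr["cpus"], ctr["memory_gb"] = 2, 4    # step 4: non-GPU resources
        ctr["gpu"] = prog["gpu_model"]          # step 5: bind GPU resources
        ctr["program"] = prog["name"]           # step 6: submit program
        containers.append(ctr)                  # step 7: run in isolation
    return containers

images = [{"name": "img-k80", "gpu_model": "K80"},
          {"name": "img-m40", "gpu_model": "M40"}]
programs = [{"name": "job-a", "gpu_model": "K80"},
            {"name": "job-b", "gpu_model": "M40"}]
print([c["image"] for c in run_programs_in_parallel(programs, images)])
# -> ['img-k80', 'img-m40']
```

Each program ends up in its own container with its own resource limits, which is the isolation property the method relies on.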
Preferably, obtaining at least one GPU compute program to be run includes:
Obtaining a GPU compute program stored in the physical host;
and/or
Receiving a GPU compute program over the network.
Preferably, the GPU compute program contains GPU-related description information, which includes: the hardware model of the GPU required by the program, the GPU's driver, and the GPU's compute framework.
Obtaining an applicable container image for each GPU compute program to be run then includes:
Obtaining the GPU-related description information from each GPU compute program to be run;
Matching the description information of each program to be run against the GPU information of each pre-built container image;
Determining the container image that matches successfully to be the applicable container image for that program.
Preferably, the container images are stored in the physical host or in a pre-configured container image server;
Obtaining an applicable container image for each GPU compute program to be run then includes:
Obtaining each program's applicable container image from the container images saved in the physical host or stored in the container image server.
Preferably, the non-GPU hardware resources include at least:
The number of CPU cores used, the CPU occupancy ratio, the memory size, and the disk size.
Preferably, the preset binding granularity is:
At least one chip of a physical graphics card, or at least one computing unit within a GPU chip.
Preferably, the container image is built using a Linux container-image build command or a container-image build tool.
Preferably, creating a corresponding container for each GPU compute program to be run from its applicable container image includes:
Invoking a Linux container build command, or creating the container with a container build tool.
Preferably, the method further includes: after a GPU compute program has finished running, saving the container created for it, and then continuing with obtaining at least one GPU compute program to be run;
For each newly obtained GPU compute program, searching the saved containers for a container that matches it;
If a matching container is found, submitting the program to that container, which runs it using its configured non-GPU hardware resources and its bound GPU hardware resources;
If no matching container is found, obtaining the container image applicable to the program; creating a corresponding container from that image; configuring non-GPU hardware resources for the container; binding GPU hardware resources to it at the preset binding granularity; submitting the program to the container; and having the container run the program using its configured non-GPU hardware resources and its bound GPU hardware resources.
To achieve the above objective, an embodiment of the invention also provides a device for running GPU compute programs in parallel on a single physical host. The device includes:
A GPU compute program obtaining unit, configured to obtain at least one GPU compute program to be run;
A container image obtaining unit, configured to obtain an applicable container image for each GPU compute program to be run; the container image is constructed in advance according to information about the GPUs installed in the physical host, and the GPU information includes at least: the hardware model of each installed GPU, the driver for each GPU model, and the compute framework for each GPU model;
A container creating unit, configured to create a corresponding container for each GPU compute program to be run from its applicable container image;
A non-GPU hardware resource configuration unit, configured to configure non-GPU hardware resources for each container;
A GPU hardware resource binding unit, configured to bind GPU hardware resources to each container at a preset binding granularity;
A program submission unit, configured to submit each GPU compute program to its corresponding container;
A container running unit, configured to have each container run its GPU compute program using its configured non-GPU hardware resources and its bound GPU hardware resources.
Preferably, the GPU compute program obtaining unit is specifically configured to:
Obtain a GPU compute program stored in the physical host; and/or receive a GPU compute program over the network.
Preferably, the GPU compute program contains GPU-related description information, which includes: the hardware model of the GPU required by the program, the GPU's driver, and the GPU's compute framework.
The container image obtaining unit includes a description information obtaining subunit, a matching subunit, and a determining subunit:
The description information obtaining subunit is configured to obtain the GPU-related description information from each GPU compute program to be run;
The matching subunit is configured to match the obtained description information of each program to be run against the GPU information of each pre-built container image;
The determining subunit is configured to determine the container image that matches successfully to be the applicable container image for that program.
Preferably, the container images are stored in the physical host or in a pre-configured container image server;
The container image obtaining unit is specifically configured to:
Obtain each program's applicable container image from the container images saved in the physical host or stored in the container image server.
Preferably, the device further includes a container saving unit and a container matching unit;
The container saving unit is configured to, after a GPU compute program has finished running, save the container created for it and trigger the GPU compute program obtaining unit;
The container matching unit is configured to, for each GPU compute program obtained by the GPU compute program obtaining unit, search the saved containers for a container that matches it;
If one is found, the program submission unit is triggered, which in turn triggers the container running unit to run the program;
If none is found, the container image obtaining unit is triggered.
The method and device provided by embodiments of the invention for running GPU compute programs in parallel on a single physical host use container functionality to isolate the computing environment: GPU hardware resources are bound to designated containers, and different GPU compute programs run in different containers. Through these isolated runtime environments, different GPU compute programs run in parallel on one physical host, and failures caused by inconsistent runtime environments are avoided. Embodiments of the invention thus improve the degree to which virtualization technology supports GPUs and make full use of the GPU's powerful computing capability. Of course, implementing any product or method of the invention does not necessarily require achieving all of the above advantages at the same time.
Detailed description of the invention
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method, provided by an embodiment of the invention, for running GPU compute programs in parallel on a single physical host;
Fig. 2 is a schematic structural diagram of a device, provided by an embodiment of the invention, for running GPU compute programs in parallel on a single physical host.
Specific embodiment
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The method and device provided by embodiments of the invention for running GPU compute programs in parallel on a single physical host use container functionality to isolate the computing environment: GPU hardware resources are bound to designated containers, different GPU compute programs use different containers, and GPU computing environments are deployed quickly and conveniently with container management tools, avoiding failures caused by inconsistent runtime environments. The invention thus improves the degree to which virtualization technology supports GPUs and makes full use of the GPU's powerful computing capability.
The invention is described in detail below through specific embodiments.
Fig. 1 is a schematic flowchart of a method, provided by an embodiment of the invention, for running GPU compute programs in parallel on a single physical host. It includes the following steps:
S101: Obtain at least one GPU compute program to be run.
The GPU compute program to be run may be a program stored in the physical host, or a program received over the network. There may be more than one such program — one or many — so as to make full use of the GPU's powerful computing capability.
S102: Obtain an applicable container image for each GPU compute program to be run.
Embodiments of the invention use containers to isolate the execution of GPU compute programs, so container images must be built in advance. In an embodiment, a container image is built according to information about the GPUs installed in the physical host, where that information includes at least the hardware model of each installed GPU, the driver for each GPU model, and the compute framework for each GPU model. In practice, a different container image can be built for each GPU model. The pre-built container images can be stored locally on the physical host or uploaded to a unified container image server.
In addition, in practice a GPU compute program may contain GPU-related description information, such as the GPU hardware model, GPU driver, and GPU compute framework it requires.
In this step, the GPU-related description information contained in each GPU compute program can then be matched against the GPU information of each pre-built container image, so as to obtain the applicable container image for each program to be run. For example, the GPU hardware model, GPU driver, and GPU compute framework required by a program to be run are matched against the corresponding GPU information of the container images built for each GPU model. When the information required by the program is consistent with the GPU hardware model, GPU driver, and GPU compute framework supported by some container image — i.e., the match succeeds — that container image is the applicable container image for the program.
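The matching step can be sketched as a small function. This is an illustrative sketch under assumed field names (`gpu_model`, `driver`, `framework`); the patent does not specify a concrete data format.

```python
# Hypothetical sketch of the image-matching step; field names are assumptions.

def find_matching_image(program_info, images):
    """Return the first container image whose GPU metadata matches the
    GPU requirements declared by the compute program, or None."""
    keys = ("gpu_model", "driver", "framework")
    for image in images:
        if all(program_info.get(k) == image.get(k) for k in keys):
            return image
    return None

images = [
    {"name": "cuda75-tesla", "gpu_model": "Tesla K80",
     "driver": "352.39", "framework": "CUDA 7.5"},
    {"name": "ocl-firepro", "gpu_model": "FirePro S9150",
     "driver": "15.20", "framework": "OpenCL 2.0"},
]
need = {"gpu_model": "Tesla K80", "driver": "352.39", "framework": "CUDA 7.5"}
print(find_matching_image(need, images)["name"])  # -> cuda75-tesla
```

A program whose requirements match no image would get `None` here, which in the described method corresponds to the case where no applicable container image exists.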
On a Linux physical host, the container image is built with a Linux container-image build command or a container-image build tool. The Linux container-image build command is the command in the Linux system for building container images; an example of a container-image build tool is Docker. Of course, this is only an illustration, and container-image build tools are not limited to Docker.
After a container image is built, image description information can be added for each image, such as the hardware model of the GPU, the GPU's driver, and the GPU's compute framework.
S103: Create a corresponding container for each GPU compute program to be run from its applicable container image.
The container is the execution medium of the GPU compute program. A container is created on the basis of a container image, and the container image that matches the GPU compute program must be chosen. In this step, the corresponding command can be invoked according to the packaging format of the container image, and the container created with the corresponding container management tool. For example, if the container image was built with the Linux container-image build command, the container can be created with the Linux container build command — the command in the Linux system for creating containers — or with a container build tool. If the container image was built with a container-image build tool such as Docker, the container can likewise be created with the Linux container build command or with the container build tool. In practice, the container-image build tool and the container build tool may be the same tool, i.e., one tool can be used both to build container images and to create containers.
The container image needed to create the container can be obtained from the container images saved in the physical host or stored in the container image server.
S104: Configure non-GPU hardware resources for each container.
Specifically, in this step the container's non-GPU hardware resources — CPU, memory, disk I/O, network, and so on — can be limited, for example the number of CPU cores used, the CPU occupancy ratio, the memory size, the disk size, and the network bandwidth.
S105: Bind GPU hardware resources to each container at a preset binding granularity.
GPU hardware resources are bound to designated containers at the preset binding granularity, where the granularity is at least one chip of a physical graphics card, or at least one computing unit within a GPU chip. That is, the binding granularity of a container can be one chip or several chips, or one or more computing units within a GPU chip, so that the container can directly access the graphics-card hardware to perform GPU computation. When a physical graphics card has several chips, one or more of them can be bound to a given container. Where the graphics-card driver supports it, the binding granularity can also be a single computing unit of the GPU chip, such as a CUDA SM (Streaming Multiprocessor) or an OpenCL CU (Compute Unit). One or more computing units can likewise be bound to a given container, dividing GPU computing resources at a finer granularity.
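With Docker as the container tool, the non-GPU limits and the chip-level GPU binding map onto documented `docker run` options (`--cpus`, `--memory`, `--device`). The sketch below only assembles such a command as a list and does not execute it; the device paths and values are invented examples, and an actual deployment of the described method could use different mechanisms.

```python
# Illustrative only: maps steps S104/S105 onto hypothetical docker CLI flags.

def build_run_command(image, cpus, memory, gpu_devices):
    """Assemble (but do not execute) a `docker run` command that caps the
    container's non-GPU resources and binds specific GPU device nodes."""
    cmd = ["docker", "run", "-d", f"--cpus={cpus}", f"--memory={memory}"]
    for dev in gpu_devices:          # one entry per bound GPU device node
        cmd.append(f"--device={dev}")
    cmd.append(image)
    return cmd

cmd = build_run_command("cuda75-tesla", cpus=4, memory="8g",
                        gpu_devices=["/dev/nvidia0", "/dev/nvidiactl"])
print(" ".join(cmd))
```

Binding at sub-chip granularity (an SM or CU) has no direct CLI flag; as the text notes, it depends on driver support.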
S106: Submit each GPU compute program to its corresponding container.
A GPU compute program runs inside a container, which can be understood as running inside the operating system in the container. A GPU compute program is usually one or more compiled executable files; this step uploads the executable files into the container so that the container can run them.
In practice, a user can log in to the physical host and run the GPU compute program in a container with the container tools, or log in to the container directly from a remote machine, submit the GPU compute program to be run to the container, and install and configure the corresponding dependency libraries so as to execute at least one GPU compute program to be run. That is, if the dependency libraries required by the program have not yet been installed, they must be installed and configured first. The specific installation and configuration method is the same as in the prior art and is not repeated here.
When submitting GPU compute programs: a single program is submitted on its own to the container that matches it; multiple programs are each submitted to their respective matching containers.
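Again taking Docker as an example tool, the submission step could be sketched as building one copy command and one launch command per (program, container) pair. The `docker cp` and `docker exec` subcommands follow Docker's documented CLI; the paths and container names are invented, and the commands are assembled but not executed.

```python
# Illustrative sketch of step S106; paths and container names are made up.
import os

def submit_commands(assignments):
    """For each (program_path, container_name) pair, build the hypothetical
    copy-and-execute commands that would upload the compiled program into
    its matching container and launch it there."""
    for program, container in assignments:
        name = os.path.basename(program)
        yield ["docker", "cp", program, f"{container}:/work/{name}"]
        yield ["docker", "exec", container, f"/work/{name}"]

cmds = list(submit_commands([("/opt/jobs/render_a", "ctr-k80"),
                             ("/opt/jobs/render_b", "ctr-m40")]))
print(len(cmds))  # -> 4 (one cp and one exec per program)
```

With multiple programs, each pair targets its own matching container, which is exactly the one-program-or-many case described above.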
S107: Each container runs its GPU compute program using its configured non-GPU hardware resources and its bound GPU hardware resources.
Once the GPU hardware resources are bound, the GPU compute program inside the container can access them directly; no secondary development or adaptation of the program is needed.
With the container's non-GPU and GPU hardware resources in place, one or more GPU compute programs perform their computation in the matching container using the dependency libraries that have been installed and configured.
In practice, after a GPU compute program has finished running, the container created for it can be saved — for example stored in the physical host or in a container server — for subsequent use. Then, when at least one GPU compute program to be run is obtained again, the saved containers can first be searched, for each obtained program, for a container that matches it. If a matching container is found, the program is submitted to it, and that container runs the program using its configured non-GPU hardware resources and its bound GPU hardware resources. If no matching container is found, the container image applicable to the program is obtained; a corresponding container is created from that image; non-GPU hardware resources are configured for the container; GPU hardware resources are bound to it at the preset binding granularity; the program is submitted to the container; and the container runs the program using its configured non-GPU hardware resources and its bound GPU hardware resources.
Specifically, running the program in this case is similar to the process, described above, of running a GPU compute program to be run; refer to that process, which is not repeated here.
Of course, after a GPU compute program has finished running, the container created for it can also be deleted directly, releasing the non-GPU and GPU hardware resources it occupied. When a corresponding container is needed again, a container matching the GPU compute program can be rebuilt by the method above.
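The save-and-reuse variant amounts to a reuse-or-create lookup. The sketch below uses an assumed matching key (the applicable image name) and invented helper names; the patent leaves the concrete matching criterion open.

```python
# Hypothetical sketch of the save-and-reuse variant; names are assumptions.

def container_for(program, saved, create_fn):
    """Reuse a saved container that matches the program's applicable image;
    create (and save) a fresh one only if none matches."""
    for ctr in saved:
        if ctr["image"] == program["image"]:
            return ctr                      # reuse the saved container
    ctr = create_fn(program["image"])       # otherwise build a fresh one
    saved.append(ctr)                       # keep it for next time
    return ctr

saved = [{"image": "img-k80", "id": "ctr-1"}]
make = lambda img: {"image": img, "id": f"ctr-new-{img}"}
a = container_for({"image": "img-k80"}, saved, make)   # reused
b = container_for({"image": "img-m40"}, saved, make)   # created
print(a["id"], b["id"])  # -> ctr-1 ctr-new-img-m40
```

Saving containers trades held resources for faster restart; the delete-after-run alternative frees resources at the cost of recreating the container later.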
With the method of the embodiment shown in Fig. 1, different GPU compute programs run in parallel on a single physical host through isolated runtime environments. That is, one or more GPU compute programs can be deployed on one physical host without their runtime environments interfering with each other, making full use of the GPU's powerful computing capability, efficiently supporting multi-channel data processing, and improving the degree to which virtualization technology supports GPUs.
In practice, a user runs a GPU compute program by logging in to the physical host or by logging in to the container directly from a remote machine. To guarantee the security of the physical host, different users can be given different physical-host access rights, or different container access rights, or both different physical-host access rights and different container access rights.
Access rights for the physical host and/or the containers can be configured in several ways. Access rights can be set only on the physical host and not in each container's operating system; a user with physical-host access rights can then log in to the physical host, look up the containers matching the one or more GPU compute programs to be run, and enter those containers' operating systems. Alternatively, no access rights are set on the physical host and access rights are set in each container's operating system; the user then enters the physical host directly and, according to the corresponding access rights, logs in to the operating systems of the containers matching the programs to be run. Access rights can also be set on both the physical host and each container; a user who has both physical-host access rights and the corresponding container access rights can then enter the physical host and the corresponding containers.
By configuring the access rights of the physical host and the containers, it can be ensured that a user can access both the physical host and the containers, or that the user's operations are confined to the containers.
Fig. 2 is a schematic structural diagram of a device, provided by an embodiment of the invention, for running GPU compute programs in parallel on a single physical host, corresponding to the flow shown in Fig. 1. The device includes: a GPU compute program obtaining unit 201, a container image obtaining unit 202, a container creating unit 203, a non-GPU hardware resource configuration unit 204, a GPU hardware resource binding unit 205, a program submission unit 206, and a container running unit 207.
Wherein, GPU operation program obtaining unit 201, for obtaining at least one GPU operation program to be run.
Container mirror image obtaining unit 202, for obtaining the applicable container mirror of GPU operation program each to be run respectively Picture;The container mirror image constructs in advance for the relevant information according to the mounted GPU of the physical host;The GPU's Relevant information includes at least: the driver and every kind of model GPU of the ardware model number of mounted each GPU, every kind of model GPU Operation frame.
Container creating unit 203 is each for the container mirror image applicable according to GPU operation program each to be run GPU operation program to be run creates corresponding container.
The hardware resource configuration unit 204 of non-GPU, for configuring the hardware resource of non-GPU for each container.
GPU hardware resource binding unit 205, for being that each container binds GPU hardware money according to preset binding granularity Source.
Program submits unit 206, for each GPU operation program to be submitted to corresponding container.
Container running unit 207, hardware resource and the binding of the non-GPU being configured for each container using itself GPU hardware resource runs GPU operation program.
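The seven units above form a linear pipeline from program acquisition to container execution. A minimal sketch of that flow follows; every function and field name here is hypothetical, and the container runtime is stubbed out rather than calling a real container engine.

```python
# Minimal sketch of the seven-unit pipeline of Fig. 2.
# The callbacks stand in for the real image lookup, container
# creation, and GPU-binding logic.

def run_gpu_programs(programs, find_image, create_container,
                     cpu_mem_config, gpu_binding):
    """Create one container per program, configure it, and run it."""
    results = []
    for prog in programs:                        # unit 201: obtain programs
        image = find_image(prog)                 # unit 202: obtain mirror image
        container = create_container(image)      # unit 203: create container
        container["resources"] = cpu_mem_config  # unit 204: non-GPU resources
        container["gpus"] = gpu_binding(prog)    # unit 205: bind GPU hardware
        container["program"] = prog              # unit 206: submit program
        results.append(                          # unit 207: run in container
            f"ran {prog} in {container['image']}")
    return results


out = run_gpu_programs(
    ["train.py"],
    find_image=lambda p: "cuda-base:7.5",
    create_container=lambda img: {"image": img},
    cpu_mem_config={"cpus": 4, "mem_gb": 8},
    gpu_binding=lambda p: ["/dev/nvidia0"],
)
print(out[0])  # ran train.py in cuda-base:7.5
```

Because each container receives its own non-GPU resource configuration and GPU binding before the program is submitted, the programs run in mutually isolated environments, as the embodiment describes.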
The GPU operation programs obtained by the GPU operation program obtaining unit 201 may be GPU operation programs stored on the physical host, and/or GPU operation programs received over the network. That is, the GPU operation program obtaining unit 201 may be specifically configured to:
obtain a GPU operation program stored on the physical host; and/or receive a GPU operation program over the network.
In practical applications, a GPU operation program contains GPU-related description information. The description information includes: the hardware model of the GPU required by the GPU operation program, the driver of that GPU, and the operation framework of that GPU.
The container mirror image obtaining unit may include: a description information obtaining subunit, a matching subunit, and a determining subunit (not shown in Fig. 2). Among them:
the description information obtaining subunit is configured to obtain the GPU-related description information in each GPU operation program to be run;
the matching subunit is configured to match the description information of each obtained GPU operation program to be run against the GPU-related information of each pre-built container mirror image;
the determining subunit is configured to determine the successfully matched container mirror image as the container mirror image applicable to the GPU operation program to be run.
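The matching step above compares three fields of the program's description information (hardware model, driver, operation framework) against the metadata of each pre-built container mirror image. A minimal sketch under those assumptions; the field names, the GPU model, and the driver version shown are illustrative only.

```python
# Minimal sketch of the matching subunit: a program's GPU description
# is matched against the GPU metadata of pre-built container images.

def match_image(program_desc, images):
    """Return the name of the first image whose GPU metadata matches."""
    keys = ("gpu_model", "driver", "framework")
    for image in images:
        if all(program_desc.get(k) == image["gpu_info"].get(k) for k in keys):
            return image["name"]
    return None  # no pre-built image fits this program


images = [
    {"name": "tesla-k40-caffe",
     "gpu_info": {"gpu_model": "Tesla K40",
                  "driver": "346.46",
                  "framework": "Caffe"}},
]
desc = {"gpu_model": "Tesla K40", "driver": "346.46", "framework": "Caffe"}
print(match_image(desc, images))  # tesla-k40-caffe
```

A program whose description differs in any of the three fields yields no match, which is the case that later triggers building or fetching a different container mirror image.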
In practical applications, the container mirror images may be stored on the physical host or in a pre-configured container mirror image server. The container mirror image obtaining unit 202 may be specifically configured to:
obtain, for each GPU operation program to be run, the applicable container mirror image from the container mirror images saved on the physical host or from those stored in the container mirror image server.
In addition, the apparatus shown in Fig. 2 may further include: a container storage unit and a container matching unit (not shown in Fig. 2).
The container storage unit is configured to, after a GPU operation program has finished running, save the container created for that GPU operation program, and then trigger the GPU operation program obtaining unit 201.
The container matching unit is configured to, for each GPU operation program obtained by the GPU operation program obtaining unit 201, search the saved containers for a container that matches the GPU operation program.
If a matching container is found, the program submitting unit 206 is triggered, so that the program submitting unit 206 triggers the container running unit 207 to run the GPU operation program.
If no matching container is found, the container mirror image obtaining unit 202 is triggered.
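The container-reuse path above can be sketched as a simple pool keyed by the container's mirror image: a finished container is saved, and a new program first looks for a saved container before a fresh one is created. The `ContainerPool` class and the image names are hypothetical.

```python
# Minimal sketch of the container storage unit and container matching
# unit: finished containers are pooled and reused on a match.

class ContainerPool:
    def __init__(self):
        self.saved = {}  # mirror image name -> saved container

    def save(self, container):
        """Keep a finished container for later reuse."""
        self.saved[container["image"]] = container

    def find(self, required_image):
        """Return a saved container matching the program's image, or None."""
        return self.saved.get(required_image)


pool = ContainerPool()
pool.save({"image": "cuda-base:7.5", "gpus": ["/dev/nvidia0"]})
hit = pool.find("cuda-base:7.5")   # reuse path: run the program directly
miss = pool.find("cuda-base:6.5")  # miss path: fetch image, create container
print(hit is not None, miss is None)  # True True
```

On a hit the program is submitted straight to the saved container; on a miss the full image-fetch and container-creation flow of Fig. 1 runs instead.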
By applying the apparatus of the embodiment of the present invention shown in Fig. 2, different GPU operation programs are run in parallel on one physical host in isolated running environments. That is, one or more GPU operation programs can be deployed on one physical host without their running environments interfering with each other; the powerful computing capability of the GPU is fully exploited, multi-data processing is efficiently supported, and the degree of support of virtualization technology for GPUs is improved.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a progressive manner; identical and similar parts of the embodiments may be referred to mutually, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiment is described relatively briefly, since it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A method for running GPU operation programs in parallel on a physical host, characterized by comprising:
obtaining at least one GPU operation program to be run, wherein the GPU operation program contains GPU-related description information, and the description information includes: the hardware model of the GPU required by the GPU operation program, the driver of the GPU, and the operation framework of the GPU;
obtaining, for each GPU operation program to be run, the container mirror image applicable to the program, wherein the container mirror image is constructed in advance according to the relevant information of the GPUs installed on the physical host, and the relevant information of the GPUs includes at least: the hardware model of each installed GPU, the driver of each GPU model, and the operation framework of each GPU model; and wherein said obtaining, for each GPU operation program to be run, the applicable container mirror image comprises: obtaining the GPU-related description information in each GPU operation program to be run; matching the description information of each obtained GPU operation program to be run against the GPU-related information of each pre-built container mirror image; and determining the successfully matched container mirror image as the container mirror image applicable to the GPU operation program to be run;
creating a corresponding container for each GPU operation program to be run, according to the container mirror image applicable to each program;
configuring non-GPU hardware resources for each container;
binding GPU hardware resources to each container according to a preset binding granularity, wherein the preset binding granularity comprises: at least one chip on a physical graphics card, or at least one computing unit on a GPU chip;
submitting each GPU operation program to its corresponding container; and
each container running its GPU operation program using the non-GPU hardware resources configured for it and the GPU hardware resources bound to it.
2. The method according to claim 1, characterized in that obtaining at least one GPU operation program to be run comprises:
obtaining a GPU operation program stored on the physical host;
and/or
receiving a GPU operation program over the network.
3. The method according to claim 1, characterized in that the container mirror images are stored on the physical host or in a pre-configured container mirror image server; and
obtaining, for each GPU operation program to be run, the applicable container mirror image comprises:
obtaining, for each GPU operation program to be run, the container mirror image the program uses, from the container mirror images saved on the physical host or from those stored in the container mirror image server.
4. The method according to claim 1, characterized in that the non-GPU hardware resources include at least:
the number of CPU cores used, the CPU occupancy ratio, the memory size, and the disk size.
5. The method according to claim 1, characterized in that the container mirror image is constructed using a Linux container mirror image building instruction, or by a container mirror image building tool.
6. The method according to claim 5, characterized in that creating a corresponding container for each GPU operation program to be run, according to the container mirror image applicable to each program, comprises:
invoking a Linux container building instruction, or creating the container by a container building tool.
7. The method according to any one of claims 1 to 6, characterized by further comprising:
after a GPU operation program has finished running, saving the container created for that GPU operation program, and continuing to perform the step of obtaining at least one GPU operation program to be run;
for each obtained GPU operation program, searching the saved containers for a container that matches the GPU operation program;
if a container matching the GPU operation program is found, submitting the GPU operation program to the matching container, and the matching container running the GPU operation program using the non-GPU hardware resources configured for it and the GPU hardware resources bound to it;
if no container matching the GPU operation program is found, obtaining the container mirror image the GPU operation program uses; creating a corresponding container for the GPU operation program according to that container mirror image; configuring non-GPU hardware resources for the container; binding GPU hardware resources to the container according to the preset binding granularity; submitting the GPU operation program to the container; and the container running the GPU operation program using the non-GPU hardware resources configured for it and the GPU hardware resources bound to it.
8. An apparatus for running GPU operation programs in parallel on a physical host, characterized by comprising:
a GPU operation program obtaining unit, configured to obtain at least one GPU operation program to be run, wherein the GPU operation program contains GPU-related description information, and the description information includes: the hardware model of the GPU required by the GPU operation program, the driver of the GPU, and the operation framework of the GPU;
a container mirror image obtaining unit, configured to obtain, for each GPU operation program to be run, the container mirror image applicable to the program, wherein the container mirror image is constructed in advance according to the relevant information of the GPUs installed on the physical host, and the relevant information of the GPUs includes at least: the hardware model of each installed GPU, the driver of each GPU model, and the operation framework of each GPU model; the container mirror image obtaining unit comprises: a description information obtaining subunit, a matching subunit, and a determining subunit; the description information obtaining subunit is configured to obtain the GPU-related description information in each GPU operation program to be run; the matching subunit is configured to match the description information of each obtained GPU operation program to be run against the GPU-related information of each pre-built container mirror image; and the determining subunit is configured to determine the successfully matched container mirror image as the container mirror image applicable to the GPU operation program to be run;
a container creating unit, configured to create a corresponding container for each GPU operation program to be run, according to the container mirror image applicable to each program;
a non-GPU hardware resource configuration unit, configured to configure non-GPU hardware resources for each container;
a GPU hardware resource binding unit, configured to bind GPU hardware resources to each container according to a preset binding granularity, wherein the preset binding granularity comprises: at least one chip on a physical graphics card, or at least one computing unit on a GPU chip;
a program submitting unit, configured to submit each GPU operation program to its corresponding container; and
a container running unit, configured so that each container runs its GPU operation program using the non-GPU hardware resources configured for it and the GPU hardware resources bound to it.
9. The apparatus according to claim 8, characterized in that the GPU operation program obtaining unit is specifically configured to:
obtain a GPU operation program stored on the physical host;
and/or
receive a GPU operation program over the network.
10. The apparatus according to claim 9, characterized in that the container mirror images are stored on the physical host or in a pre-configured container mirror image server; and
the container mirror image obtaining unit is specifically configured to:
obtain, for each GPU operation program to be run, the applicable container mirror image from the container mirror images saved on the physical host or from those stored in the container mirror image server.
11. The apparatus according to any one of claims 8 to 10, characterized by further comprising: a container storage unit and a container matching unit;
the container storage unit is configured to, after a GPU operation program has finished running, save the container created for that GPU operation program, and trigger the GPU operation program obtaining unit;
the container matching unit is configured to, for each GPU operation program obtained by the GPU operation program obtaining unit, search the saved containers for a container that matches the GPU operation program;
if a matching container is found, the program submitting unit is triggered, so that the program submitting unit triggers the container running unit to run the GPU operation program;
if no matching container is found, the container mirror image obtaining unit is triggered.
CN201511024848.9A 2015-12-30 2015-12-30 The implementation method and device of parallel operation GPU operation program Active CN105677479B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511024848.9A CN105677479B (en) 2015-12-30 2015-12-30 The implementation method and device of parallel operation GPU operation program


Publications (2)

Publication Number Publication Date
CN105677479A CN105677479A (en) 2016-06-15
CN105677479B true CN105677479B (en) 2019-05-10

Family

ID=56298242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511024848.9A Active CN105677479B (en) 2015-12-30 2015-12-30 The implementation method and device of parallel operation GPU operation program

Country Status (1)

Country Link
CN (1) CN105677479B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106302632B (en) * 2016-07-21 2020-02-14 华为技术有限公司 Downloading method of basic mirror image and management node
CN107797843B (en) * 2016-09-02 2021-04-20 华为技术有限公司 Method and device for enhancing function of container
CN106445638A (en) * 2016-10-08 2017-02-22 深圳市云舒网络技术有限公司 Data acquisition and processing system based on container technology
CN106970822A (en) * 2017-02-20 2017-07-21 阿里巴巴集团控股有限公司 A kind of container creation method and device
CN108574712B (en) * 2017-03-13 2021-06-01 阿里巴巴集团控股有限公司 Method and device for creating container service cluster
CN108804217A (en) * 2017-04-26 2018-11-13 中兴通讯股份有限公司 A kind of resource scheduling device, resource scheduling system and resource regulating method
CN109041233B (en) 2017-05-05 2019-08-23 华为技术有限公司 A kind of method and apparatus of resource distribution
CN107229830A (en) * 2017-06-01 2017-10-03 上海联影医疗科技有限公司 Radiotherapy planning system and its task executing method
CN107454188A (en) * 2017-08-28 2017-12-08 郑州云海信息技术有限公司 A kind of container creation method and system
CN108958910B (en) * 2018-05-21 2020-12-18 福建省数字福建云计算运营有限公司 Task scheduling method and terminal based on heterogeneous environment
CN109656723A (en) * 2019-03-13 2019-04-19 联想(北京)有限公司 Container resource regulating method and device
CN112866321B (en) * 2019-11-28 2024-06-18 中兴通讯股份有限公司 Resource scheduling method, device and system
CN111045786B (en) * 2019-11-28 2020-07-24 北京大学 Container creation system and method based on mirror image layering technology in cloud environment
CN111047505A (en) * 2019-12-20 2020-04-21 北京浪潮数据技术有限公司 GPU multiplexing method, device, equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631634A (en) * 2012-08-24 2014-03-12 中国电信股份有限公司 Graphics processor virtualization achieving method and device
CN103761139A (en) * 2014-01-25 2014-04-30 湖南大学 General purpose computation virtualization implementation method based on dynamic library interception
CN105138389A (en) * 2015-07-30 2015-12-09 北京京东尚科信息技术有限公司 Method and system for managing virtual devices in cluster

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631634A (en) * 2012-08-24 2014-03-12 中国电信股份有限公司 Graphics processor virtualization achieving method and device
CN103761139A (en) * 2014-01-25 2014-04-30 湖南大学 General purpose computation virtualization implementation method based on dynamic library interception
CN105138389A (en) * 2015-07-30 2015-12-09 北京京东尚科信息技术有限公司 Method and system for managing virtual devices in cluster

Also Published As

Publication number Publication date
CN105677479A (en) 2016-06-15

Similar Documents

Publication Publication Date Title
CN105677479B (en) The implementation method and device of parallel operation GPU operation program
CN100428168C (en) Method, system and program product for capturing central processing unit (CPU) utilization for a virtual machine
US10929275B2 (en) Automatic test stack creation via production system replication
DE102021207514A1 (en) DISAGGREGATED COMPUTING FOR DISTRIBUTED CONFIDENTIAL COMPUTING ENVIRONMENT
CN104541247B (en) System and method for adjusting cloud computing system
CN104737133B (en) Optimized using the Distributed Application of service group
CN110347596A (en) A kind of test method, device, system, electronic equipment and medium
Coady et al. Distributed cloud computing: Applications, status quo, and challenges
CN103729257B (en) Distributed parallel computing method and system
CN111105006B (en) Deep learning network training system and method
CN105956666B (en) A kind of machine learning method and system
CN109074265A (en) The preformed instruction of mobile cloud service
CN110515628A (en) Application deployment method and device
CN110945481B (en) Method for executing a tuple graph program across a network
CN109254854A (en) Asynchronous invoking method, computer installation and storage medium
CN105204917B (en) The method and device of loading configuration file in application program launching
CN110008019B (en) Method, device and system for sharing server resources
CN110325978A (en) Cyclic dart connection: network-efficient, later period materialization, distributed connection technique
CN109587997A (en) Method, electronic equipment and the computer readable storage medium of distribution server position
Margery et al. Openmask:{Multi-Threaded| Modular} animation and simulation {Kernel| Kit}: a general introduction
CN107391528B (en) Front-end component dependent information searching method and equipment
CN116414518A (en) Data locality of big data on Kubernetes
EP3069272B1 (en) Managing job status
CN109804365A (en) Elastic geography database copy method
CN109710303A (en) The multi version parallel developing method and system of interactive voice product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant