CN110008028B - Computing resource allocation method and device, computer equipment and storage medium - Google Patents

Computing resource allocation method and device, computer equipment and storage medium

Info

Publication number
CN110008028B
CN110008028B
Authority
CN
China
Prior art keywords
machine
neural network
operation instruction
instruction
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910285304.XA
Other languages
Chinese (zh)
Other versions
CN110008028A (en)
Inventor
姚成吉
杨越
高华佐
田忠博
贾开
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kuangshi Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN201910285304.XA
Publication of CN110008028A
Application granted
Publication of CN110008028B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Abstract

The application relates to a computing resource allocation method and apparatus, a computer device, and a storage medium. The method comprises the following steps: traversing a topology structure diagram of a distributed neural network to obtain the operation instructions of each machine; traversing the operation instructions of each machine according to the topology structure diagram to obtain the single-machine neural network corresponding to each machine; acquiring the state of each machine's operation instructions; and performing training or inference computation in the single-machine neural network corresponding to each machine according to the state of each machine's operation instructions. With this method, resources can be allocated conveniently in a distributed deep neural network, and the operating efficiency of the distributed deep neural network is improved.

Description

Computing resource allocation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for allocating computing resources, a computer device, and a storage medium.
Background
With the development of internet technology, deep neural networks have been widely applied in artificial intelligence fields such as image recognition and speech processing, owing to characteristics such as good generalization and ease of training. Because running a neural network places high demands on the computing power and memory capacity of a device, a neural network often needs to be modified so that it can run in a distributed manner across multiple machines and multiple devices, in order to operate on devices with low computing power and small memory capacity.
Distributed neural networks can employ data-parallel, model-parallel, and hybrid-parallel methods. Data parallelism means that different machines hold copies of the same model and are assigned different data inputs, after which the calculation results of all machines are combined in some manner. Model parallelism means that different machines in the distributed system are responsible for different parts of a single network model; for example, different operations of the neural network model are distributed to different machines, or one large operation is divided into several smaller operations that are distributed to different machines. Hybrid parallelism means that each machine not only receives different inputs but also differs somewhat in neural network structure.
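As an illustrative sketch (hypothetical names and toy data, not from the patent), the three parallelism modes can be contrasted in a few lines of Python:

```python
# Illustrative sketch: how a batch of inputs and a model's operations might be
# assigned to two machines under the three parallelism modes described above.

batch = list(range(8))            # 8 input samples
ops = ["conv", "relu", "fc"]      # operations of one network model
machines = [0, 1]

# Data parallelism: every machine holds a full model copy; inputs are split.
data_parallel = {m: batch[m::len(machines)] for m in machines}

# Model parallelism: each machine is responsible for different operations.
model_parallel = {0: ops[:2], 1: ops[2:]}

# Hybrid parallelism: machines differ in both inputs and network structure.
hybrid = {m: {"inputs": data_parallel[m], "ops": model_parallel[m]}
          for m in machines}
```

The split keys here are toy choices; the patent does not prescribe how inputs or operations are partitioned.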
However, current resource allocation schemes for distributed neural networks are cumbersome, and operating efficiency is easily reduced if allocation is not handled properly.
Disclosure of Invention
In view of the foregoing, there is a need to provide a computing resource allocation method, apparatus, computer device, and storage medium that can facilitate resource allocation in a distributed deep neural network and improve the operating efficiency of the distributed deep neural network.
A method of computing resource allocation, the method comprising:
traversing a topological structure diagram of the distributed neural network to obtain an operation instruction of each machine;
traversing the operation instruction of each machine according to the topological structure diagram to obtain a single-machine neural network corresponding to each machine;
acquiring the state of the running instruction of each machine;
and performing training or inference computation in the single-machine neural network corresponding to each machine according to the state of the operation instructions of each machine.
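The four claimed steps can be sketched as follows; the dictionary-based data model (`node`, `name`, `state` fields) is an assumption for illustration, not the patent's implementation:

```python
def allocate_and_run(topology, machine_id, mode="train"):
    """Sketch of the four claimed steps (hypothetical data model)."""
    # Step 1: traverse the topology structure diagram to collect the
    # operation instructions whose machine number matches this machine.
    my_instrs = [op for op in topology if op["node"] == machine_id]
    # Step 2: the instructions kept for one machine form its
    # single-machine neural network.
    subnet = {"machine": machine_id, "instructions": my_instrs}
    # Step 3: acquire the state of each operation instruction.
    states = {op["name"]: op.get("state") for op in my_instrs}
    # Step 4: training or inference would consume the subnet and the states.
    return subnet, states, mode

topology = [
    {"name": "conv0", "node": 0, "state": "s0"},
    {"name": "fc1",   "node": 1, "state": "s1"},
]
subnet, states, _ = allocate_and_run(topology, machine_id=0)
```

Steps 2 and 4 are placeholders; in the patent they involve graph conversion and actual computation, which this sketch does not model.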
In one embodiment, the traversing the topology structure diagram of the distributed neural network to obtain the operation instruction of each machine includes:
and traversing the topology structure diagram, and if the machine number of the operation instruction in the topology structure diagram is consistent with the currently operated machine number, taking the operation instruction as the operation instruction in the currently operated machine.
In one embodiment, the traversing the operation instruction of each machine according to the topology structure diagram to obtain a single-machine neural network corresponding to each machine includes:
traversing the operation instructions of each machine according to the topology structure diagram to obtain a machine number parameter and a device number parameter of the operation instructions of each machine;
judging, according to the machine number parameter and the device number parameter, whether a data copy instruction needs to be added between different devices and whether a network transmission instruction needs to be added between different machines;
constructing a forward subgraph and a backward subgraph of each single-machine neural network according to the topological structure diagram;
and performing gradient calculation on a plurality of running instructions with the same state in each single machine neural network to obtain the single machine neural network corresponding to each machine.
In one embodiment, the traversing the operation instruction of each machine according to the topology structure diagram to obtain a single-machine neural network corresponding to each machine further includes:
traversing the operation instructions of each machine according to the topology structure diagram to obtain a machine number parameter and a device number parameter of the operation instructions of each machine;
judging, according to the machine number parameter and the device number parameter, whether a data copy instruction needs to be added between different devices and whether a network transmission instruction needs to be added between different machines;
and constructing a forward sub-graph of each single-machine neural network according to the topology structure graph to obtain the single-machine neural network corresponding to each machine.
In one embodiment, the determining whether a data copy instruction needs to be added between different devices and whether a network transmission instruction needs to be added between different machines according to the machine number parameter and the device number parameter includes:
if the machine number parameter of the operation instruction is consistent with the input machine number of the operation instruction, and the equipment number parameter of the operation instruction is inconsistent with the input equipment number of the operation instruction, adding the data copy instruction between different equipment;
and if the machine number parameter of the operation instruction is not consistent with the input machine number of the operation instruction, adding the network transmission instruction between different machines.
In one embodiment, the constructing a forward sub-graph and a backward sub-graph of each of the single-machine neural networks according to the topology structure diagram includes:
constructing the forward subgraph through forward calculation to obtain an end point of each single-machine neural network;
and constructing the backward subgraph through backward calculation, and updating the state of the operation instruction in each single-machine neural network.
In one embodiment, the performing a gradient calculation on a plurality of operating instructions with the same state in each of the single-machine neural networks includes:
and adding the original gradients of a plurality of operating instructions with the same state to obtain the updated gradient of each operating instruction.
In one embodiment, the method further comprises: and constructing a topological structure diagram of the distributed neural network.
In one embodiment, the constructing the topology structure diagram of the distributed neural network includes:
distributing the operation instructions of each machine to an instruction list according to the state of the operation instructions of each machine;
calculating the operation instruction in the instruction list in the distributed neural network, and updating the state of the operation instruction in real time;
and after the calculation is finished, storing the state of each operation instruction in the instruction list.
In one embodiment, the allocating the operation instructions of the respective machines to an instruction list according to the states of the operation instructions of the respective machines includes:
and distributing a plurality of running instructions with the same state to the same instruction list.
In one embodiment, the calculating the operation instructions in the instruction list in the distributed neural network includes:
calculating the operation instruction in the instruction list in the distributed neural network to obtain a first calculation result;
and merging the first calculation results corresponding to the instruction lists to obtain a distributed calculation result.
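A minimal sketch of the two steps above, under the assumption that instructions carry a `tag` and a toy `value`, and that "calculating" an instruction list simply sums its values:

```python
def group_by_tag(instrs):
    """Allocate operation instructions with the same state tag to the same
    instruction list (hypothetical data model)."""
    lists = {}
    for op in instrs:
        lists.setdefault(op["tag"], []).append(op)
    return lists

def run_lists(lists):
    """Compute a first result per instruction list, then merge them."""
    first_results = {tag: sum(op["value"] for op in ops)
                     for tag, ops in lists.items()}
    # Merging the first results yields the distributed calculation result.
    return sum(first_results.values())
```

The per-list sum stands in for arbitrary computation; the patent does not specify how first results are combined beyond "merging".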
In one embodiment, the method further comprises:
periodically storing the state of the operation instructions during computing resource allocation and inference computation; or
performing distributed storage of the state of the operation instructions during computing resource allocation and inference computation.
In one embodiment, the operation instructions include single-device operation instructions, distributed operation instructions, and parameter operation instructions.
In one embodiment, the parameters of an operation instruction further include a distributed attribute, a split operation instruction, and a merge operation instruction.
An apparatus for computing resource allocation, the apparatus comprising:
the operation instruction acquisition module is used for traversing the topological structure diagram of the distributed neural network to obtain operation instructions of all machines;
the single-machine neural network acquisition module is used for traversing the operation instruction of each machine according to the topology structure diagram to obtain a single-machine neural network corresponding to each machine;
the operation instruction state acquisition module is used for acquiring the states of the operation instructions of the machines;
and the computing module is used for training or performing inference computation in the single-machine neural network corresponding to each machine according to the state of the operation instruction of each machine.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
traversing a topological structure diagram of the distributed neural network to obtain an operation instruction of each machine;
traversing the operation instruction of each machine according to the topological structure diagram to obtain a single-machine neural network corresponding to each machine;
acquiring the state of the running instruction of each machine;
and performing training or inference computation in the single-machine neural network corresponding to each machine according to the state of the operation instructions of each machine.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
traversing a topological structure diagram of the distributed neural network to obtain an operation instruction of each machine;
traversing the operation instruction of each machine according to the topological structure diagram to obtain a single-machine neural network corresponding to each machine;
acquiring the state of the running instruction of each machine;
and performing training or inference computation in the single-machine neural network corresponding to each machine according to the state of the operation instructions of each machine.
According to the computing resource allocation method and apparatus, computer device, and storage medium above, the operation instructions of each machine are extracted from the topology structure diagram of the distributed neural network, the single-machine neural network corresponding to each machine is obtained from those operation instructions, and training or inference computation is finally performed in each single-machine neural network according to the state of each machine's operation instructions. This improves the convenience of storing and loading operation instruction states in the distributed neural network, allows computing resource allocation and inference computation to proceed synchronously across multiple machines, and improves the operating efficiency of the distributed deep neural network.
Drawings
FIG. 1 is a schematic diagram of a computing resource allocation system 100 in one embodiment;
FIG. 2 is a flow diagram that illustrates a computing resource allocation methodology, according to one embodiment;
FIG. 3 is a diagram of a topology of a distributed neural network in one embodiment;
FIG. 4 is a schematic diagram illustrating a process flow for implementing model transformation during training in one embodiment;
FIG. 5 is a flow diagram illustrating model transformation performed during inference calculation in one embodiment;
FIG. 6 is a schematic flow diagram illustrating the construction of a distributed neural network in one embodiment;
FIG. 7 is a block diagram of an apparatus for allocating computing resources in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, there is provided a computing resource allocation system 100 comprising: the system comprises an allocation unit 101, a conversion unit 102, a storage unit 103, a loading unit 104 and a calculation unit 105, wherein the allocation unit 101 is used for allocating the operation instructions to corresponding instruction lists according to state identifiers of the operation instructions; the conversion unit 102 is configured to obtain a single-machine neural network corresponding to each machine according to the topology structure diagram of the distributed neural network; a storage unit 103 for storing a state of the operation instruction; a loading unit 104 for loading a state of the operation instruction; a computing unit 105, configured to perform inference computation in the distributed neural network or the stand-alone neural network. In addition, the computing resource allocation system 100 further comprises a design unit 106 for providing a set of primitives (i.e. parameters of the running instructions) that may describe the distributed neural network described above.
In one embodiment, as shown in FIG. 2, there is provided a computing resource allocation method operable on the computing resource allocation system shown in FIG. 1, the method comprising the steps of:
step S202, traversing the topological structure diagram of the distributed neural network to obtain the operation instruction of each machine.
Here, the distributed neural network refers to a multi-machine global neural network, and its topology structure diagram refers to a topology structure diagram in which a plurality of machines perform distributed computation, as shown in FIG. 3. The operation instructions tell each machine which operations to perform; for example, an addition operation instruction instructs the machine to add input data, and a comparison operation instruction instructs the machine to compare several input data.
Further, the operation instructions include single-device operation instructions, distributed operation instructions, and parameter operation instructions. A single-device operation instruction can run only on a single machine or device, and may be stateful or stateless. A distributed operation instruction is an operation instruction with distributed attributes, meaning it can run in a distributed manner on multiple machines or devices; distributed operation instructions may likewise be stateful or stateless, and one distributed operation instruction may be composed of one or more single-device operation instructions. A parameter operation instruction is a kind of stateful operation instruction, that is, all parameter operation instructions carry state, and a parameter operation instruction may be either a stateful single-device operation instruction or a stateful distributed operation instruction.
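The instruction taxonomy above might be modeled as follows; the class and field names are hypothetical, chosen to mirror the node/device/tag parameters introduced later in the text:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SingleDeviceOp:
    """One single-device operation instruction (hypothetical model)."""
    node: int                  # machine number the instruction runs on
    device: int                # device number within that machine
    tag: Optional[int] = None  # state identifier; None models a stateless op

@dataclass
class DistributedOp:
    """A distributed operation instruction, composed of one or more
    single-device operation instructions."""
    parts: List[SingleDeviceOp] = field(default_factory=list)

    @property
    def tags(self):
        # The distributed instruction's tag list describes state sharing
        # among its single-device parts.
        return [p.tag for p in self.parts]
```

A parameter operation instruction would simply be either class with a non-None `tag`; the patent does not give concrete field names, so these are assumptions.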
Specifically, the computing resource allocation system may traverse a topology structure diagram of the multi-machine global distributed neural network, and obtain an operation instruction of each machine according to a parameter of the operation instruction in the topology structure diagram.
And step S204, traversing the operation instruction of each machine according to the topology structure diagram to obtain a single-machine neural network corresponding to each machine.
A single-machine neural network refers to a neural network capable of performing inference computation on an individual machine. Specifically, the computing resource allocation system may convert the distributed neural network into the single-machine neural network corresponding to each machine, following a set of conversion rules, based on the operation instructions of each machine obtained in step S202.
Step S206, acquiring the states of the operation instructions of the respective machines.
In particular, the computing resource allocation system may obtain the state of the run instruction for each machine by its state identifier (tag).
As an alternative embodiment, after the training of the distributed neural network is completed, the computing resource allocation system may store the state of the single device operation instructions with state identifiers (tags) on each machine.
As another alternative, during the process of computing resource allocation and inference calculation, the computing resource allocation system may periodically store the state of the single device operation instruction with a state identifier (tag) on each machine according to a preset time period. Alternatively, the time period of the periodic storage may be set according to the allocation or operation requirement.
As another alternative, in the process of computing resource allocation and inference calculation, the computing resource allocation system may perform distributed storage on the states of the single device operation instructions with state identifiers (tags) on the respective machines according to a preset storage rule. Optionally, the storage rule of the distributed storage may be set according to the allocation or operation requirement, for example: a plurality of single device execution instructions having a state identifier (tag) of 0 may be stored on the machine having a machine number of 0, and a plurality of single device execution instructions having a state identifier (tag) of 1 may be stored on the machine having a machine number of 1.
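The example storage rule in the last paragraph (instructions with tag 0 stored on machine 0, tag 1 on machine 1) can be sketched as a simple sharding function over the state identifier; the data model is hypothetical:

```python
def distributed_store(instrs, n_machines):
    """Distributed storage of instruction states: an instruction with state
    identifier t is stored on machine t (modulo the machine count, an
    assumption for tags beyond the machine range)."""
    shards = {m: [] for m in range(n_machines)}
    for op in instrs:
        shards[op["tag"] % n_machines].append(op["state"])
    return shards
```

Other storage rules are equally valid; the patent only requires that the rule be set according to the allocation or operation requirement.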
And S208, training or reasoning calculation is carried out in the single-machine neural network corresponding to each machine according to the state of the operation instruction of each machine.
The reasoning calculation in the single-machine neural network corresponding to each machine means that the trained single-machine neural network corresponding to each machine is used for reasoning calculation. Specifically, the computing resource allocation system loads the state of a single device operating instruction stored in advance, and trains or performs inference calculation in a single-machine neural network corresponding to each machine according to the state of the single device operating instruction. Optionally, the computing resource allocation system may perform distributed loading on the pre-stored state of the single device operation instruction.
According to this computing resource allocation method, the operation instructions of each machine are extracted from the topology structure diagram of the distributed neural network, the single-machine neural network corresponding to each machine is obtained from those operation instructions, and training or inference computation is finally performed in each single-machine neural network according to the state of each machine's operation instructions. In this way, while the distributed neural network performs computing resource allocation and inference computation, operation instruction states can be conveniently stored and loaded, the model conversion from the multi-machine network to the single-machine networks can be carried out conveniently, and the operating efficiency of the distributed deep neural network is improved.
In one embodiment, step S202 specifically includes: and traversing the topology structure diagram, and if the machine number of the operation instruction in the topology structure diagram is consistent with the currently operated machine number, taking the operation instruction as the operation instruction in the currently operated machine.
The parameters of an operation instruction include a machine number parameter (node), a device number parameter (device), a state identifier (tag), distributed attributes, an instruction list (placement context), a split operation instruction (sub placement), and a merge operation instruction (concat placement).
The machine number parameter (node) indicates on which machine the operation instruction runs; multiple machines are connected through a network, and one machine may comprise multiple devices. The device number parameter (device) indicates the device on which the operation instruction runs; multiple devices within one machine need not be connected through a network. The state identifier (tag) represents the state of the operation instruction and can be used to distinguish whether several operation instructions share a state. The distributed attributes indicate where the operation instruction is computed and where its output resides; each distributed attribute is composed of a node, a device, and a tag, and indicates on which machines, and on which devices of each machine, the distributed operation instruction runs. The state identifier of a distributed operation instruction describes the state sharing among its single-device operation instructions; for example, if a distributed operation instruction comprises 4 single-device operation instructions and its tag is defined as [0, 1, 0, 1], then the 0th and 2nd single-device operation instructions share a state, as do the 1st and 3rd. The instruction list (placement context) can assist in describing the distributed neural network; an instruction list is composed of a node, a device, and a tag, and all operation instructions in the same instruction list are distributed operation instructions. The split operation instruction (sub placement) is a special operation instruction used to split a distributed operation instruction and obtain a subset of its single-device operation instructions. The merge operation instruction (concat placement) is also a special operation instruction, used to merge multiple distributed operation instructions; the merged instruction includes all single-device operation instructions of the merged distributed operation instructions.
Further, the parameters of an operation instruction also include a series of communication parameters, such as allreduce, allgather, and broadcast, which indicate what collective communication operations the operation instruction performs.
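The split and merge operation instructions described above can be sketched as follows, representing a distributed operation instruction simply as a list of its single-device parts (a hypothetical simplification):

```python
def split_op(distributed_op, indices):
    """Split operation instruction (sub placement): take a subset of the
    single-device operation instructions from a distributed instruction."""
    return [distributed_op[i] for i in indices]

def merge_ops(*distributed_ops):
    """Merge operation instruction (concat placement): the merged instruction
    contains all single-device instructions of the given distributed ones."""
    merged = []
    for op in distributed_ops:
        merged.extend(op)
    return merged
```

Real split/merge instructions would also carry node/device/tag metadata; this sketch shows only the membership semantics the text defines.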
Specifically, the computing resource allocation system traverses the topology structure diagram of the multi-machine global distributed neural network and obtains the machine number parameter of each operation instruction, including the machine number parameter of each single-device operation instruction within the distributed operation instructions. It then compares each single-device operation instruction's machine number parameter with the machine number of the current machine; if they are consistent, the single-device operation instruction is retained as an operation instruction of the currently running machine.
In the above method for allocating computing resources, the machine number of the running instruction in the distributed neural network is compared with the machine number of the current machine, so as to obtain the running instruction in the current machine, and the running instruction of the single-machine neural network corresponding to each machine can be conveniently extracted, thereby improving the running efficiency of the distributed neural network.
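The machine-number comparison can be sketched as follows; the dictionary-based data model, in which a distributed instruction is flattened via a `parts` list, is an assumption for illustration:

```python
def local_instructions(topology, current_node):
    """Keep the single-device operation instructions whose machine number
    parameter matches the currently running machine. Distributed
    instructions are flattened into their single-device parts."""
    kept = []
    for op in topology:
        # A plain single-device instruction is treated as its own only part.
        for part in op.get("parts", [op]):
            if part["node"] == current_node:
                kept.append(part)
    return kept
```

Running this once per machine partitions the global topology into the per-machine instruction sets used to build each single-machine neural network.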
In one embodiment, as shown in fig. 4, in the process of training the distributed neural network, the step S204 specifically includes the following steps:
step S2042, traversing the operation instruction of each machine according to the topology structure diagram to obtain the machine number parameter and the equipment number parameter of the operation instruction of each machine.
Specifically, the computing resource allocation system traverses the operation instructions of the machines obtained in step S202 according to the topology structure diagram of the multi-machine global distributed neural network, and obtains the machine number parameters and the device number parameters of the operation instructions of the machines, including obtaining the machine number parameters and the device number parameters of each single device operation instruction in the distributed operation instructions.
Step S2044, judging, according to the machine number parameter and the device number parameter, whether a data copy instruction needs to be added between different devices, and whether a network transmission instruction needs to be added between different machines.
The data copying instruction is used for data transmission between different devices of the same machine; the network transmission instructions are used for data transmission between different machines through a network.
Specifically, the computing resource allocation system determines whether a data copy instruction needs to be added between different devices of the same machine and whether a network transmission instruction needs to be added between different machines according to the machine number parameter and the device number parameter of the operation instruction of each machine acquired in step S2042.
As an optional implementation manner, step S2044 specifically includes the following steps:
step S20442, if the machine number parameter of the operation instruction is consistent with the input machine number of the operation instruction, and the device number parameter of the operation instruction is inconsistent with the input device number of the operation instruction, adding the data copy instruction between different devices.
Specifically, the computing resource allocation system compares the machine number parameter of the single-device operation instruction obtained in step S2042 with the input machine number of the operation instruction. If the two are consistent, the single-device operation instruction and its input reside on the same machine. The system then compares the device number parameter of the single-device operation instruction with the input device number of the operation instruction. If these are also consistent, the instruction and its input reside on the same device of the same machine, and no additional processing is required; if they are inconsistent, the instruction and its input reside on different devices of the same machine, and a data copy instruction needs to be added between the different devices of the same machine to enable data transmission between them.
Step S20444, if the machine number parameter of the operation instruction is not consistent with the input machine number of the operation instruction, adding the network transmission instruction between different machines.
Specifically, the computing resource allocation system compares the machine number parameter of the single-device operation instruction obtained in step S2042 with the input machine number of the operation instruction. If the two are inconsistent, the single-device operation instruction and its input reside on different machines, and a network transmission instruction needs to be added between the different machines to enable data transmission between them.
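The decision rule of steps S20442 and S20444 can be sketched as a single function; the function name and string labels below are hypothetical illustrations, not part of the embodiments:

```python
def edge_instruction(op_machine, op_device, in_machine, in_device):
    """Return which transfer instruction (if any) to insert between an
    operation instruction and one of its inputs, following the rule of
    steps S20442/S20444."""
    if op_machine != in_machine:
        return "network_transmission"  # different machines: transmit over the network
    if op_device != in_device:
        return "data_copy"             # same machine, different devices: copy between devices
    return None                        # same machine and device: no extra instruction needed
```

Note that the machine numbers are checked first: when the machines differ, a network transmission instruction is added regardless of the device numbers.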
Step S2046, according to the topology structure diagram, a forward subgraph and a backward subgraph of each single-machine neural network are constructed.
The forward subgraph refers to a topological structure diagram obtained by traversing the distributed neural network in a forward direction; the backward subgraph refers to a topological structure diagram obtained by reversely traversing the distributed neural network.
Specifically, the computing resource allocation system can construct a forward subgraph and a backward subgraph of the single-machine neural network corresponding to each machine in a forward traversal and reverse traversal mode according to the topological structure diagram of the multi-machine global distributed neural network.
As an optional implementation manner, step S2046 specifically includes the following steps:
step S20462, the forward subgraph is constructed through forward calculation, and the end point of each single-machine neural network is obtained.
The forward calculation computes the influence of the input-layer nodes on the hidden-layer nodes; that is, the distributed neural network is traversed forward in the order of input layer, hidden layer, and output layer, and the influence of each node in the topology structure diagram on the nodes of the next layer is calculated.
Specifically, the computing resource allocation system determines, according to the topology structure diagram of the distributed neural network, which operation instructions require forward calculation, constructs the forward subgraph corresponding to each single-machine neural network by executing the forward calculation, and obtains the end points (endpoints) of the subgraph corresponding to each single-machine neural network.
Step S20464, constructing the backward subgraph through backward calculation, and performing state update on the operation instruction in each single-machine neural network.
The backward calculation is used to adjust the weight relations in the neural network and to reduce the deviation between the obtained output result and the actual result. By executing the backward calculation, the backward subgraph corresponding to each single-machine neural network is constructed, and the states of the single-device operation instructions in each single-machine neural network are updated.
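A minimal sketch of the traversal orders in steps S20462 and S20464 is given below; it records the forward visiting order, the endpoints (nodes consumed by no other node), and the reverse order used for the backward pass. The function name and tuple representation are illustrative assumptions:

```python
def build_subgraphs(ops):
    """ops: list of (name, input_names) pairs in topological order.
    The forward pass visits nodes from input layer toward output layer
    and records endpoints; the backward pass visits them in reverse."""
    consumed = set()
    for _name, input_names in ops:
        consumed.update(input_names)
    forward_order = [name for name, _ in ops]
    # An endpoint is a node that no other node takes as input.
    endpoints = [name for name in forward_order if name not in consumed]
    backward_order = list(reversed(forward_order))
    return forward_order, endpoints, backward_order
```

In a real implementation the backward traversal would also update the state of each single-device operation instruction; here only the visiting order is shown.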
Step S2048, performing gradient calculation on the multiple operation instructions in the same state in each single machine neural network to obtain a single machine neural network corresponding to each machine.
Specifically, if a plurality of parameter operation instructions with the same state exist in the single-machine neural network, gradient calculation is performed on the plurality of parameter operation instructions with the same state in the process of performing backward calculation, the calculated gradient is used as a new gradient of each parameter operation instruction, and finally a single-machine neural network subgraph corresponding to each machine is obtained.
As an optional implementation manner, step S2048 specifically includes: and adding the original gradients of a plurality of operating instructions with the same state to obtain the updated gradient of each operating instruction.
Wherein, the original gradient of the operation instruction refers to the gradient of the operation instruction in the multi-machine global distributed neural network; the updated gradient of the running instruction refers to the gradient of the running instruction in the stand-alone neural network.
Specifically, in the process of performing the backward calculation, the computing resource allocation system performs gradient calculation on the plurality of parameter operation instructions in the same state through an allreduce communication operation, and uses the calculated gradient as the updated gradient of each parameter operation instruction.
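The gradient combination of step S2048 — summing the original gradients of same-state parameter operation instructions and giving the sum back to each instruction — can be sketched as follows. The function name is hypothetical, and a plain element-wise sum stands in for a real allreduce:

```python
def allreduce_gradients(grads):
    """grads maps an instruction name to its original gradient (a list of
    floats). The original gradients of all instructions sharing a state
    are summed element-wise, and the sum becomes the updated gradient of
    every instruction, mirroring allreduce semantics."""
    total = [sum(vals) for vals in zip(*grads.values())]
    return {name: total for name in grads}
```

In a multi-machine deployment the same effect is obtained with a collective communication library; the local summation above only illustrates the semantics.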
In the computing resource allocation method, the multi-machine global distributed neural network is converted into the single-machine neural network corresponding to each machine according to a certain conversion rule, and model conversion from the global network to the single-machine network can be conveniently realized in the neural network training process.
In one embodiment, as shown in fig. 5, in the process of performing inference calculation on the distributed neural network, step S204 specifically includes the following steps:
Step S2042a, traversing the operation instruction of each machine according to the topology structure diagram to obtain a machine number parameter and a device number parameter of the operation instruction of each machine.
Please refer to step S2042.
Step S2044a, determining, according to the machine number parameter and the device number parameter, whether a data copy instruction needs to be added between different devices and whether a network transmission instruction needs to be added between different machines.
Please refer to step S2044.
Step S2046a, according to the topology structure diagram, constructing a forward sub-graph of each single-machine neural network to obtain a single-machine neural network corresponding to each machine.
Referring to step S2046, unlike step S2046, in the process of performing inference calculation on the distributed neural network, it is not necessary to perform backward calculation to construct a backward subgraph, and therefore, it is not necessary to perform gradient calculation on the running instruction.
According to the computing resource allocation method, the multi-machine global distributed neural network is converted into the single-machine reasoning neural network corresponding to each machine according to a certain conversion rule, and model conversion from the global network to the single-machine network can be conveniently realized in the process of executing reasoning calculation in the trained neural network.
In one embodiment, as shown in fig. 6, another computing resource allocation method is provided, which is operable on the computing resource allocation system shown in fig. 1. In this method, before the computing resource allocation and the inference calculation, a topology structure diagram of the distributed neural network is constructed, which specifically includes the following steps:
step S302, according to the state of the operation instruction of each machine, allocating the operation instruction of each machine to an instruction list.
Specifically, the computing resource allocation system may obtain the state of the operation instruction of each machine by using a state identifier (tag) of the operation instruction of each machine, and allocate a plurality of operation instructions with the same state to the same instruction list.
Step S304, calculating the operation instruction in the instruction list in the distributed neural network, and updating the state of the operation instruction in real time.
As an optional implementation manner, step S304 specifically includes the following steps:
step S3022, calculating the operation instruction in the instruction list in the distributed neural network to obtain a first calculation result.
The first calculation result refers to a calculation result obtained by calculating the operation instructions in each instruction list in the distributed neural network, and the number of the first calculation results corresponds to the number of the instruction lists one to one.
Step S3024, merging the first calculation results corresponding to the instruction lists to obtain a distributed calculation result.
Specifically, the computing resource allocation system merges the plurality of first calculation results obtained in step S3022 through a series of communication operations such as allreduce, allgather, and broadcast, to obtain the distributed calculation result.
For example, the distributed neural network includes two machine nodes, node0 and node1, and each machine includes two devices, device0 and device1. Two instruction lists, pcon1 and pcon2, are preset. In the topology structure diagram shown in fig. 3, the two operation instructions data and conv are both defined as distributed operation instructions; data and conv are in the same state, and each comprises 4 single-device operation instructions. The data and conv instructions are allocated to the instruction list pcon1, and distributed computation is executed to obtain the first calculation result y1 corresponding to pcon1; the single-device operation instructions of the fully connected layer in the neural network are allocated to the instruction list pcon2, and distributed computation is executed to obtain the first calculation result y2 corresponding to pcon2. Then y1 and y2 are merged through a series of communication operations such as allreduce, allgather, and broadcast to finally obtain the distributed computation result y. Here, data_0_0 indicates that data is the single-device operation instruction on device0 of node0.
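The grouping of steps S302/S304 illustrated by the example above can be sketched as follows; the function names are hypothetical, and simple concatenation stands in for the allreduce/allgather/broadcast merging:

```python
from collections import defaultdict

def group_by_state(instructions):
    """Assign run instructions that share a state tag to the same
    instruction list (cf. pcon1/pcon2 in the example above).
    `instructions` is a list of (name, state_tag) pairs."""
    lists = defaultdict(list)
    for name, tag in instructions:
        lists[tag].append(name)
    return dict(lists)

def merge_results(first_results):
    """Merge the per-list first calculation results into one distributed
    result; concatenation here only illustrates the merging step."""
    merged = []
    for res in first_results:
        merged.extend(res)
    return merged
```

Instructions in the same state land in the same list and are computed together, and the per-list results are then merged into the final distributed result y.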
Step S306, after the calculation is completed, storing the state of each operation instruction in the instruction list.
Specifically, after the distributed neural network computation is completed, the computing resource allocation system may store the state of each run instruction in the respective instruction lists.
According to the computing resource allocation method, the operating instructions are allocated to different instruction lists according to the states of the operating instructions, and the operating instructions in the instruction lists are calculated in the distributed neural network, so that the distributed neural network can be conveniently constructed, and meanwhile, the states of the operating instructions are conveniently stored.
It should be understood that although the various steps in the flowcharts of fig. 2-6 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2-6 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time but may be performed at different times, and the order of performance of these sub-steps or stages is not necessarily sequential; they may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 7, there is provided a computing resource allocation apparatus, including: an operation instruction obtaining module 401, a stand-alone neural network obtaining module 402, an operation instruction state obtaining module 403, and a calculating module 404, wherein:
the operation instruction acquisition module 401 is configured to traverse a topology structure diagram of the distributed neural network to obtain an operation instruction of each machine;
a stand-alone neural network acquisition module 402, configured to traverse the operation instruction of each machine according to the topology structure diagram to obtain a stand-alone neural network corresponding to each machine;
an operation instruction state obtaining module 403, configured to obtain states of operation instructions of the respective machines;
and the calculating module 404 is configured to perform training or inference calculation in the single-machine neural network corresponding to each machine according to the state of the operation instruction of each machine.
In one embodiment, the operation instruction obtaining module 401 is specifically configured to traverse the topology structure diagram, and if a machine number of an operation instruction in the topology structure diagram is consistent with a currently operating machine number, take the operation instruction as an operation instruction in a currently operating machine.
In one embodiment, the stand-alone neural network obtaining module 402 is specifically configured to traverse the operation instruction of each machine according to the topology structure diagram to obtain a machine number parameter and a device number parameter of the operation instruction of each machine; judge, according to the machine number parameter and the device number parameter, whether a data copy instruction needs to be added between different devices and whether a network transmission instruction needs to be added between different machines; construct a forward subgraph and a backward subgraph of each single-machine neural network according to the topology structure diagram; and perform gradient calculation on a plurality of operation instructions with the same state in each single-machine neural network to obtain the single-machine neural network corresponding to each machine.
In one embodiment, the stand-alone neural network obtaining module 402 is specifically configured to traverse the operation instruction of each machine according to the topology structure diagram to obtain a machine number parameter and a device number parameter of the operation instruction of each machine; judge, according to the machine number parameter and the device number parameter, whether a data copy instruction needs to be added between different devices and whether a network transmission instruction needs to be added between different machines; and construct a forward subgraph of each single-machine neural network according to the topology structure diagram to obtain the single-machine neural network corresponding to each machine.
In one embodiment, the stand-alone neural network obtaining module 402 is specifically configured to add the data copy instruction between different devices if the machine number parameter of the operation instruction is consistent with the input machine number of the operation instruction, and the device number parameter of the operation instruction is inconsistent with the input device number of the operation instruction; and if the machine number parameter of the operation instruction is not consistent with the input machine number of the operation instruction, adding the network transmission instruction between different machines.
In one embodiment, the stand-alone neural network obtaining module 402 is specifically configured to construct the forward subgraph through forward computation to obtain an end point of each stand-alone neural network; and constructing the backward subgraph through backward calculation, and updating the state of the operation instruction in each single-machine neural network.
In one embodiment, the single-machine neural network obtaining module 402 is specifically configured to add original gradients of a plurality of operation instructions with the same state to obtain an updated gradient of each operation instruction.
In one embodiment, the apparatus further comprises a distributed neural network construction module 405 for constructing a topology structure diagram of the distributed neural network.
In one embodiment, the distributed neural network building module 405 is specifically configured to allocate the operation instructions of each machine to an instruction list according to the state of the operation instructions of each machine; calculating the operation instruction in the instruction list in the distributed neural network, and updating the state of the operation instruction in real time; and after the calculation is finished, storing the state of each operation instruction in the instruction list.
In one embodiment, the distributed neural network building module 405 is specifically configured to allocate a plurality of the operation instructions with the same state to the same instruction list.
In one embodiment, the distributed neural network constructing module 405 is specifically configured to calculate the operation instruction in the instruction list in the distributed neural network to obtain a first calculation result; and merging the first calculation results corresponding to the instruction lists to obtain a distributed calculation result.
In one embodiment, the apparatus further comprises a storage module 406, configured to periodically store the state of the operation instruction during the process of allocating the computing resource and performing inference computation; or in the process of computing resource allocation and reasoning computation, the state of the operation instruction is stored in a distributed mode.
For specific limitations of the computing resource allocation apparatus, reference may be made to the above limitations of the computing resource allocation method, which are not described herein again. The modules in the computing resource allocation apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is for storing computing resource allocation data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a computing resource allocation method.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, or combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
traversing a topological structure diagram of the distributed neural network to obtain an operation instruction of each machine;
traversing the operation instruction of each machine according to the topological structure diagram to obtain a single-machine neural network corresponding to each machine;
acquiring the state of the running instruction of each machine;
and training or reasoning calculation is carried out in the single-machine neural network corresponding to each machine according to the state of the operation instruction of each machine.
The steps of the method for allocating computing resources in any of the above embodiments may also be implemented when the computer program is executed by a processor.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
traversing a topological structure diagram of the distributed neural network to obtain an operation instruction of each machine;
traversing the operation instruction of each machine according to the topological structure diagram to obtain a single-machine neural network corresponding to each machine;
acquiring the state of the running instruction of each machine;
and training or reasoning calculation is carried out in the single-machine neural network corresponding to each machine according to the state of the operation instruction of each machine.
The computer program, when executed by a processor, may further implement the steps of the computing resource allocation method in any of the embodiments described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (17)

1. A method of computing resource allocation, the method comprising:
traversing a topological structure diagram of the distributed neural network to obtain an operation instruction of each machine; the running instructions are used for instructing each machine to execute operations;
traversing the operation instruction of each machine according to the topological structure diagram, and converting the distributed neural network to obtain a single-machine neural network corresponding to each machine;
acquiring the state of the operation instruction of each machine through the state identifier of the operation instruction of each machine, wherein the state identifier is used for distinguishing whether a plurality of operation instructions are in a shared state or not;
and training or reasoning calculation is carried out in the single-machine neural network corresponding to each machine according to the state of the operation instruction of each machine.
2. The method of claim 1, wherein traversing the topology graph of the distributed neural network to obtain the operating instructions of each machine comprises:
and traversing the topology structure diagram, and if the machine number of the operation instruction in the topology structure diagram is consistent with the currently operated machine number, taking the operation instruction as the operation instruction in the currently operated machine.
3. The method of claim 1, wherein the transforming the distributed neural network into a single-machine neural network corresponding to each machine according to the operation instruction for traversing each machine in the topology structure diagram comprises:
traversing the operating instruction of each machine according to the topological structure diagram to obtain a machine number parameter and an equipment number parameter of the operating instruction of each machine;
judging whether a data copying instruction needs to be added between different devices or not and whether a network transmission instruction needs to be added between different machines or not according to the machine number parameter and the equipment number parameter;
constructing a forward subgraph and a backward subgraph of each single-machine neural network according to the topological structure diagram;
and performing gradient calculation on a plurality of running instructions with the same state in each single machine neural network to obtain the single machine neural network corresponding to each machine.
4. The method of claim 1, wherein the step of converting the distributed neural network into a single-machine neural network corresponding to each machine according to the operation instruction for traversing each machine in the topology structure diagram further comprises:
traversing the operating instruction of each machine according to the topological structure diagram to obtain a machine number parameter and an equipment number parameter of the operating instruction of each machine;
judging whether a data copying instruction needs to be added between different devices or not and whether a network transmission instruction needs to be added between different machines or not according to the machine number parameter and the equipment number parameter;
and constructing a forward sub-graph of each single-machine neural network according to the topology structure graph to obtain the single-machine neural network corresponding to each machine.
5. The method according to claim 3 or 4, wherein the determining whether the data copy command needs to be added between different devices and whether the network transmission command needs to be added between different machines according to the machine number parameter and the device number parameter comprises:
if the machine number parameter of the operation instruction is consistent with the input machine number of the operation instruction, and the equipment number parameter of the operation instruction is inconsistent with the input equipment number of the operation instruction, adding the data copy instruction between different equipment;
and if the machine number parameter of the operation instruction is not consistent with the input machine number of the operation instruction, adding the network transmission instruction between different machines.
6. The method of claim 3, wherein said constructing forward subgraphs and backward subgraphs of each of said stand-alone neural networks from said topological structure graph comprises:
constructing the forward subgraph through forward calculation to obtain an end point of each single-machine neural network;
and constructing the backward subgraph through backward calculation, and updating the state of the operation instruction in each single-machine neural network.
7. The method of claim 3, wherein performing a gradient calculation on a plurality of said state-identical operational instructions in each of said individual neural networks comprises:
and adding the original gradients of a plurality of operating instructions with the same state to obtain the updated gradient of each operating instruction.
8. The method of claim 1, further comprising: and constructing a topological structure diagram of the distributed neural network.
9. The method of claim 8, wherein constructing the topology graph of the distributed neural network comprises:
distributing the operation instructions of each machine to an instruction list according to the state of the operation instructions of each machine;
calculating the operation instruction in the instruction list in the distributed neural network, and updating the state of the operation instruction in real time;
and after the calculation is finished, storing the state of each operation instruction in the instruction list.
10. The method of claim 9, wherein the assigning the operating instructions of the respective machines to an instruction list according to the states of the operating instructions of the respective machines comprises:
and distributing a plurality of running instructions with the same state to the same instruction list.
11. The method of claim 9, wherein computing the operational instructions in the instruction list in the distributed neural network comprises:
calculating the operation instruction in the instruction list in the distributed neural network to obtain a first calculation result;
and merging the first calculation results corresponding to the instruction lists to obtain a distributed calculation result.
12. The method of claim 1, further comprising:
in the process of computing resource allocation and reasoning computation, regularly storing the state of the operation instruction; or
And in the process of computing resource allocation and reasoning computation, performing distributed storage on the state of the operation instruction.
13. The method of claim 1, wherein the execution instructions comprise single device execution instructions, distributed execution instructions, and parameter execution instructions.
14. The method of claim 1, wherein the parameters of the execution instructions further comprise distributed attributes, split execution instructions, and merge execution instructions.
15. An apparatus for allocating computing resources, the apparatus comprising:
an operation instruction acquisition module, configured to traverse the topology graph of the distributed neural network to obtain the operation instructions of each machine, the operation instructions instructing each machine to execute operations;
a single-machine neural network acquisition module, configured to traverse the operation instructions of each machine according to the topology graph and convert the distributed neural network into a single-machine neural network corresponding to each machine;
an operation instruction state acquisition module, configured to obtain the state of the operation instructions of each machine through a state identifier of those operation instructions, the state identifier distinguishing whether a plurality of operation instructions are in a shared state; and
a computing module, configured to perform training or inference computation in the single-machine neural network corresponding to each machine according to the states of the operation instructions of each machine.
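The four modules of claim 15 form a pipeline: traverse the topology graph for operation instructions, convert the distributed network into per-machine single-machine networks, read each instruction's state identifier, then train or infer per machine. A hedged Python sketch of that wiring; the `topology` mapping and the `convert`/`compute` callables are hypothetical placeholders for the patent's modules:

```python
def allocate_and_compute(topology, convert, compute):
    """Sketch of the claim-15 apparatus as a four-stage pipeline.
    `topology` maps machine -> list of (operation, state_identifier)."""
    # 1. Operation instruction acquisition: traverse the topology graph.
    per_machine_ops = {m: [op for op, _ in ops] for m, ops in topology.items()}
    # 2. Single-machine network acquisition: convert the distributed
    #    network into a local network per machine.
    local_nets = {m: convert(ops) for m, ops in per_machine_ops.items()}
    # 3. State acquisition: read each instruction's state identifier.
    states = {m: [sid for _, sid in ops] for m, ops in topology.items()}
    # 4. Computing module: train/infer per machine according to the states.
    return {m: compute(local_nets[m], states[m]) for m in topology}

# Hypothetical two-machine topology with shared/private state identifiers.
topo = {"m0": [("matmul", "shared"), ("relu", "private")],
        "m1": [("matmul", "shared")]}
out = allocate_and_compute(
    topo,
    convert=lambda ops: tuple(ops),                       # stand-in conversion
    compute=lambda net, st: (len(net), st.count("shared")))  # stand-in compute
```

Each machine's result here is just (number of local operations, number of shared-state instructions), standing in for the per-machine training or inference output.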
16. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 14 when executing the computer program.
17. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 14.
CN201910285304.XA 2019-04-10 2019-04-10 Computing resource allocation method and device, computer equipment and storage medium Active CN110008028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910285304.XA CN110008028B (en) 2019-04-10 2019-04-10 Computing resource allocation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910285304.XA CN110008028B (en) 2019-04-10 2019-04-10 Computing resource allocation method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110008028A CN110008028A (en) 2019-07-12
CN110008028B true CN110008028B (en) 2021-08-06

Family

ID=67170821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910285304.XA Active CN110008028B (en) 2019-04-10 2019-04-10 Computing resource allocation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110008028B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689114B (en) * 2019-09-24 2023-07-18 Oppo广东移动通信有限公司 Network node processing method and device, storage medium and electronic equipment

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3374933A2 (en) * 2015-11-09 2018-09-19 Google LLC Training neural networks represented as computational graphs
CN106953862B (en) * 2017-03-23 2020-09-25 全球能源互联网研究院有限公司 Sensing method and device for network security situation and sensing model training method and device
CN107018184B (en) * 2017-03-28 2019-08-30 华中科技大学 Distributed deep neural network cluster packet synchronization optimization method and system
US11934934B2 (en) * 2017-04-17 2024-03-19 Intel Corporation Convolutional neural network optimization mechanism
US11373266B2 (en) * 2017-05-05 2022-06-28 Intel Corporation Data parallelism and halo exchange for distributed machine learning
US20180322386A1 (en) * 2017-05-05 2018-11-08 Intel Corporation Fine-grain compute communication execution for deep learning frameworks
CN109254842B (en) * 2017-07-12 2023-06-16 腾讯科技(深圳)有限公司 Resource management method and device for distributed stream system and readable storage medium
CN107783840B (en) * 2017-10-27 2020-08-21 瑞芯微电子股份有限公司 Distributed multi-layer deep learning resource allocation method and device
CN108304924B (en) * 2017-12-21 2021-10-12 内蒙古工业大学 Pipelined pre-training method for deep belief network
CN109032671B (en) * 2018-06-25 2022-05-03 电子科技大学 Distributed deep learning method and system based on data parallel strategy
CN109145984B (en) * 2018-08-20 2022-03-25 联想(北京)有限公司 Method and apparatus for machine training
CN109325541A (en) * 2018-09-30 2019-02-12 北京字节跳动网络技术有限公司 Method and apparatus for training pattern

Also Published As

Publication number Publication date
CN110008028A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN111126668B (en) Spark operation time prediction method and device based on graph convolution network
CN111371603A (en) Service instance deployment method and device applied to edge computing
CN113742089B (en) Method, device and equipment for distributing neural network computing tasks in heterogeneous resources
CN114008594A (en) Scheduling operations on a computational graph
CN110580527B (en) Method and device for generating universal machine learning model and storage medium
CN113626192A (en) Method, device and system for carrying out expansion and contraction capacity adjustment on operator nodes
CN115686527A (en) Compiling method and device based on operator, computer equipment and storage medium
CN114327399A (en) Distributed training method, apparatus, computer device, storage medium and product
CN111459621B (en) Cloud simulation integration and scheduling method and device, computer equipment and storage medium
CN110008028B (en) Computing resource allocation method and device, computer equipment and storage medium
CN110969354A (en) Linear flow configuration method and device, computer equipment and storage medium
CN116894469B (en) DNN collaborative reasoning acceleration method, device and medium in end-edge cloud computing environment
Ghasemi et al. Energy-efficient mapping for a network of dnn models at the edge
CN112163734A (en) Cloud platform based dynamic scheduling method and device for setting computing resources
WO2022161081A1 (en) Training method, apparatus and system for integrated learning model, and related device
CN115718603A (en) Python model distributed online deployment method and system
CN112817560B (en) Computing task processing method, system and computer readable storage medium based on table function
Menouer et al. Towards a parallel constraint solver for cloud computing environments
CN113791794A (en) Method and system for automatically deploying abacus cases for supercomputing application
CN110909761A (en) Image recognition method and device, computer equipment and storage medium
CN114365148A (en) Neural network operation system and method
WO2024087844A1 (en) Graph neural network training method and system, and abnormal account identification method
CN114697208B (en) Configuration analysis method, system, equipment and storage medium of network element equipment
CN117056068B (en) JobEngine task splitting method in ETL
CN110083449B (en) Method and device for dynamically allocating memory and processor and computing module

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant