CN114925591A - Automatic parallel strategy searching method based on polyhedron model modeling and related equipment - Google Patents


Info

Publication number
CN114925591A
CN114925591A (application CN202111646797.9A)
Authority
CN
China
Prior art keywords
model
parallel
calculation
graph
polyhedral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111646797.9A
Other languages
Chinese (zh)
Inventor
王进
易泽轩
李革
张叶红
张艳
王晖
曾炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Peng Cheng Laboratory
Original Assignee
Peking University Shenzhen Graduate School
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School, Peng Cheng Laboratory filed Critical Peking University Shenzhen Graduate School
Priority to CN202111646797.9A priority Critical patent/CN114925591A/en
Publication of CN114925591A publication Critical patent/CN114925591A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/08 Probabilistic or stochastic CAD


Abstract

The invention discloses an automatic parallel strategy search method based on polyhedral model modeling, and related equipment. The method comprises the following steps: obtaining a model computation graph of a deep learning algorithm from a model object input by a user; converting the model computation graph to obtain a converted model computation graph; performing equalization processing on the converted model computation graph to obtain a balanced computation graph; creating a polyhedral model instance according to the balanced computation graph, and outputting a parallel strategy according to the polyhedral model instance; and calling the underlying framework to execute the parallel strategy. By converting and balancing the model computation graph and creating a polyhedral model instance within the polyhedral model framework, the method automatically outputs a parallel strategy, so that different algorithm logics are modeled under the polyhedral model and the parallel strategy is output automatically. This improves the efficiency of parallel strategy search and reduces the difficulty of distributed training development and efficiency tuning for deep learning algorithms.

Description

Automatic parallel strategy searching method based on polyhedron model modeling and related equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an automatic parallel strategy searching method based on polyhedral model modeling and related equipment.
Background
Over the last decade, deep learning techniques have continually set new records on tasks in the fields of vision, natural language, speech, search, and recommendation. The reason can be summarized by one phrase: "large scale". Large-scale data gives a model enough knowledge to memorize; a large-scale parameter model gives it the capacity to memorize more data; and large-scale high-performance computing power (typified by the GPU) speeds up model training by factors of hundreds or even thousands. The joint development of data, models, and computing power has driven the field of large-scale deep learning, in which questions such as how to split a multi-machine task, how to configure cluster training resources, how to balance training speed against convergence speed, how to train a model that cannot fit on a single machine, and how to provide elastic training and fault tolerance are all important research directions. Distributed training, whose core purpose is to accelerate model training, is the most effective means of solving these problems and improving training efficiency.
At present, mainstream deep learning frameworks all provide multi-machine distributed training: TensorFlow (a symbolic mathematics system based on dataflow programming, widely used to implement machine learning algorithms), PyTorch (an open-source Python machine learning library based on Torch, used for applications such as natural language processing), MindSpore (an open-source deep learning training/inference framework designed for device-edge-cloud scenarios), and PaddlePaddle (an open-source deep learning platform integrating a core framework, tool components, and a service platform). The main parallel modes are data parallelism (splitting the training data samples across multiple computing devices for distributed computation during training), model parallelism, and pipeline parallelism (a quasi-parallel processing technique in which the execution of multiple operations is overlapped). However, these parallel modes must be implemented by the algorithm developer, who calls the parallel partitioning APIs provided by the AI framework according to the characteristics of the algorithm model. This raises the technical difficulty of distributed training of AI algorithms; moreover, when the developer has an insufficient grasp of the AI framework and the characteristics of the computing devices, parallel training of the model is inefficient, and this framework-specific distributed optimization work further increases development difficulty and reduces research efficiency.
To address this problem, the MindSpore framework provides an automatic parallel training function, the FlexFlow framework provides a search strategy based on modeling a 4-dimensional parallel strategy space, and the RaNNC framework provides middleware for automatic pipeline-parallel strategy search supporting a PyTorch front end. However, because the parallel strategy search space is large (it scales with the computation graph and the resource space), these works are hard to make practical in terms of automatic parallel search efficiency. For example, when RaNNC searches for a pipeline-parallel strategy for an enlarged BERT model with 4.9B parameters on 4 nodes with 32 cards, the required strategy search time exceeds 4 hours, which lengthens debugging and training during model development and reduces efficiency. Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The invention mainly aims to provide an automatic parallel strategy search method based on polyhedral model modeling, and related equipment, so as to solve the problems in the prior art that, when training a large-scale deep learning model, algorithm developers must configure the parallel strategy themselves, resulting in low training efficiency and high development difficulty.
In order to achieve the purpose, the invention adopts the following technical scheme:
an automatic parallel strategy searching method based on polyhedron model modeling comprises the following steps:
obtaining a model computation graph of a deep learning algorithm according to a model object input by a user;
converting the model computation graph to obtain a converted model computation graph;
performing equalization processing on the converted model computation graph to obtain a balanced computation graph;
creating a polyhedral model instance according to the balanced computation graph, and outputting a parallel strategy according to the polyhedral model instance;
and calling the underlying framework to execute the parallel strategy.
In the method for searching for an automatic parallel strategy based on polyhedral model modeling, the step of obtaining a model computation graph of a deep learning algorithm according to a model object input by a user specifically includes:
obtaining an algorithm model according to a model object input by a user;
after a numerical value is randomly input into the algorithm model, recording the calculation process of the algorithm model to obtain a model calculation graph;
or analyzing the model object through a python interpreter to generate a syntax tree, and analyzing the syntax tree to obtain the model calculation graph.
In the method for searching an automatic parallel strategy based on the polyhedral model modeling, the step of converting the model calculation diagram to obtain a converted model calculation diagram specifically includes:
and re-representing the model calculation diagram by using a predefined intermediate representation method to obtain the converted model calculation diagram.
In the method for searching for an automatic parallel strategy based on polyhedral model modeling, the step of performing equalization processing on the converted model computation graph to obtain an equalization computation graph specifically includes:
setting an average computation threshold for the nodes, and comparing the computation cost of each node in the converted model computation graph with the average computation threshold;
and fusing adjacent nodes whose computation cost is below the average computation threshold, and splitting nodes whose computation cost exceeds it, to obtain the balanced computation graph.
In the method for searching for an automatic parallel policy based on polyhedral model modeling, the step of creating a polyhedral model instance according to the equilibrium computation graph and outputting a parallel policy according to the polyhedral model instance specifically includes:
mapping the equilibrium calculation map on a polyhedron model to obtain a polyhedron optimization model;
inputting the equilibrium calculation chart into the polyhedral optimization model to obtain a polyhedral model example;
and outputting a parallel strategy according to the polyhedral model example and the quantity of the computing resources input by the user.
In the method for searching for an automatic parallel policy based on polyhedral model modeling, the step of calling the underlying framework to execute the parallel policy specifically includes:
calling an execution API of the underlying framework to execute the parallel policy.
In the automatic parallel strategy searching method based on the polyhedron model modeling, the model object refers to a single machine training code of a deep learning algorithm defined in advance by a user.
In the automatic parallel strategy search method based on polyhedral model modeling, the predefined intermediate representation includes: IRType, IRValue, IRNode, and IRGraph.
In the automatic parallel strategy searching method based on the polyhedron model modeling, the parallel strategy comprises a data parallel segmentation dimension and a pipeline parallel segmentation dimension.
In the polyhedron model modeling-based automatic parallel strategy searching method, the execution API is a running manager in an underlying AI framework.
An automatic parallel policy search system, the system comprising:
the calculation map generation module is used for obtaining a model calculation map of the deep learning algorithm according to the model object input by the user;
the calculation graph conversion module is used for converting the model calculation graph to obtain a converted model calculation graph;
the computation graph balancing module is used for carrying out balancing processing on the converted model computation graph to obtain a balancing computation graph;
the parallel strategy searching module is used for creating a polyhedral model example according to the equilibrium computation graph and outputting a parallel strategy according to the polyhedral model example;
and the parallel strategy execution module is used for calling the underlying framework to execute the parallel strategy.
A controller, the controller comprising: the system comprises a memory, a processor and a polyhedron model modeling based automatic parallel strategy searching program which is stored on the memory and can run on the processor, wherein the polyhedron model modeling based automatic parallel strategy searching program realizes the steps of the polyhedron model modeling based automatic parallel strategy searching method when being executed by the processor.
A computer-readable storage medium storing a polyhedral model modeling-based automatic parallel policy search program which, when executed by a processor, implements the steps of the polyhedral model modeling-based automatic parallel policy search method as described above.
Compared with the prior art, the automatic parallel strategy search method based on polyhedral model modeling and the related equipment provided by the invention comprise the following steps: obtaining a model computation graph of a deep learning algorithm from a model object input by a user; converting the model computation graph to obtain a converted model computation graph; performing equalization processing on the converted model computation graph to obtain a balanced computation graph; creating a polyhedral model instance according to the balanced computation graph, and outputting a parallel strategy according to the polyhedral model instance; and calling the underlying framework to execute the parallel strategy. By converting and balancing the generated model computation graph and creating a polyhedral model instance within the polyhedral model framework, the method automatically outputs a parallel strategy according to the instance: algorithm logic from different frameworks is modeled under the polyhedral model, and an efficiently executable parallel strategy is output automatically, which effectively improves the efficiency of parallel strategy search while reducing the difficulty of distributed training development and efficiency tuning for deep learning algorithms.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the automated parallel strategy search method based on polyhedral model modeling according to the present invention;
FIG. 2 is a flowchart of step S100 in the preferred embodiment of the method for automatic parallel strategy search based on polyhedron modeling according to the present invention;
FIG. 3 is a flowchart of step S300 of the method for automatic parallel strategy search based on polyhedron modeling according to the preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of node splitting and node aggregation provided by the present invention;
FIG. 5 is a flowchart of step S400 in the preferred embodiment of the method for automatic parallel strategy search based on polyhedral model modeling according to the present invention;
FIG. 6 is a schematic diagram illustrating a segmentation of a computation graph in a data parallel mode according to the present invention;
FIG. 7 is a schematic diagram illustrating a segmentation of a computation graph in a pipeline parallel mode according to the present invention;
FIG. 8 is a functional block diagram of an automated parallel policy search system provided by the present invention;
FIG. 9 is an architectural relationship diagram of the PyTorch framework and the automatic parallel policy search system according to the present invention;
FIG. 10 is a schematic diagram illustrating an operating environment of a controller according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
First, multi-machine multi-card distributed training of deep learning models has become the most important technical approach to accelerating model training. The main distributed training modes include data parallelism, model parallelism, optimizer parallelism, pipeline parallelism, and hybrid parallelism, and these are the parallel functions primarily supported by the mainstream deep learning frameworks (such as TensorFlow, PyTorch, MindSpore, and PaddlePaddle), which differ in usability and efficiency. However, almost all of these parallel modes require the algorithm developer to implement them by calling framework APIs and to tune training efficiency manually. This is very difficult for an algorithm developer who does not understand the implementation mechanisms of the underlying AI framework and the communication characteristics of the cluster, and the complex implementation and debugging greatly reduce efficiency.
On the other hand, because the cost of automatic parallel strategy search grows exponentially with the AI model computation graph and the cluster resource scale, the search efficiency of existing automatic parallel search works cannot meet the requirements of large-parameter models on large-scale clusters. A highly efficient multi-machine multi-card distributed training parallel strategy search method therefore needs to be designed and implemented to solve the problem of automatic parallel training strategy search for large models, so that algorithm developers can focus solely on developing algorithm logic and quickly realize distributed training on an AI cluster.
To solve the problems in the prior art, the invention provides an automatic parallel strategy search method based on polyhedral model modeling, and related equipment. The generated model computation graph is converted and balanced, and a polyhedral model instance is created from the balanced computation graph within the polyhedral model framework, so that a parallel strategy is automatically output according to the instance. Algorithm logic from different frameworks is thus modeled under the polyhedral model and an efficiently executable parallel strategy is output automatically, which effectively improves the efficiency of parallel strategy search while reducing the difficulty of distributed training development and efficiency tuning for deep learning algorithms.
The following describes a design scheme of an automatic parallel strategy search method based on polyhedral model modeling by using a specific exemplary embodiment, and it should be noted that the following embodiment is only used for explaining the technical scheme of the invention, and is not specifically limited:
referring to fig. 1, the method for searching an automatic parallel strategy based on polyhedral model modeling according to the present invention includes:
and S100, obtaining a model calculation graph of the deep learning algorithm according to the model object input by the user.
The model object refers to the single-machine training code of a deep learning algorithm predefined by the user, such as a BERT model or a GPT-3 model (GPT-3 is built by the independent AI research and deployment company OpenAI, and is a large-scale natural language model currently running on Microsoft Azure).
Specifically, under different deep learning frameworks (such as TensorFlow, PyTorch, MindSpore, and PaddlePaddle), the corresponding model objects (the algorithm logic predefined by the user) differ, so the model computation graph of the deep learning algorithm must be obtained by a framework-appropriate method from the user-supplied single-machine training code defined on that AI framework. The model computation graph is the representation of a deep learning algorithm within an AI framework; a directed acyclic graph (DAG) is generally used to represent the computation process of the algorithm, with nodes of the graph representing computation operations and edges representing the tensor data dependencies between them. The purpose of generating the model computation graph is thus to represent the computation process of the algorithm.
Further, referring to fig. 2, step 100 specifically includes:
s110, obtaining an algorithm model according to a model object input by a user;
s120, after a numerical value is randomly input into the algorithm model, recording the calculation process of the algorithm model to obtain a model calculation graph;
s130, or generating a syntax tree after analyzing the model object through a python interpreter, and analyzing the syntax tree to obtain the model calculation diagram.
Specifically, taking the PyTorch framework as an example, the model computation graph defined by the algorithm developer can be obtained with the torch.jit.trace or torch.jit.script method. The torch.jit.trace method is based on tracing tensor computation: after the algorithm model is obtained from the model object input by the user, a value is randomly generated according to the input type required by the algorithm, each computation step of the algorithm is then recorded, and these recorded steps constitute the computation graph. For example, if the algorithm model takes a 32×32 picture as input and outputs the picture's category label, the random input is a randomly generated 32×32 tensor. The torch.jit.script method is based on source code conversion: its basic principle is to first parse the model object (the user-defined algorithm logic) into a syntax tree (a tree structure representing the syntactic structure of the source code) through the Python interpreter, and then obtain the model computation graph by analyzing the syntax tree.
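The tracing mode described above can be illustrated with a toy tracer; this is a hypothetical sketch of the tracing principle only, not the actual torch.jit.trace implementation, and all class and function names are invented for illustration:

```python
# Minimal sketch of trace-based graph capture (illustrative, NOT the real
# torch.jit.trace): a proxy value records every operation applied to it,
# and the recorded operations form the computation graph.
class TracedValue:
    def __init__(self, graph, name):
        self.graph = graph   # shared list of recorded ops (the "graph")
        self.name = name

    def _record(self, op, other):
        out = TracedValue(self.graph, f"v{len(self.graph)}")
        # each entry: (operation, input name, other operand, output name)
        self.graph.append((op, self.name, getattr(other, "name", other), out.name))
        return out

    def __mul__(self, other):
        return self._record("mul", other)

    def __add__(self, other):
        return self._record("add", other)

def trace(model_fn):
    """Run model_fn once on a proxy input and return the recorded op list."""
    graph = []
    model_fn(TracedValue(graph, "input"))
    return graph

# A toy "model": two operations applied to the input.
ops = trace(lambda x: x * 2 + 1)
# ops → [("mul", "input", 2, "v0"), ("add", "v0", 1, "v1")]
```

Note that, as in real tracing, only the operations actually executed on the proxy are captured; control flow that depends on the input value would be baked in, which is why the source-conversion (script) mode also exists.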
Please refer to fig. 1, S200, the model calculation graph is converted to obtain a converted model calculation graph.
Specifically, computation graphs are defined in several different ways under different frameworks; that is, their representations differ (for example, the PyTorch framework uses the Torch Graph), so the computation graphs of different frameworks must be converted into a computation graph in a common intermediate representation. In general, the intermediate representation of the computation graph is defined manually by an expert, a process of manually establishing rule mappings, so the intermediate-representation conversion of each computation graph is performed according to its own rules.
Further, step S200 specifically includes:
and S210, re-representing the model calculation graph by using a predefined intermediate representation method to obtain the converted model calculation graph.
Specifically, the computation graphs of the different frameworks are re-expressed using an intermediate representation method defined in advance, so that they are converted into the computation graph used for parallel strategy search. The intermediate representation method is general, and is implemented concretely as four class definitions in C++; the predefined intermediate representations are: IRType (type), IRValue (value), IRNode (node), and IRGraph (graph).
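As a rough illustration of the four IR classes named above (the patent implements them as C++ class definitions), a Python sketch might look as follows; every field shown is an assumption, since the specification does not enumerate the class members:

```python
from dataclasses import dataclass, field

# Hedged sketch of the four intermediate-representation classes the patent
# names (IRType, IRValue, IRNode, IRGraph). All fields are illustrative.
@dataclass
class IRType:
    dtype: str        # e.g. "float32"
    shape: tuple      # tensor shape

@dataclass
class IRValue:
    name: str
    type: IRType      # a tensor value flowing along a graph edge

@dataclass
class IRNode:
    op: str           # operator name, e.g. "matmul"
    inputs: list      # IRValue objects consumed
    outputs: list     # IRValue objects produced
    flops: int = 0    # estimated computation cost, used later for balancing

@dataclass
class IRGraph:
    nodes: list = field(default_factory=list)

    def add(self, node):
        self.nodes.append(node)
        return node

# Re-expressing a one-operator framework graph in the common IR:
g = IRGraph()
x = IRValue("x", IRType("float32", (32, 64)))
w = IRValue("w", IRType("float32", (64, 16)))
y = IRValue("y", IRType("float32", (32, 16)))
g.add(IRNode("matmul", [x, w], [y], flops=2 * 32 * 64 * 16))
```

Carrying a per-node cost estimate (here `flops`) in the IR is what makes the equalization step in S300 possible without going back to the original framework graph.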
Please refer to fig. 1, S300, the transformed model calculation graph is equalized to obtain an equalization calculation graph.
Specifically, the computation workloads of different node operations in the computation graph differ, which makes the graph unbalanced. In order to obtain sub-nodes of balanced size when nodes are split in the following steps, and to improve the efficiency of the automatic parallel strategy search, the nodes in the computation graph must be balanced through node splitting and node aggregation operations. The purpose of computation graph balancing is to balance the computation of the graph in both the horizontal and the vertical dimension by re-aggregating or splitting its nodes, thereby avoiding uneven computation distribution during parallel strategy search.
Further, referring to fig. 3, step S300 specifically includes:
s310, setting an average calculated quantity threshold value of the nodes, and comparing the calculated quantity nodes in the converted model calculation graph with the average calculated quantity threshold value;
and S320, fusing adjacent calculation amount nodes smaller than the average calculation threshold value, and splitting the calculation amount nodes larger than the average calculation threshold value to obtain a balance calculation graph.
Specifically, the equalization processing mainly consists of aggregating or splitting nodes in the model computation graph. Before any node aggregation or splitting, an average computation threshold for the nodes is first set, the nodes in the converted model computation graph are traversed in node order, and each node's computation cost is compared with the threshold. The node aggregation operation then fuses adjacent nodes whose cost is below the average computation threshold; the node splitting operation splits a node whose cost exceeds the threshold into several nodes according to the threshold (splitting is usually applied to matrix multiplications larger than the threshold), so that the cost of fused and split nodes is comparable to the average computation threshold. After the aggregation and splitting operations are completed, the balanced computation graph is obtained.
According to the invention, the balanced calculation graph is obtained by splitting the large nodes in the calculation graph or aggregating a plurality of adjacent small nodes, so that the calculated amount among the nodes is balanced, the problem of uneven calculation distribution during parallel strategy search can be effectively avoided, and the node segmentation efficiency is improved.
Node aggregation converts the computation of several nodes into a single node in the representation; for example, when 3 computation nodes are aggregated, the 3 nodes before aggregation are three IRNode objects, and after aggregation they form one new IRNode object. Whether a node with a large computation cost can be split is determined by its operator type: not every node is splittable, but matrix multiplication, a common basic operator in AI models, is an example of a splittable node. The specific processes of node splitting and node aggregation are shown in fig. 4, which depicts the splitting of a node with a large computation cost and the aggregation of adjacent nodes with small costs.
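The fuse-small/split-large balancing step can be sketched on a simplified linear chain of node costs; this is an illustrative approximation under stated assumptions (real splitting is operator-specific, e.g. limited to matrix multiplications), not the patent's exact algorithm:

```python
# Sketch of the computation-graph balancing step (S300) on a linear chain
# of per-node costs: nodes below the average-cost threshold are fused with
# adjacent small nodes, and nodes above it are split into roughly
# threshold-sized pieces. Illustrative only; assumes every node is fusible
# and splittable, which the patent does not guarantee.
def balance(costs, threshold):
    balanced, pending = [], 0
    for c in costs:
        if c > threshold:                 # split a large node
            if pending:                   # flush any accumulated small nodes
                balanced.append(pending)
                pending = 0
            pieces, rem = divmod(c, threshold)
            balanced.extend([threshold] * pieces)
            if rem:
                balanced.append(rem)
        else:                             # fuse adjacent small nodes
            pending += c
            if pending >= threshold:
                balanced.append(pending)
                pending = 0
    if pending:
        balanced.append(pending)
    return balanced

# 10 and 5 fuse into one node of cost 15; 100 splits into pieces of 20.
print(balance([10, 5, 100, 8], threshold=20))  # → [15, 20, 20, 20, 20, 20, 8]
```

The total cost is preserved; only its distribution across nodes changes, which is exactly what the later partitioning step needs.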
With reference to fig. 1, in S400 a polyhedral model instance is created according to the balanced computation graph, and a parallel strategy is output according to the polyhedral model instance.
Specifically, this process comprises training modeling based on a polyhedral optimization model and a parallel strategy search. In the training modeling process, a polyhedral optimization model instance is initialized from the balanced computation graph under the established polyhedral optimization model framework; that is, the training process is modeled as a concrete object. The parallel strategy search then, based on the polyhedral optimization model instance, searches according to a specified parallel mode (the parallel mode a developer wants to apply, such as data parallelism, model parallelism, optimizer parallelism, pipeline parallelism, or hybrid parallelism). Taking pipeline-parallel strategy search as an example, the strategy values for the number of pipeline stages and the number of mini-batches are searched according to the pipeline mode specified by the developer, and the final parallel strategy is generated. A parallel mode is a general framework: for example, data parallelism refers to distributing the training data samples across multiple computing devices for distributed computation during distributed training of an AI model, whereas a parallel strategy is the concrete distributed-training partitioning scheme used when executing a specific algorithm model.
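As a toy illustration of searching a pipeline strategy value such as the number of mini-batch splits, the sketch below minimizes a simple GPipe-style schedule cost. The cost model, function name, and parameters are assumptions for illustration only; the patent's search operates on the polyhedral model instance rather than a closed-form cost.

```python
def search_microbatches(p, T, o, max_m=64):
    """Search the number of mini-batch splits m minimizing a simple
    pipeline cost: (m + p - 1) pipeline slots for p stages, each slot
    taking T/m compute plus a fixed per-micro-batch overhead o."""
    def cost(m):
        return (m + p - 1) * (T / m + o)
    return min(range(1, max_m + 1), key=cost)

# e.g. 2 pipeline stages, total per-stage compute 16, overhead 1 per micro-batch
print(search_microbatches(p=2, T=16, o=1))  # -> 4
```

More splits shrink the pipeline "bubble" but add per-micro-batch overhead, so the search balances the two.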
Polyhedral model modeling is a common compiler technique for optimizing for loops: by mapping a loop into the polyhedral model space, loop-parallel computation optimization can be realized directly through the mapping tables of the polyhedral model, improving training efficiency. In the present invention, the training computation of a deep learning model is expressed as a multi-level loop nest, a way of representing this training process in a polyhedral model is defined, and the common data-parallel, model-parallel, and pipeline-parallel modes are finally unified under one polyhedral modeling framework.
Assume a deep learning model is represented by a directed acyclic graph (DAG) as a computation graph D(N, E), where N is the set of computation graph nodes and E is the set of edges. The training computation of the deep learning model (in effect, the modeling process) can then be expressed as follows (in the Python programming language), so that the for loops can be mapped onto the polyhedral optimization model:
for e in range(Epoch_num):            # Epoch_num: number of model training epochs, i.e. the epoch range (domain) of the for loop
    for b in range(Batch_num):        # Batch_num: number of sample batches per training epoch, i.e. the batch range of the for loop
        for node in nodes:            # node: a node of the model; nodes: the node set of the model computation graph
            out = Forward(node, b)    # Forward(): forward computation of the model
        for node in reversed(nodes):  # reversed(): iterate the model computation nodes in reverse order
            grad = Backward(node, b)  # grad: gradient of the model parameters; Backward(): backward gradient computation of the model
            Update(node)              # Update(): parameter update process of the model
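The loop nest above can be viewed as a polyhedral iteration domain: the set of integer points (e, b, n) with 0 ≤ e < Epoch_num, 0 ≤ b < Batch_num, 0 ≤ n < len(nodes), and a parallel strategy is a partition of this set. A minimal sketch under illustrative sizes (the variable names and shard construction are assumptions, not the patent's implementation):

```python
from itertools import product

Epoch_num, Batch_num, num_nodes = 2, 3, 4

# the iteration domain of the training loop nest: all (epoch, batch, node) points
domain = set(product(range(Epoch_num), range(Batch_num), range(num_nodes)))
print(len(domain))  # -> 24 (= 2 * 3 * 4 points)

# data parallelism partitions the domain along the batch dimension
shards = [{p for p in domain if p[1] == b} for b in range(Batch_num)]
assert set().union(*shards) == domain   # the shards cover the whole domain
```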
Further, referring to fig. 5, step S400 specifically includes:
S410, creating a polyhedral optimization model;
S420, initializing the polyhedral optimization model according to the balanced computation graph to obtain a polyhedral model instance;
and S430, outputting a parallel strategy according to the polyhedral model instance and the number of computing resources input by the user, the parallel strategy including the data-parallel partition dimension, the pipeline arrangement, and so on.
Specifically, the balanced computation graph is mapped onto a polyhedral model to obtain the polyhedral optimization model, where the mapping can be understood as an affine transformation of the coordinate space (e.g., translation, rotation, and similar operations). The balanced computation graph is then input into the polyhedral optimization model to obtain a polyhedral model instance (a concrete object); that is, the for loops are mapped onto the polyhedral optimization model to obtain the polyhedral model instance (the distributed optimization computation model to be searched). Next, geometric linear transformations are applied to find the data-parallel partition dimension or the pipeline-parallel partition dimension of distributed training, in combination with the number of computing resources input by the user: for example, with 8 GPU cards as computing resources, data parallelism can partition the data along the data-parallel dimension into 8 mini-batches, computed on the 8 GPU cards respectively, and the parallel strategy is then output comprehensively (data-parallel partition dimension, pipeline-parallel partition dimension, pipeline arrangement, and so on). The partition processes for the data-parallel dimension and the pipeline-parallel dimension are shown in fig. 6 and fig. 7 respectively, where the abscissa represents the number of sample batches for model training and the ordinate represents the number of layers of model parameters (for example, the BERT base model computation process has 24 layers); a point in the first quadrant represents the computation of a certain sample batch on a certain layer of the model, the arrows represent the data or time dependencies of the training computation, and the dotted boxes represent a partition example of the model in distributed training. Fig. 6 is an example of slicing the computation graph in data-parallel mode, where slicing is performed directly along the batch direction; fig. 7 is an example of slicing the computation graph in pipeline-parallel mode, where slicing is performed laterally in a staggered fashion.
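The two slicing schemes of figs. 6 and 7 can be sketched as assignments of points in the (batch, layer) iteration space to devices: data parallelism cuts along the batch axis, while pipeline parallelism assigns contiguous layer blocks to devices. The device counts and grid sizes below are illustrative assumptions.

```python
batches, layers, devices = 8, 24, 4
grid = [(b, l) for b in range(batches) for l in range(layers)]

# fig. 6: data-parallel — each device gets every layer for a slice of batches
def data_parallel(b, l):
    return b // (batches // devices)

# fig. 7: pipeline-parallel — each device gets a contiguous block of layers
def pipeline(b, l):
    return l // (layers // devices)

# every point is assigned to exactly one of the 4 devices under either scheme
assert {data_parallel(b, l) for b, l in grid} == set(range(devices))
assert {pipeline(b, l) for b, l in grid} == set(range(devices))
print(data_parallel(5, 0), pipeline(5, 0))  # -> 2 0
```

Under data parallelism the device of a point depends only on the batch coordinate; under pipeline parallelism only on the layer coordinate — exactly the vertical vs. lateral cuts in the figures.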
According to this method, the training computation of the deep learning model is described uniformly as a polyhedral model. Through the resulting polyhedral optimization model, the commonly used distributed parallel training modes such as data parallelism and pipeline parallelism are unified, so that a feasible parallel strategy can be searched for through simple mapping transformations of the polyhedral model.
Continuing with fig. 1, in S500 the underlying framework is invoked to execute the parallel strategy, where the execution API is the runtime manager (runtime interface) of the underlying AI framework.
Specifically, the data-parallel partition dimension or the pipeline-parallel partition dimension of distributed training is found under the polyhedral model, the parallel strategy is output automatically in combination with the number of computing resources input by the user, and the parallel strategy is finally executed by calling the execution API of the underlying framework, where the API is the interaction interface.
Further, step S500 specifically includes:
S510, calling the execution API of the underlying framework to execute the parallel strategy, where the execution API is the execution manager of the underlying AI framework.
Specifically, after the polyhedral model instance has been created under the polyhedral model, the data-parallel partition dimension or the pipeline-parallel partition dimension of distributed training is found through the polyhedral model instance, the distributed training execution strategy of the algorithm model is output automatically in combination with the number of computing resources input by the user, and the execution API of the underlying framework is then called to execute the distributed training strategy of the algorithm model.
Further, referring to fig. 8, based on the above automatic parallel strategy search method based on polyhedral model modeling, the present invention further provides an automatic parallel strategy search system, which includes: a computation graph generation module 100, a computation graph conversion module 200, a computation graph balancing module 300, a parallel strategy search module 400, and a parallel strategy execution module 500.
Specifically, the computation graph generation module 100 is configured to obtain the model computation graph of a deep learning algorithm according to the model object input by the user; the computation graph conversion module 200 is configured to convert the model computation graph to obtain the converted model computation graph; the computation graph balancing module 300 is configured to balance the converted model computation graph to obtain the balanced computation graph; the parallel strategy search module 400 is configured to create a polyhedral model instance according to the balanced computation graph and output a parallel strategy according to the polyhedral model instance; and the parallel strategy execution module 500 is configured to invoke the underlying framework to execute the parallel strategy.
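The data flow through the five modules of fig. 8 can be sketched as a simple function chain. The function bodies below are placeholders standing in for modules 100–500; all names and return shapes are illustrative assumptions.

```python
def generate_graph(model_object):       # computation graph generation module 100
    return {"nodes": model_object["ops"]}

def convert_graph(graph):               # computation graph conversion module 200
    return {"ir_nodes": [op.upper() for op in graph["nodes"]]}  # stand-in for IR conversion

def balance_graph(graph):               # computation graph balancing module 300
    return graph                        # aggregation/splitting would happen here

def search_strategy(graph, n_devices):  # parallel strategy search module 400
    return {"data_parallel_degree": n_devices, "graph": graph}

def execute(strategy):                  # parallel strategy execution module 500
    return f"running on {strategy['data_parallel_degree']} devices"

model = {"ops": ["matmul", "relu", "matmul"]}
result = execute(search_strategy(balance_graph(convert_graph(generate_graph(model))), 8))
print(result)  # -> running on 8 devices
```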
The implementation of the automatic parallel strategy search method based on polyhedral model modeling can be regarded as middleware that enables an AI framework to support automatic parallelism; fig. 9 shows the relationship between this function and the PyTorch framework. First, the user defines the algorithm with the PyTorch front-end API; the user does not need to consider parallel partitioning of the algorithm, only the single-device algorithm logic. Therefore, the user only needs to know how to apply the APIs provided by the PyTorch framework and implements the single-machine algorithm through those APIs. As shown in fig. 9, the column on the left is the basic architecture of the PyTorch framework and the part on the right is the automatic parallel strategy search system; the main interfaces between them are the computation graph representation of the algorithm and the runtime interface of the underlying framework, so the user is almost unaware of the automatic parallel strategy search system.
Then, the computation graph of the user-defined algorithm is output through PyTorch's JIT (just-in-time compilation) module and fed into the automatic parallel strategy search system, where it passes successively through the computation graph generation module, the computation graph conversion module, the computation graph balancing module, and the parallel strategy search module. Finally, the computation subgraphs partitioned according to the parallel strategy are output and executed by calling PyTorch's underlying runtime API (an internal function API of the PyTorch framework).
Further, the present invention also provides a controller, which includes a processor 10, a memory 20 and a display 30. Fig. 10 shows only some of the components of the controller, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the controller, such as a hard disk or a memory of the controller. The memory 20 may also be an external storage device of the controller in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the controller. Further, the memory 20 may also include both an internal storage unit of the controller and an external storage device. The memory 20 is used for storing application software installed in the controller and various types of data. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an automatic parallel policy search program 40, and the automatic parallel policy search program 40 is executable by the processor 10, so as to implement the automatic parallel policy search method based on polyhedral model modeling in the present invention.
The processor 10 may be, in some embodiments, a central processing unit (CPU), a microprocessor, or another data processing chip, configured to run the program code stored in the memory 20 or process data, for example to execute the automatic parallel strategy search method based on polyhedral model modeling.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the device and for displaying a visual user interface. The components 10-30 of the device communicate with each other via a system bus.
In one embodiment, the following steps are implemented when the processor 10 executes the automatic parallel policy search program 40 in the memory 20:
obtaining a model calculation graph of a deep learning algorithm according to a model object input by a user;
converting the model calculation graph to obtain a converted model calculation graph;
carrying out equalization processing on the converted model calculation graph to obtain an equalization calculation graph;
creating a polyhedral model instance according to the equilibrium calculation graph, and outputting a parallel strategy according to the polyhedral model instance;
and calling the bottom framework to execute the parallel strategy.
The step of obtaining the model computation graph of the deep learning algorithm according to the model object input by the user specifically includes:
after a value is randomly input into the algorithm model, the computation process of the algorithm model is recorded to obtain the model computation graph, where the model object refers to single-machine training code of a deep learning algorithm defined in advance by the user;
or the model object is parsed through the python interpreter to generate a syntax tree, and the syntax tree is analyzed to obtain the model computation graph.
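The syntax-tree route above can be sketched with Python's own `ast` module: the single-machine training code is parsed into a syntax tree, which is then walked to recover the operations that make up the computation graph. The toy source string and the extracted call names are illustrative assumptions.

```python
import ast

source = """
def forward(x, w1, w2):
    h = matmul(x, w1)
    h = relu(h)
    return matmul(h, w2)
"""

tree = ast.parse(source)
# collect the names of function calls in the order they appear in the tree
calls = [node.func.id for node in ast.walk(tree)
         if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)]
print(calls)  # -> ['matmul', 'relu', 'matmul']
```

Each extracted call would correspond to one operator node of the model computation graph.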
Wherein, the step of converting the model calculation diagram to obtain a converted model calculation diagram specifically includes:
and re-representing the model calculation graph by using a predefined intermediate representation method to obtain the converted model calculation graph.
Wherein, the step of performing equalization processing on the converted model calculation graph to obtain an equalization calculation graph specifically includes:
splitting the large nodes in the converted model calculation graph and aggregating the adjacent small nodes in the converted model calculation graph to obtain the balanced calculation graph.
The step of creating a polyhedral model instance according to the equilibrium computation graph and outputting a parallel policy according to the polyhedral model instance specifically includes:
creating a polyhedral optimization model; initializing the polyhedral optimization model according to the equilibrium calculation graph to obtain a polyhedral model instance;
and outputting a parallel strategy according to the polyhedral model instance and the number of computing resources input by the user.
The step of calling the bottom layer framework to execute the parallel policy specifically comprises the following steps:
an execution API of the underlying framework is invoked to execute the parallel policy.
Further, the present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores an automatic parallel policy search program 40, and the automatic parallel policy search program 40, when executed by a processor, implements the steps of the automatic parallel policy search method based on polyhedral model modeling as described above; the steps of the automatic parallel strategy searching method based on the polyhedral model modeling are described in detail, and are not repeated herein.
In summary, the present invention provides an automatic parallel strategy search method based on polyhedral model modeling and related devices. The method includes: obtaining a model computation graph of a deep learning algorithm according to a model object input by a user; converting the model computation graph to obtain a converted model computation graph; balancing the converted model computation graph to obtain a balanced computation graph; creating a polyhedral model instance according to the balanced computation graph and outputting a parallel strategy according to the polyhedral model instance; and calling the underlying framework to execute the parallel strategy. By converting and balancing the model computation graph and automatically outputting the parallel strategy after creating a polyhedral model instance under the polyhedral model framework, the method models different algorithm logics under one polyhedral model and produces the parallel strategy automatically, which improves the efficiency of parallel strategy search and reduces the difficulty of distributed training development and performance tuning for deep learning algorithms.
It should be understood that equivalents and modifications of the technical solution and inventive concept thereof may occur to those skilled in the art, and all such modifications and alterations should fall within the scope of the appended claims.

Claims (13)

1. An automatic parallel strategy search method based on polyhedral model modeling, characterized by comprising the following steps:
obtaining a model calculation graph of a deep learning algorithm according to a model object input by a user;
converting the model calculation graph to obtain a converted model calculation graph;
carrying out equalization processing on the converted model calculation graph to obtain an equalization calculation graph;
creating a polyhedral model instance according to the equilibrium calculation graph, and outputting a parallel strategy according to the polyhedral model instance;
and calling the bottom framework to execute the parallel strategy.
2. The method according to claim 1, wherein the step of obtaining the model computation graph of the deep learning algorithm according to the model object input by the user specifically comprises:
obtaining an algorithm model according to a model object input by a user;
after a numerical value is randomly input into the algorithm model, recording the calculation process of the algorithm model to obtain a model calculation graph;
or analyzing the model object through a python interpreter to generate a syntax tree, and analyzing the syntax tree to obtain the model calculation diagram.
3. The method according to claim 2, wherein the step of converting the model computation graph to obtain the converted model computation graph specifically comprises:
and re-representing the model calculation diagram by using a predefined intermediate representation method to obtain the converted model calculation diagram.
4. The method according to claim 3, wherein the step of performing equalization processing on the converted model computation graph to obtain an equalization computation graph specifically comprises:
setting an average calculated quantity threshold value of the nodes, and comparing the calculated quantity nodes in the converted model calculation graph with the average calculated quantity threshold value;
and fusing adjacent calculation quantity nodes smaller than the average calculation threshold, and splitting the calculation quantity nodes larger than the average calculation threshold to obtain the balance calculation graph.
5. The method according to claim 4, wherein the steps of creating a polyhedral model instance according to the equilibrium computation graph and outputting a parallel policy according to the polyhedral model instance specifically include:
creating a polyhedron optimization model;
initializing the polyhedral optimization model according to the equilibrium calculation graph to obtain a polyhedral model instance;
and outputting a parallel strategy according to the polyhedral model instance and the number of computing resources input by the user.
6. The method according to claim 3, wherein the step of invoking the underlying framework to execute the parallel policy specifically comprises:
an execution API of the underlying framework is invoked to execute the parallel policy.
7. The method according to claim 1, wherein the model object refers to a stand-alone training code of a deep learning algorithm predefined by a user.
8. The automated parallel strategy searching method based on polyhedral model modeling according to claim 3, wherein the predefined intermediate representation comprises: IRType, IRValue, IRNode, and IRGraph.
9. The automated parallel strategy searching method based on polyhedron model modeling of claim 5, wherein the parallel strategy comprises data parallel segmentation dimension and pipeline parallel segmentation dimension.
10. The method of claim 6, wherein the execution API is a run manager in an underlying AI framework.
11. An automated parallel policy search system, the automated parallel policy search system comprising:
the calculation map generation module is used for obtaining a model calculation map according to the deep learning algorithm logic defined by the user;
the calculation graph conversion module is used for converting the model calculation graph to obtain a converted model calculation graph;
the computation graph balancing module is used for carrying out balancing processing on the converted model computation graph to obtain a balancing computation graph;
the parallel strategy searching module is used for creating a polyhedral model example according to the equilibrium computation graph and outputting a parallel strategy according to the polyhedral model example;
and the parallel strategy execution module is used for calling the bottom layer framework to execute the parallel strategy.
12. A controller, characterized in that the controller comprises: memory, a processor and a polyhedral model modeling based automatic parallel strategy searching program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the polyhedral model modeling based automatic parallel strategy searching method according to any one of claims 1 to 9.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a polyhedral model modeling based automatic parallel policy search program, which when executed by a processor, implements the steps of the polyhedral model modeling based automatic parallel policy search method according to any one of claims 1 to 9.
CN202111646797.9A 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment Pending CN114925591A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111646797.9A CN114925591A (en) 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111646797.9A CN114925591A (en) 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment

Publications (1)

Publication Number Publication Date
CN114925591A true CN114925591A (en) 2022-08-19

Family

ID=82804123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111646797.9A Pending CN114925591A (en) 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment

Country Status (1)

Country Link
CN (1) CN114925591A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115730681A (en) * 2022-11-11 2023-03-03 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN115730681B (en) * 2022-11-11 2023-08-15 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115543639B (en) Optimization method for performing deep learning tasks in distributed mode and distributed system
Fei et al. A dataflow-based scientific workflow composition framework
US10678573B2 (en) System and method for simulating virtual machine (VM) placement in virtual datacenters
Chen et al. MRGIS: A MapReduce-Enabled high performance workflow system for GIS
US20200034750A1 (en) Generating artificial training data for machine-learning
US9684493B2 (en) R-language integration with a declarative machine learning language
JP2020013608A (en) Data processing graph compilation
Lin et al. Efficient GPU computation using task graph parallelism
CN112559053B (en) Data synchronization processing method and device for reconfigurable processor
Valencia-Cabrera et al. Simulation challenges in membrane computing
Doka et al. Mix ‘n’match multi-engine analytics
CN115525287A (en) Multi-stage compiler architecture
US20240070512A1 (en) Quantum computing system and method
Guo et al. Automated exploration and implementation of distributed cnn inference at the edge
CN114925591A (en) Automatic parallel strategy searching method based on polyhedron model modeling and related equipment
CN117291260A (en) Deep learning framework adaptation method, deep learning framework adaptation device, deep learning framework adaptation equipment, deep learning framework adaptation storage medium and deep learning framework adaptation product
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
CN113420466B (en) Cross-platform automatic performance optimization oriented unit computing component and method
US10795682B2 (en) Generating vector based selection control statements
Olden Performance Analysis of SWE Implementations based on modern parallel Runtime Systems
Pllana et al. A novel approach for hybrid performance modelling and prediction of large-scale computing systems
WO2023071509A1 (en) Model compilation method and apparatus, and model running system
Gibson Deep learning on a low power gpu
Arjavalingam et al. HASTE: Serverless DAG Execution Optimizer
Eckardt et al. Simulation of a Simplified Ecosystem to Study the Influence of Environmental Factors on Bee Populations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination