CN114925591A - Automatic parallel strategy searching method based on polyhedron model modeling and related equipment - Google Patents


Info

Publication number
CN114925591A
CN114925591A (application CN202111646797.9A)
Authority
CN
China
Prior art keywords
model
parallel
calculation
graph
polyhedral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111646797.9A
Other languages
Chinese (zh)
Inventor
王进
易泽轩
李革
张叶红
张艳
王晖
曾炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Peng Cheng Laboratory
Original Assignee
Peking University Shenzhen Graduate School
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School, Peng Cheng Laboratory filed Critical Peking University Shenzhen Graduate School
Priority to CN202111646797.9A priority Critical patent/CN114925591A/en
Publication of CN114925591A publication Critical patent/CN114925591A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/08 Probabilistic or stochastic CAD


Abstract

The invention discloses an automatic parallel strategy search method based on polyhedral model modeling, and related equipment. The method comprises the following steps: obtaining a model computation graph of a deep learning algorithm from a model object input by a user; converting the model computation graph to obtain a converted model computation graph; performing equalization processing on the converted model computation graph to obtain a balanced computation graph; creating a polyhedral model instance according to the balanced computation graph, and outputting a parallel strategy according to the polyhedral model instance; and calling the underlying framework to execute the parallel strategy. By converting and balancing the model computation graph and creating a polyhedral model instance within the polyhedral model framework, the method automatically outputs a parallel strategy, so that different algorithm logics are modeled under the polyhedral model and the parallel strategy is output automatically. This improves the efficiency of parallel strategy search and reduces the difficulty of distributed training development and efficiency tuning for deep learning algorithms.

Description

Automatic parallel strategy searching method based on polyhedron model modeling and related equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an automatic parallel strategy searching method based on polyhedral model modeling and related equipment.
Background
Over the last decade, deep learning techniques have continually set new records on tasks in the fields of vision, natural language, speech, search, and recommendation. The reason can be summarized by one phrase: "large scale". Large-scale data gives a model enough knowledge to memorize; a large-scale parameter model gives it the capacity to memorize more data; and large-scale high-performance computing power (typified by the GPU) speeds up model training by factors of hundreds or even thousands. The joint development of data, models, and computing power has driven the field of large-scale deep learning, in which questions such as how to split a multi-machine task, how to configure cluster training resources, how to balance training speed against convergence speed, how to train a model that cannot fit on a single machine, and how to provide elastic training and fault tolerance are all important research directions. Distributed training, whose core purpose is to accelerate model training, is the most effective means of solving these problems and improving training efficiency.
At present, mainstream deep learning frameworks all provide multi-machine distributed training: TensorFlow (a symbolic mathematics system based on dataflow programming, widely used to implement machine learning algorithms), PyTorch (an open-source Python machine learning library based on Torch, used for applications such as natural language processing), MindSpore (an open-source deep learning training/inference framework designed for device-edge-cloud scenarios), and PaddlePaddle (an open-source deep learning platform integrating a core framework, tool components, and a service platform). The main parallel modes are data parallelism (splitting the training data samples across multiple computing devices for distributed computation during training), model parallelism, and pipeline parallelism (a quasi-parallel processing technique in which the execution of multiple operations is overlapped). However, these parallel modes must be implemented by the algorithm developer, who calls the parallel partitioning APIs provided by the AI framework according to the characteristics of the algorithm model. This raises the technical difficulty of distributed training of AI algorithms; moreover, when the developer has an insufficient grasp of the AI framework and the characteristics of the computing devices, parallel training of the model is inefficient, and this framework-specific distributed optimization work further increases development difficulty and reduces research efficiency.
To address this problem, the MindSpore framework provides an automatic parallel training function, the FlexFlow framework provides a search strategy based on modeling a 4-dimensional parallel strategy space, and the RaNNC framework provides middleware for automatic pipeline-parallel strategy search supporting a PyTorch front end. However, because the parallel strategy search space is large (it scales with the computation graph and the resource space), these works are hard to make practical in terms of automatic parallel search efficiency. For example, when RaNNC searches for a pipeline-parallel strategy for an enlarged BERT model with 4.9B parameters on 4 nodes with 32 cards, the required strategy search time exceeds 4 hours, which lengthens debugging and training during model development and reduces efficiency. Thus, the prior art has yet to be improved and enhanced.
Disclosure of Invention
The invention mainly aims to provide an automatic parallel strategy search method based on polyhedral model modeling, and related equipment, so as to solve the problems in the prior art that, when training a large-scale deep learning model, algorithm developers must configure the parallel strategy themselves, resulting in low training efficiency and high development difficulty.
In order to achieve the purpose, the invention adopts the following technical scheme:
an automatic parallel strategy searching method based on polyhedron model modeling comprises the following steps:
obtaining a model computation graph of a deep learning algorithm according to a model object input by a user;
converting the model computation graph to obtain a converted model computation graph;
performing equalization processing on the converted model computation graph to obtain a balanced computation graph;
creating a polyhedral model instance according to the balanced computation graph, and outputting a parallel strategy according to the polyhedral model instance;
and calling the underlying framework to execute the parallel strategy.
In the method for searching for an automatic parallel strategy based on polyhedral model modeling, the step of obtaining a model computation graph of a deep learning algorithm according to a model object input by a user specifically includes:
obtaining an algorithm model according to a model object input by a user;
after a numerical value is randomly input into the algorithm model, recording the calculation process of the algorithm model to obtain a model calculation graph;
or analyzing the model object through a python interpreter to generate a syntax tree, and analyzing the syntax tree to obtain the model calculation graph.
In the method for searching an automatic parallel strategy based on the polyhedral model modeling, the step of converting the model calculation diagram to obtain a converted model calculation diagram specifically includes:
and re-representing the model calculation diagram by using a predefined intermediate representation method to obtain the converted model calculation diagram.
In the method for searching for an automatic parallel strategy based on polyhedral model modeling, the step of performing equalization processing on the converted model computation graph to obtain an equalization computation graph specifically includes:
setting an average computation threshold for the nodes, and comparing the computation cost of each node in the converted model computation graph with the average computation threshold;
and fusing adjacent nodes whose computation cost is below the average computation threshold, and splitting nodes whose computation cost exceeds it, to obtain the balanced computation graph.
In the method for searching for an automatic parallel policy based on polyhedral model modeling, the step of creating a polyhedral model instance according to the equilibrium computation graph and outputting a parallel policy according to the polyhedral model instance specifically includes:
mapping the equilibrium calculation map on a polyhedron model to obtain a polyhedron optimization model;
inputting the equilibrium calculation chart into the polyhedral optimization model to obtain a polyhedral model example;
and outputting a parallel strategy according to the polyhedral model example and the quantity of the computing resources input by the user.
In the method for searching for an automatic parallel policy based on polyhedral model modeling, the step of calling the underlying framework to execute the parallel policy specifically includes:
calling an execution API of the underlying framework to execute the parallel policy.
In the automatic parallel strategy searching method based on the polyhedron model modeling, the model object refers to a single machine training code of a deep learning algorithm defined in advance by a user.
In the automatic parallel strategy search method based on polyhedral model modeling, the predefined intermediate representation includes: IRType, IRValue, IRNode, and IRGraph.
In the automatic parallel strategy searching method based on the polyhedron model modeling, the parallel strategy comprises a data parallel segmentation dimension and a pipeline parallel segmentation dimension.
In the polyhedron model modeling-based automatic parallel strategy searching method, the execution API is a running manager in an underlying AI framework.
An automatic parallel policy search system, the system comprising:
the calculation map generation module is used for obtaining a model calculation map of the deep learning algorithm according to the model object input by the user;
the calculation graph conversion module is used for converting the model calculation graph to obtain a converted model calculation graph;
the computation graph balancing module is used for carrying out balancing processing on the converted model computation graph to obtain a balancing computation graph;
the parallel strategy searching module is used for creating a polyhedral model example according to the equilibrium computation graph and outputting a parallel strategy according to the polyhedral model example;
and the parallel strategy execution module is used for calling the underlying framework to execute the parallel strategy.
A controller, the controller comprising: the system comprises a memory, a processor and a polyhedron model modeling based automatic parallel strategy searching program which is stored on the memory and can run on the processor, wherein the polyhedron model modeling based automatic parallel strategy searching program realizes the steps of the polyhedron model modeling based automatic parallel strategy searching method when being executed by the processor.
A computer-readable storage medium storing a polyhedral model modeling-based automatic parallel policy search program which, when executed by a processor, implements the steps of the polyhedral model modeling-based automatic parallel policy search method as described above.
Compared with the prior art, the automatic parallel strategy search method based on polyhedral model modeling and the related equipment provided by the invention comprise the following steps: obtaining a model computation graph of a deep learning algorithm from a model object input by a user; converting the model computation graph to obtain a converted model computation graph; performing equalization processing on the converted model computation graph to obtain a balanced computation graph; creating a polyhedral model instance according to the balanced computation graph, and outputting a parallel strategy according to the polyhedral model instance; and calling the underlying framework to execute the parallel strategy. By converting and balancing the generated model computation graph and creating a polyhedral model instance within the polyhedral model framework, the method automatically outputs a parallel strategy according to the instance: algorithm logic from different frameworks is modeled under the polyhedral model, and an efficiently executable parallel strategy is output automatically, which effectively improves the efficiency of parallel strategy search while reducing the difficulty of distributed training development and efficiency tuning for deep learning algorithms.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the automated parallel strategy search method based on polyhedral model modeling according to the present invention;
FIG. 2 is a flowchart of step S100 in the preferred embodiment of the method for automatic parallel strategy search based on polyhedron modeling according to the present invention;
FIG. 3 is a flowchart of step S300 of the method for automatic parallel strategy search based on polyhedron modeling according to the preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of node splitting and node aggregation provided by the present invention;
FIG. 5 is a flowchart of step S400 in the preferred embodiment of the method for automatic parallel strategy search based on polyhedral model modeling according to the present invention;
FIG. 6 is a schematic diagram illustrating a segmentation of a computation graph in a data parallel mode according to the present invention;
FIG. 7 is a schematic diagram illustrating a segmentation of a computation graph in a pipeline parallel mode according to the present invention;
FIG. 8 is a functional block diagram of an automated parallel policy search system provided by the present invention;
FIG. 9 is an architectural relationship diagram of the PyTorch framework and the automatic parallel policy search system according to the present invention;
FIG. 10 is a schematic diagram illustrating an operating environment of a controller according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
First, multi-machine multi-card distributed training of deep learning models has become the most important technical approach to accelerating model training. The main distributed training modes include data parallelism, model parallelism, optimizer parallelism, pipeline parallelism, and hybrid parallelism, and these are the parallel functions primarily supported by the mainstream deep learning frameworks (such as TensorFlow, PyTorch, MindSpore, and PaddlePaddle), which differ in usability and efficiency. However, almost all of these parallel modes require the algorithm developer to implement them by calling framework APIs and to tune training efficiency manually. This is very difficult for an algorithm developer who does not understand the implementation mechanisms of the underlying AI framework and the communication characteristics of the cluster, and the complex implementation and debugging greatly reduce efficiency.
On the other hand, because the cost of automatic parallel strategy search grows exponentially with the AI model computation graph and the cluster resource scale, the search efficiency of existing automatic parallel search works cannot meet the requirements of large-parameter models on large-scale clusters. A highly efficient multi-machine multi-card distributed training parallel strategy search method therefore needs to be designed and implemented to solve the problem of automatic parallel training strategy search for large models, so that algorithm developers can focus solely on developing algorithm logic and quickly realize distributed training on an AI cluster.
To solve the problems in the prior art, the invention provides an automatic parallel strategy search method based on polyhedral model modeling, and related equipment. The generated model computation graph is converted and balanced, and a polyhedral model instance is created from the balanced computation graph within the polyhedral model framework, so that a parallel strategy is automatically output according to the instance. Algorithm logic from different frameworks is thus modeled under the polyhedral model and an efficiently executable parallel strategy is output automatically, which effectively improves the efficiency of parallel strategy search while reducing the difficulty of distributed training development and efficiency tuning for deep learning algorithms.
The following describes a design scheme of an automatic parallel strategy search method based on polyhedral model modeling by using a specific exemplary embodiment, and it should be noted that the following embodiment is only used for explaining the technical scheme of the invention, and is not specifically limited:
referring to fig. 1, the method for searching an automatic parallel strategy based on polyhedral model modeling according to the present invention includes:
and S100, obtaining a model calculation graph of the deep learning algorithm according to the model object input by the user.
The model object refers to the single-machine training code of a deep learning algorithm predefined by the user, such as a BERT model or a GPT-3 model (GPT-3 is built by the independent AI research and deployment company OpenAI, and is a large-scale natural language model currently running on Microsoft Azure).
Specifically, under different deep learning frameworks (such as TensorFlow, PyTorch, MindSpore, and PaddlePaddle), the corresponding model objects (the algorithm logic predefined by the user) differ, so the model computation graph of the deep learning algorithm must be obtained by a framework-appropriate method from the user-supplied single-machine training code defined on that AI framework. The model computation graph is the representation of a deep learning algorithm within an AI framework; a directed acyclic graph (DAG) is generally used to represent the computation process of the algorithm, with nodes of the graph representing computation operations and edges representing the tensor data dependencies between them. The purpose of generating the model computation graph is thus to represent the computation process of the algorithm.
Further, referring to fig. 2, step 100 specifically includes:
s110, obtaining an algorithm model according to a model object input by a user;
s120, after a numerical value is randomly input into the algorithm model, recording the calculation process of the algorithm model to obtain a model calculation graph;
s130, or generating a syntax tree after analyzing the model object through a python interpreter, and analyzing the syntax tree to obtain the model calculation diagram.
Specifically, taking the PyTorch framework as an example, the model computation graph defined by the algorithm developer can be obtained with the torch.jit.trace or torch.jit.script method. The torch.jit.trace method is based on tracing tensor computation: after the algorithm model is obtained from the model object input by the user, a value is randomly generated according to the input type required by the algorithm, each computation step of the algorithm is then recorded, and these recorded steps constitute the computation graph. For example, if the algorithm model takes a 32×32 picture as input and outputs the picture's category label, the random input is a randomly generated 32×32 tensor. The torch.jit.script method is based on source code conversion: its basic principle is to first parse the model object (the user-defined algorithm logic) into a syntax tree (a tree structure representing the syntactic structure of the source code) through the Python interpreter, and then obtain the model computation graph by analyzing the syntax tree.
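The tracing mode described above can be illustrated with a toy tracer; this is a hypothetical sketch of the tracing principle only, not the actual torch.jit.trace implementation, and all class and function names are invented for illustration:

```python
# Minimal sketch of trace-based graph capture (illustrative, NOT the real
# torch.jit.trace): a proxy value records every operation applied to it,
# and the recorded operations form the computation graph.
class TracedValue:
    def __init__(self, graph, name):
        self.graph = graph   # shared list of recorded ops (the "graph")
        self.name = name

    def _record(self, op, other):
        out = TracedValue(self.graph, f"v{len(self.graph)}")
        # each entry: (operation, input name, other operand, output name)
        self.graph.append((op, self.name, getattr(other, "name", other), out.name))
        return out

    def __mul__(self, other):
        return self._record("mul", other)

    def __add__(self, other):
        return self._record("add", other)

def trace(model_fn):
    """Run model_fn once on a proxy input and return the recorded op list."""
    graph = []
    model_fn(TracedValue(graph, "input"))
    return graph

# A toy "model": two operations applied to the input.
ops = trace(lambda x: x * 2 + 1)
# ops → [("mul", "input", 2, "v0"), ("add", "v0", 1, "v1")]
```

Note that, as in real tracing, only the operations actually executed on the proxy are captured; control flow that depends on the input value would be baked in, which is why the source-conversion (script) mode also exists.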
Please refer to fig. 1, S200, the model calculation graph is converted to obtain a converted model calculation graph.
Specifically, computation graphs are defined in several different ways under different frameworks; that is, their representations differ (for example, the PyTorch framework uses the Torch Graph), so the computation graphs of different frameworks must be converted into a computation graph in a common intermediate representation. In general, the intermediate representation of the computation graph is defined manually by an expert, a process of manually establishing rule mappings, so the intermediate-representation conversion of each computation graph is performed according to its own rules.
Further, step S200 specifically includes:
and S210, re-representing the model calculation graph by using a predefined intermediate representation method to obtain the converted model calculation graph.
Specifically, the computation graphs of the different frameworks are re-expressed using an intermediate representation method defined in advance, so that they are converted into the computation graph used for parallel strategy search. The intermediate representation method is general, and is implemented concretely as four class definitions in C++; the predefined intermediate representations are: IRType (type), IRValue (value), IRNode (node), and IRGraph (graph).
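As a rough illustration of the four IR classes named above (the patent implements them as C++ class definitions), a Python sketch might look as follows; every field shown is an assumption, since the specification does not enumerate the class members:

```python
from dataclasses import dataclass, field

# Hedged sketch of the four intermediate-representation classes the patent
# names (IRType, IRValue, IRNode, IRGraph). All fields are illustrative.
@dataclass
class IRType:
    dtype: str        # e.g. "float32"
    shape: tuple      # tensor shape

@dataclass
class IRValue:
    name: str
    type: IRType      # a tensor value flowing along a graph edge

@dataclass
class IRNode:
    op: str           # operator name, e.g. "matmul"
    inputs: list      # IRValue objects consumed
    outputs: list     # IRValue objects produced
    flops: int = 0    # estimated computation cost, used later for balancing

@dataclass
class IRGraph:
    nodes: list = field(default_factory=list)

    def add(self, node):
        self.nodes.append(node)
        return node

# Re-expressing a one-operator framework graph in the common IR:
g = IRGraph()
x = IRValue("x", IRType("float32", (32, 64)))
w = IRValue("w", IRType("float32", (64, 16)))
y = IRValue("y", IRType("float32", (32, 16)))
g.add(IRNode("matmul", [x, w], [y], flops=2 * 32 * 64 * 16))
```

Carrying a per-node cost estimate (here `flops`) in the IR is what makes the equalization step in S300 possible without going back to the original framework graph.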
Please refer to fig. 1, S300, the transformed model calculation graph is equalized to obtain an equalization calculation graph.
Specifically, the computation workloads of different node operations in the computation graph differ, which makes the graph unbalanced. In order to obtain sub-nodes of balanced size when nodes are split in the following steps, and to improve the efficiency of the automatic parallel strategy search, the nodes in the computation graph must be balanced through node splitting and node aggregation operations. The purpose of computation graph balancing is to balance the computation of the graph in both the horizontal and the vertical dimension by re-aggregating or splitting its nodes, thereby avoiding uneven computation distribution during parallel strategy search.
Further, referring to fig. 3, step S300 specifically includes:
s310, setting an average calculated quantity threshold value of the nodes, and comparing the calculated quantity nodes in the converted model calculation graph with the average calculated quantity threshold value;
and S320, fusing adjacent calculation amount nodes smaller than the average calculation threshold value, and splitting the calculation amount nodes larger than the average calculation threshold value to obtain a balance calculation graph.
Specifically, the equalization processing mainly consists of aggregating or splitting nodes in the model computation graph. Before any node aggregation or splitting, an average computation threshold for the nodes is first set, the nodes in the converted model computation graph are traversed in node order, and each node's computation cost is compared with the threshold. The node aggregation operation then fuses adjacent nodes whose cost is below the average computation threshold; the node splitting operation splits a node whose cost exceeds the threshold into several nodes according to the threshold (splitting is usually applied to matrix multiplications larger than the threshold), so that the cost of fused and split nodes is comparable to the average computation threshold. After the aggregation and splitting operations are completed, the balanced computation graph is obtained.
According to the invention, the balanced calculation graph is obtained by splitting the large nodes in the calculation graph or aggregating a plurality of adjacent small nodes, so that the calculated amount among the nodes is balanced, the problem of uneven calculation distribution during parallel strategy search can be effectively avoided, and the node segmentation efficiency is improved.
Node aggregation converts the computation of several nodes into a single node in the representation; for example, when 3 computation nodes are aggregated, the 3 nodes before aggregation are three IRNode objects, and after aggregation they form one new IRNode object. Whether a node with a large computation cost can be split is determined by its operator type: not every node is splittable, but matrix multiplication, a common basic operator in AI models, is an example of a splittable node. The specific processes of node splitting and node aggregation are shown in fig. 4, which depicts the splitting of a node with a large computation cost and the aggregation of adjacent nodes with small costs.
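The fuse-small/split-large balancing step can be sketched on a simplified linear chain of node costs; this is an illustrative approximation under stated assumptions (real splitting is operator-specific, e.g. limited to matrix multiplications), not the patent's exact algorithm:

```python
# Sketch of the computation-graph balancing step (S300) on a linear chain
# of per-node costs: nodes below the average-cost threshold are fused with
# adjacent small nodes, and nodes above it are split into roughly
# threshold-sized pieces. Illustrative only; assumes every node is fusible
# and splittable, which the patent does not guarantee.
def balance(costs, threshold):
    balanced, pending = [], 0
    for c in costs:
        if c > threshold:                 # split a large node
            if pending:                   # flush any accumulated small nodes
                balanced.append(pending)
                pending = 0
            pieces, rem = divmod(c, threshold)
            balanced.extend([threshold] * pieces)
            if rem:
                balanced.append(rem)
        else:                             # fuse adjacent small nodes
            pending += c
            if pending >= threshold:
                balanced.append(pending)
                pending = 0
    if pending:
        balanced.append(pending)
    return balanced

# 10 and 5 fuse into one node of cost 15; 100 splits into pieces of 20.
print(balance([10, 5, 100, 8], threshold=20))  # → [15, 20, 20, 20, 20, 20, 8]
```

The total cost is preserved; only its distribution across nodes changes, which is exactly what the later partitioning step needs.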
With reference to fig. 1, in S400 a polyhedral model instance is created according to the balanced computation graph, and a parallel strategy is output according to the polyhedral model instance.
Specifically, this process comprises training modeling based on a polyhedral optimization model and a parallel strategy search. In the training modeling process, a polyhedral optimization model instance is initialized from the balanced computation graph under the established polyhedral optimization model framework; that is, the training process is modeled as a concrete object. The parallel strategy search then, based on the polyhedral optimization model instance, searches according to a specified parallel mode (the parallel mode a developer wants to apply, such as data parallelism, model parallelism, optimizer parallelism, pipeline parallelism, or hybrid parallelism). Taking pipeline-parallel strategy search as an example, the strategy values for the number of pipeline stages and the number of mini-batches are searched according to the pipeline mode specified by the developer, and the final parallel strategy is generated. A parallel mode is a general framework: for example, data parallelism refers to distributing the training data samples across multiple computing devices for distributed computation during distributed training of an AI model, whereas a parallel strategy is the concrete distributed-training partitioning scheme used when executing a specific algorithm model.
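As a toy illustration of searching a pipeline strategy value such as the number of mini-batch splits, the sketch below minimizes a simple GPipe-style schedule cost. The cost model, function name, and parameters are assumptions for illustration only; the patent's search operates on the polyhedral model instance rather than a closed-form cost.

```python
def search_microbatches(p, T, o, max_m=64):
    """Search the number of mini-batch splits m minimizing a simple
    pipeline cost: (m + p - 1) pipeline slots for p stages, each slot
    taking T/m compute plus a fixed per-micro-batch overhead o."""
    def cost(m):
        return (m + p - 1) * (T / m + o)
    return min(range(1, max_m + 1), key=cost)

# e.g. 2 pipeline stages, total per-stage compute 16, overhead 1 per micro-batch
print(search_microbatches(p=2, T=16, o=1))  # -> 4
```

More splits shrink the pipeline "bubble" but add per-micro-batch overhead, so the search balances the two.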
Polyhedral model modeling is a common compiler technique for optimizing for loops: by mapping a loop into the polyhedral model space, loop-parallel computation optimization can be realized directly through the mapping tables of the polyhedral model, improving training efficiency. In the present invention, the training computation of a deep learning model is expressed as a multi-level loop nest, a way of representing this training process in a polyhedral model is defined, and the common data-parallel, model-parallel, and pipeline-parallel modes are finally unified under one polyhedral modeling framework.
Assume a deep learning model is represented by a directed acyclic graph (DAG) as a computation graph D(N, E), where N is the set of computation graph nodes and E is the set of edges. The training computation of the deep learning model (in effect, the modeling process) can then be expressed as follows (in the Python programming language), so that the for loops can be mapped onto the polyhedral optimization model:
for e in range(Epoch_num):            # Epoch_num: number of model training epochs, i.e. the epoch range (domain) of the for loop
    for b in range(Batch_num):        # Batch_num: number of sample batches per training epoch, i.e. the batch range of the for loop
        for node in nodes:            # node: a node of the model; nodes: the node set of the model computation graph
            out = Forward(node, b)    # Forward(): forward computation of the model
        for node in reversed(nodes):  # reversed(): iterate the model computation nodes in reverse order
            grad = Backward(node, b)  # grad: gradient of the model parameters; Backward(): backward gradient computation of the model
            Update(node)              # Update(): parameter update process of the model
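The loop nest above can be viewed as a polyhedral iteration domain: the set of integer points (e, b, n) with 0 ≤ e < Epoch_num, 0 ≤ b < Batch_num, 0 ≤ n < len(nodes), and a parallel strategy is a partition of this set. A minimal sketch under illustrative sizes (the variable names and shard construction are assumptions, not the patent's implementation):

```python
from itertools import product

Epoch_num, Batch_num, num_nodes = 2, 3, 4

# the iteration domain of the training loop nest: all (epoch, batch, node) points
domain = set(product(range(Epoch_num), range(Batch_num), range(num_nodes)))
print(len(domain))  # -> 24 (= 2 * 3 * 4 points)

# data parallelism partitions the domain along the batch dimension
shards = [{p for p in domain if p[1] == b} for b in range(Batch_num)]
assert set().union(*shards) == domain   # the shards cover the whole domain
```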
Further, referring to fig. 5, step S400 specifically includes:
S410, creating a polyhedral optimization model;
S420, initializing the polyhedral optimization model according to the balanced computation graph to obtain a polyhedral model instance;
and S430, outputting a parallel strategy according to the polyhedral model instance and the number of computing resources input by the user, the parallel strategy including the data-parallel partition dimension, the pipeline arrangement, and so on.
Specifically, the balanced computation graph is mapped onto a polyhedral model to obtain the polyhedral optimization model, where the mapping can be understood as an affine transformation of the coordinate space (e.g., translation, rotation, and similar operations). The balanced computation graph is then input into the polyhedral optimization model to obtain a polyhedral model instance (a concrete object); that is, the for loops are mapped onto the polyhedral optimization model to obtain the polyhedral model instance (the distributed optimization computation model to be searched). Next, geometric linear transformations are applied to find the data-parallel partition dimension or the pipeline-parallel partition dimension of distributed training, in combination with the number of computing resources input by the user: for example, with 8 GPU cards as computing resources, data parallelism can partition the data along the data-parallel dimension into 8 mini-batches, computed on the 8 GPU cards respectively, and the parallel strategy is then output comprehensively (data-parallel partition dimension, pipeline-parallel partition dimension, pipeline arrangement, and so on). The partition processes for the data-parallel dimension and the pipeline-parallel dimension are shown in fig. 6 and fig. 7 respectively, where the abscissa represents the number of sample batches for model training and the ordinate represents the number of layers of model parameters (for example, the BERT base model computation process has 24 layers); a point in the first quadrant represents the computation of a certain sample batch on a certain layer of the model, the arrows represent the data or time dependencies of the training computation, and the dotted boxes represent a partition example of the model in distributed training. Fig. 6 is an example of slicing the computation graph in data-parallel mode, where slicing is performed directly along the batch direction; fig. 7 is an example of slicing the computation graph in pipeline-parallel mode, where slicing is performed laterally in a staggered fashion.
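The two slicing schemes of figs. 6 and 7 can be sketched as assignments of points in the (batch, layer) iteration space to devices: data parallelism cuts along the batch axis, while pipeline parallelism assigns contiguous layer blocks to devices. The device counts and grid sizes below are illustrative assumptions.

```python
batches, layers, devices = 8, 24, 4
grid = [(b, l) for b in range(batches) for l in range(layers)]

# fig. 6: data-parallel — each device gets every layer for a slice of batches
def data_parallel(b, l):
    return b // (batches // devices)

# fig. 7: pipeline-parallel — each device gets a contiguous block of layers
def pipeline(b, l):
    return l // (layers // devices)

# every point is assigned to exactly one of the 4 devices under either scheme
assert {data_parallel(b, l) for b, l in grid} == set(range(devices))
assert {pipeline(b, l) for b, l in grid} == set(range(devices))
print(data_parallel(5, 0), pipeline(5, 0))  # -> 2 0
```

Under data parallelism the device of a point depends only on the batch coordinate; under pipeline parallelism only on the layer coordinate — exactly the vertical vs. lateral cuts in the figures.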
According to this method, the training computation of the deep learning model is described uniformly as a polyhedral model. Through the resulting polyhedral optimization model, the commonly used distributed parallel training modes such as data parallelism and pipeline parallelism are unified, so that a feasible parallel strategy can be searched for through simple mapping transformations of the polyhedral model.
Continuing with fig. 1, in S500 the underlying framework is invoked to execute the parallel strategy, where the execution API is the runtime manager (runtime interface) of the underlying AI framework.
Specifically, the data-parallel partition dimension or the pipeline-parallel partition dimension of distributed training is found under the polyhedral model, the parallel strategy is output automatically in combination with the number of computing resources input by the user, and the parallel strategy is finally executed by calling the execution API of the underlying framework, where the API is the interaction interface.
Further, step S500 specifically includes:
S510, calling the execution API of the underlying framework to execute the parallel strategy, where the execution API is the execution manager of the underlying AI framework.
Specifically, after the polyhedral model instance has been created under the polyhedral model, the data-parallel partition dimension or the pipeline-parallel partition dimension of distributed training is found through the polyhedral model instance, the distributed training execution strategy of the algorithm model is output automatically in combination with the number of computing resources input by the user, and the execution API of the underlying framework is then called to execute the distributed training strategy of the algorithm model.
Further, referring to fig. 8, based on the above automatic parallel strategy search method based on polyhedral model modeling, the present invention further provides an automatic parallel strategy search system, which includes: a computation graph generation module 100, a computation graph conversion module 200, a computation graph balancing module 300, a parallel strategy search module 400, and a parallel strategy execution module 500.
Specifically, the computation graph generation module 100 is configured to obtain the model computation graph of a deep learning algorithm according to the model object input by the user; the computation graph conversion module 200 is configured to convert the model computation graph to obtain the converted model computation graph; the computation graph balancing module 300 is configured to balance the converted model computation graph to obtain the balanced computation graph; the parallel strategy search module 400 is configured to create a polyhedral model instance according to the balanced computation graph and output a parallel strategy according to the polyhedral model instance; and the parallel strategy execution module 500 is configured to invoke the underlying framework to execute the parallel strategy.
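The data flow through the five modules of fig. 8 can be sketched as a simple function chain. The function bodies below are placeholders standing in for modules 100–500; all names and return shapes are illustrative assumptions.

```python
def generate_graph(model_object):       # computation graph generation module 100
    return {"nodes": model_object["ops"]}

def convert_graph(graph):               # computation graph conversion module 200
    return {"ir_nodes": [op.upper() for op in graph["nodes"]]}  # stand-in for IR conversion

def balance_graph(graph):               # computation graph balancing module 300
    return graph                        # aggregation/splitting would happen here

def search_strategy(graph, n_devices):  # parallel strategy search module 400
    return {"data_parallel_degree": n_devices, "graph": graph}

def execute(strategy):                  # parallel strategy execution module 500
    return f"running on {strategy['data_parallel_degree']} devices"

model = {"ops": ["matmul", "relu", "matmul"]}
result = execute(search_strategy(balance_graph(convert_graph(generate_graph(model))), 8))
print(result)  # -> running on 8 devices
```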
The implementation of the automatic parallel strategy search method based on polyhedral model modeling can be regarded as middleware that enables an AI framework to support automatic parallelism; fig. 9 shows the relationship between this function and the PyTorch framework. First, the user defines the algorithm with the PyTorch front-end API; the user does not need to consider parallel partitioning of the algorithm, only the single-device algorithm logic. Therefore, the user only needs to know how to apply the APIs provided by the PyTorch framework and implements the single-machine algorithm through those APIs. As shown in fig. 9, the column on the left is the basic architecture of the PyTorch framework and the part on the right is the automatic parallel strategy search system; the main interfaces between them are the computation graph representation of the algorithm and the runtime interface of the underlying framework, so the user is almost unaware of the automatic parallel strategy search system.
Then, the computation graph of the user-defined algorithm is output through PyTorch's JIT (just-in-time compilation) module and fed into the automatic parallel strategy search system, where it passes successively through the computation graph generation module, the computation graph conversion module, the computation graph balancing module, and the parallel strategy search module. Finally, the computation subgraphs partitioned according to the parallel strategy are output and executed by calling PyTorch's underlying runtime API (an internal function API of the PyTorch framework).
Further, the present invention also provides a controller, which includes a processor 10, a memory 20 and a display 30. Fig. 10 shows only some of the components of the controller, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
The memory 20 may in some embodiments be an internal storage unit of the controller, such as a hard disk or a memory of the controller. The memory 20 may also be an external storage device of the controller in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the controller. Further, the memory 20 may also include both an internal storage unit of the controller and an external storage device. The memory 20 is used for storing application software installed in the controller and various types of data. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an automatic parallel policy search program 40, and the automatic parallel policy search program 40 is executable by the processor 10, so as to implement the automatic parallel policy search method based on polyhedral model modeling in the present invention.
The processor 10 may be, in some embodiments, a central processing unit (CPU), a microprocessor, or another data processing chip, configured to run the program code stored in the memory 20 or process data, for example to execute the automatic parallel strategy search method based on polyhedral model modeling.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the device and for displaying a visual user interface. The components 10-30 of the device communicate with each other via a system bus.
In one embodiment, the following steps are implemented when the processor 10 executes the automatic parallel policy search program 40 in the memory 20:
obtaining a model calculation graph of a deep learning algorithm according to a model object input by a user;
converting the model calculation graph to obtain a converted model calculation graph;
carrying out equalization processing on the converted model calculation graph to obtain an equalization calculation graph;
creating a polyhedral model instance according to the equilibrium calculation graph, and outputting a parallel strategy according to the polyhedral model instance;
and calling the bottom framework to execute the parallel strategy.
The step of obtaining the model computation graph of the deep learning algorithm according to the model object input by the user specifically includes:
after a value is randomly input into the algorithm model, the computation process of the algorithm model is recorded to obtain the model computation graph, where the model object refers to single-machine training code of a deep learning algorithm defined in advance by the user;
or the model object is parsed through the python interpreter to generate a syntax tree, and the syntax tree is analyzed to obtain the model computation graph.
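The syntax-tree route above can be sketched with Python's own `ast` module: the single-machine training code is parsed into a syntax tree, which is then walked to recover the operations that make up the computation graph. The toy source string and the extracted call names are illustrative assumptions.

```python
import ast

source = """
def forward(x, w1, w2):
    h = matmul(x, w1)
    h = relu(h)
    return matmul(h, w2)
"""

tree = ast.parse(source)
# collect the names of function calls in the order they appear in the tree
calls = [node.func.id for node in ast.walk(tree)
         if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)]
print(calls)  # -> ['matmul', 'relu', 'matmul']
```

Each extracted call would correspond to one operator node of the model computation graph.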
Wherein, the step of converting the model calculation diagram to obtain a converted model calculation diagram specifically includes:
and re-representing the model calculation graph by using a predefined intermediate representation method to obtain the converted model calculation graph.
Wherein, the step of performing equalization processing on the converted model calculation graph to obtain an equalization calculation graph specifically includes:
splitting the large nodes in the converted model calculation graph and aggregating the adjacent small nodes in the converted model calculation graph to obtain the balanced calculation graph.
The step of creating a polyhedral model instance according to the equilibrium computation graph and outputting a parallel policy according to the polyhedral model instance specifically includes:
creating a polyhedral optimization model; initializing the polyhedral optimization model according to the equilibrium calculation graph to obtain a polyhedral model instance;
and outputting a parallel strategy according to the polyhedral model instance and the number of computing resources input by the user.
The step of calling the bottom layer framework to execute the parallel policy specifically comprises the following steps:
an execution API of the underlying framework is invoked to execute the parallel policy.
Further, the present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores an automatic parallel policy search program 40, and the automatic parallel policy search program 40, when executed by a processor, implements the steps of the automatic parallel policy search method based on polyhedral model modeling as described above; the steps of the automatic parallel strategy searching method based on the polyhedral model modeling are described in detail, and are not repeated herein.
In summary, the present invention provides an automatic parallel strategy search method based on polyhedral model modeling and related devices. The method includes: obtaining a model computation graph of a deep learning algorithm according to a model object input by a user; converting the model computation graph to obtain a converted model computation graph; balancing the converted model computation graph to obtain a balanced computation graph; creating a polyhedral model instance according to the balanced computation graph and outputting a parallel strategy according to the polyhedral model instance; and calling the underlying framework to execute the parallel strategy. By converting and balancing the model computation graph and automatically outputting the parallel strategy after creating a polyhedral model instance under the polyhedral model framework, the method models different algorithm logics under one polyhedral model and produces the parallel strategy automatically, which improves the efficiency of parallel strategy search and reduces the difficulty of distributed training development and performance tuning for deep learning algorithms.
It should be understood that equivalents and modifications of the technical solution and inventive concept thereof may occur to those skilled in the art, and all such modifications and alterations should fall within the scope of the appended claims.

Claims (13)

1. An automatic parallel strategy search method based on polyhedral model modeling, characterized by comprising the following steps:
obtaining a model calculation graph of a deep learning algorithm according to a model object input by a user;
converting the model calculation graph to obtain a converted model calculation graph;
carrying out equalization processing on the converted model calculation graph to obtain an equalization calculation graph;
creating a polyhedral model instance according to the equilibrium calculation graph, and outputting a parallel strategy according to the polyhedral model instance;
and calling the bottom framework to execute the parallel strategy.
2. The method according to claim 1, wherein the step of obtaining the model computation graph of the deep learning algorithm according to the model object input by the user specifically comprises:
obtaining an algorithm model according to a model object input by a user;
after a numerical value is randomly input into the algorithm model, recording the calculation process of the algorithm model to obtain a model calculation graph;
or analyzing the model object through a python interpreter to generate a syntax tree, and analyzing the syntax tree to obtain the model calculation diagram.
3. The method according to claim 2, wherein the step of converting the model computation graph to obtain the converted model computation graph specifically comprises:
and re-representing the model calculation diagram by using a predefined intermediate representation method to obtain the converted model calculation diagram.
4. The method according to claim 3, wherein the step of performing equalization processing on the converted model computation graph to obtain an equalization computation graph specifically comprises:
setting an average calculated quantity threshold value of the nodes, and comparing the calculated quantity nodes in the converted model calculation graph with the average calculated quantity threshold value;
and fusing adjacent calculation quantity nodes smaller than the average calculation threshold, and splitting the calculation quantity nodes larger than the average calculation threshold to obtain the balance calculation graph.
5. The method according to claim 4, wherein the steps of creating a polyhedral model instance according to the equilibrium computation graph and outputting a parallel policy according to the polyhedral model instance specifically include:
creating a polyhedron optimization model;
initializing the polyhedral optimization model according to the equilibrium calculation graph to obtain a polyhedral model instance;
and outputting a parallel strategy according to the polyhedral model instance and the number of computing resources input by the user.
6. The method according to claim 3, wherein the step of invoking the underlying framework to execute the parallel policy specifically comprises:
an execution API of the underlying framework is invoked to execute the parallel policy.
7. The method according to claim 1, wherein the model object refers to a stand-alone training code of a deep learning algorithm predefined by a user.
8. The automated parallel strategy searching method based on polyhedral model modeling according to claim 3, wherein the predefined intermediate representation comprises: IRType, IRValue, IRNode, and IRGraph.
9. The automated parallel strategy searching method based on polyhedron model modeling of claim 5, wherein the parallel strategy comprises data parallel segmentation dimension and pipeline parallel segmentation dimension.
10. The method of claim 6, wherein the execution API is a run manager in an underlying AI framework.
11. An automated parallel policy search system, the automated parallel policy search system comprising:
the calculation map generation module is used for obtaining a model calculation map according to the deep learning algorithm logic defined by the user;
the calculation graph conversion module is used for converting the model calculation graph to obtain a converted model calculation graph;
the computation graph balancing module is used for carrying out balancing processing on the converted model computation graph to obtain a balancing computation graph;
the parallel strategy searching module is used for creating a polyhedral model example according to the equilibrium computation graph and outputting a parallel strategy according to the polyhedral model example;
and the parallel strategy execution module is used for calling the bottom layer framework to execute the parallel strategy.
12. A controller, characterized in that the controller comprises: memory, a processor and a polyhedral model modeling based automatic parallel strategy searching program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the polyhedral model modeling based automatic parallel strategy searching method according to any one of claims 1 to 9.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a polyhedral model modeling based automatic parallel policy search program, which when executed by a processor, implements the steps of the polyhedral model modeling based automatic parallel policy search method according to any one of claims 1 to 9.
CN202111646797.9A 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment Pending CN114925591A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111646797.9A CN114925591A (en) 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111646797.9A CN114925591A (en) 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment

Publications (1)

Publication Number Publication Date
CN114925591A true CN114925591A (en) 2022-08-19

Family

ID=82804123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111646797.9A Pending CN114925591A (en) 2021-12-29 2021-12-29 Automatic parallel strategy searching method based on polyhedron model modeling and related equipment

Country Status (1)

Country Link
CN (1) CN114925591A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115730681A (en) * 2022-11-11 2023-03-03 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium
CN115730681B (en) * 2022-11-11 2023-08-15 北京百度网讯科技有限公司 Model training method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115543639B (en) Optimization method for performing deep learning tasks in distributed mode and distributed system
Fei et al. A dataflow-based scientific workflow composition framework
US10678573B2 (en) System and method for simulating virtual machine (VM) placement in virtual datacenters
Chen et al. MRGIS: A MapReduce-Enabled high performance workflow system for GIS
US20200034750A1 (en) Generating artificial training data for machine-learning
US9684493B2 (en) R-language integration with a declarative machine learning language
JP2020013608A (en) Data processing graph compilation
Lin et al. Efficient GPU computation using task graph parallelism
CN112559053B (en) Data synchronization processing method and device for reconfigurable processor
Valencia-Cabrera et al. Simulation challenges in membrane computing
Doka et al. Mix ‘n’match multi-engine analytics
CN115525287A (en) Multi-stage compiler architecture
US20240070512A1 (en) Quantum computing system and method
Guo et al. Automated exploration and implementation of distributed cnn inference at the edge
CN114925591A (en) Automatic parallel strategy searching method based on polyhedron model modeling and related equipment
CN117291260A (en) Deep learning framework adaptation method, deep learning framework adaptation device, deep learning framework adaptation equipment, deep learning framework adaptation storage medium and deep learning framework adaptation product
US11573777B2 (en) Method and apparatus for enabling autonomous acceleration of dataflow AI applications
CN113420466B (en) Cross-platform automatic performance optimization oriented unit computing component and method
US10795682B2 (en) Generating vector based selection control statements
Olden Performance Analysis of SWE Implementations based on modern parallel Runtime Systems
Pllana et al. A novel approach for hybrid performance modelling and prediction of large-scale computing systems
WO2023071509A1 (en) Model compilation method and apparatus, and model running system
Gibson Deep learning on a low power gpu
Arjavalingam et al. HASTE: Serverless DAG Execution Optimizer
Eckardt et al. Simulation of a Simplified Ecosystem to Study the Influence of Environmental Factors on Bee Populations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination