CN116225452A - Multi-level intermediate code-based graph neural network compiling optimization method - Google Patents

Multi-level intermediate code-based graph neural network compiling optimization method

Info

Publication number
CN116225452A
CN116225452A CN202310227947.5A CN202310227947A CN116225452A CN 116225452 A CN116225452 A CN 116225452A CN 202310227947 A CN202310227947 A CN 202310227947A CN 116225452 A CN116225452 A CN 116225452A
Authority
CN
China
Prior art keywords
neural network
optimization
graph neural
model
mlir
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310227947.5A
Other languages
Chinese (zh)
Inventor
卢冶
仪德智
杨航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202310227947.5A priority Critical patent/CN116225452A/en
Publication of CN116225452A publication Critical patent/CN116225452A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a graph neural network compiling optimization method based on multi-level intermediate code, which belongs to the technical field of graph neural networks and comprises the following steps: step 1: preprocess the graph neural network model and then train the model; step 2: convert the trained graph neural network model into the ONNX format; step 3: convert the ONNX-format graph neural network model into IR using the ONNX-MLIR front end; step 4: perform IR compilation optimization; step 5: generate an executable program. The invention uses MLIR to perform fine-grained optimization on the intermediate code of GNNs models at different levels and improves the portability of GNNs model code across different hardware platforms.

Description

Multi-level intermediate code-based graph neural network compiling optimization method
Technical Field
The invention belongs to the technical field of graph neural networks, and in particular relates to a graph neural network compiling optimization method based on multi-level intermediate code.
Background
In recent years, neural networks have become increasingly popular in pattern recognition and data mining research. In the past, most machine learning tasks had to rely on manual feature extraction, which was time-consuming, labor-intensive, and of low accuracy. The work of feature extraction has now been taken over by various deep neural network (Deep Neural Networks, DNN) models. Many successful deep learning applications are built on data in Euclidean space, such as speech sequences in natural language processing and images in computer vision, which are characterized by fixed arrangement rules and orderings of the elements within the data. In practical application scenarios, however, most data does not come from Euclidean space, so conventional deep learning has difficulty extracting features from such data efficiently. For example, data from hardware circuit design, traffic networks, and biological molecular structures do not belong to Euclidean space and are instead stored in the form of graphs. Graph neural networks (Graph Neural Networks, GNNs) are studied in order to better learn and extract the structural features of graph data and to make more accurate predictions for downstream applications.
Although GNNs exhibit excellent learning ability on various tasks related to graph structures, they suffer, like DNNs, from poor portability, long inference time, and large model size. At the same time, GNNs have three characteristics relative to DNNs that prevent their optimization from fully following the DNN paradigm: 1. the message-passing mechanism between GNN nodes; 2. the widespread use of sparse matrices in GNNs; 3. the relative irregularity of data access in GNNs. Because of these characteristics, the compilation optimization of GNNs has a broad optimization space waiting to be explored.
The Multi-Level Intermediate Representation (MLIR) is a subproject of the Low Level Virtual Machine (LLVM) compiler infrastructure. MLIR provides a reusable, extensible infrastructure for the generation, conversion, and optimization of compiler intermediate representations (Intermediate Representation, IR). Compared with traditional compiler construction, using MLIR has several advantages: 1. Everything can be customized. The infrastructure provides a minimal set of built-in facilities, with which a user has maximum design freedom to define a customized IR. 2. Multi-stage, stepwise IR lowering. Traditional compilers lower from high-level semantics to low-level semantics in a single step, for example from C to assembly language; crossing such a large gap is not only difficult but also inconvenient for fine-grained compilation optimization. Stepwise lowering through multiple IR layers allows optimization at different levels, and even allows high-level semantic information to be combined with low-level semantics. 3. Reusable modules. Before MLIR, developing a new domain-specific language required re-developing every stage from the front end to the back end, even though some language characteristics are shared among different languages. With the MLIR infrastructure, code reuse becomes simpler and more efficient. 4. Better portability. Within the MLIR infrastructure, different IRs can be converted into one another through dialects. Code from different front ends can therefore run on different back ends more conveniently after passing through the IR, which improves the portability of front-end code. These characteristics are the main reasons why MLIR has received so much attention in the field of neural network compilers. The two major neural-network-related dialect projects, Torch-MLIR and ONNX-MLIR, currently offer only limited support for conventional DNNs; MLIR-based conversion and optimization of GNNs therefore remains a blank area.
Against this background, this patent explores a compilation optimization method for GNNs based on the multi-level intermediate code compiler infrastructure.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a graph neural network compiling optimization method based on multi-level intermediate code, which uses MLIR to perform fine-grained optimization on the intermediate code of GNNs models at different levels and improves the portability of GNNs model code across different hardware platforms.
The technical scheme adopted by the invention is as follows: a graph neural network compiling optimization method based on multi-level intermediate code comprises the following steps:
step 1: preprocessing the graph neural network model, and then training the model;
step 2: converting the trained graph neural network model into ONNX format;
step 3: converting the ONNX-formatted graph neural network model into IR by using an ONNX-MLIR front end;
step 4: performing IR compiling optimization;
step 5: an executable program is generated.
Further, in step 1, the preprocessing includes rewriting the sparse tensor data types to dense tensor types and reconstructing the model using the torch.nn base class.
Further, in step 4, the IR compilation optimization includes graph-layer IR optimization and operator-layer IR optimization;
graph-layer IR optimization includes operator decomposition, shape inference, graph rewriting, and constant propagation;
operator-layer IR optimization includes loop unrolling, loop scheduling, loop blocking, and memory allocation.
Further, in step 5, the optimized IR is converted into C++ code, and an executable program is then generated.
Further, the process of converting the IR into C++ code is:
S1: determine the specific definition of each operation by consulting the official MLIR dialect documentation;
S2: traverse the whole IR and identify each operation, together with the variable types, variable names, and variable limits it contains, using regular expressions;
S3: substitute the C++ code for each operation from a template, according to that operation's definition.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention optimizes GNNs models using MLIR, and the running speed of the MLIR-optimized model is greatly improved: it is approximately 5.53 times faster than the model in PyTorch and approximately 2.14 times faster than the unoptimized C++ version.
2. By preprocessing the GNNs model, the invention makes GNNs models compatible with the ONNX standard.
3. The method optimizes the sparse matrix: through optimizations such as shape inference, operations such as pre-accessing the sparse matrix are completed at compile time, which greatly improves runtime performance. A dialect for the message-passing mechanism is also developed, and the loops in the IR are custom-optimized according to the characteristics of message passing between nodes.
4. The invention extends the portability of GNNs models across different platforms using MLIR. By designing a conversion of the IR into C++ code, the path from Python to MLIR and finally to C++ code is realized; the resulting code can then be deployed onto an FPGA by means of HLS for further heterogeneous acceleration.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a comparison chart of running speeds in different environments according to an embodiment of the present invention.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a graph neural network compiling optimization method based on multi-level intermediate code, as shown in FIG. 1, comprising the following steps:
Step 1: preprocess the graph neural network model, where the preprocessing includes rewriting sparse tensor data types into dense tensor types and reconstructing the model using the torch.nn base class, and then train the model.
The ONNX standard ecosystem is currently oriented toward conventional DNNs, and some data structures and operators used in GNNs are not yet supported, so the original GNNs model must be modified accordingly for the conversion to succeed. For the GNNs model to be compatible with the ONNX standard, the sparse tensor data types involved in the computation all need to be rewritten to dense tensor types. Because of compatibility issues, the classes and data types defined by the two common graph neural network libraries, PyTorch Geometric (PyG) and Deep Graph Library (DGL), cannot be used; the model should instead be reconstructed from the torch.nn base class before conversion.
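For ease of understanding, the following is a minimal sketch of this preprocessing step: a single graph-convolution layer rebuilt directly on the torch.nn base class, with the sparse adjacency tensor rewritten to a dense tensor before export. The class and variable names (DenseGCNLayer, adj, and so on) are illustrative assumptions and are not part of the original disclosure.

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """A graph-convolution layer rebuilt on torch.nn only (no PyG/DGL types)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj is a dense (N, N) adjacency matrix; sparse tensors are avoided
        # so that the model stays within the operator set supported by ONNX.
        return torch.relu(adj @ self.linear(x))

# Rewrite a sparse adjacency tensor to a dense tensor before export.
indices = torch.tensor([[0, 1, 2], [1, 2, 0]])
values = torch.ones(3)
sparse_adj = torch.sparse_coo_tensor(indices, values, (3, 3))
dense_adj = sparse_adj.to_dense()

model = DenseGCNLayer(in_dim=8, out_dim=4)
out = model(torch.randn(3, 8), dense_adj)
```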
Step 2: and converting the trained graph neural network model into ONNX format.
Step 3: the ONNX format of the neural network model is converted to IR using an ONNX-MLIR front end.
Two layers of IR are provided in ONNX-MLIR, the first layer being a higher layer of IR defined by the ONNX-MLIR dialect corresponding to the layers in the TVM and the second layer being a lower level of IR defined by the krnl dialect corresponding to the operator layers in the TVM.
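To make the flow concrete, the sketch below drives the onnx-mlir front end from Python via a subprocess call. The emission flags shown (--EmitONNXIR, --EmitLLVMIR) follow the public onnx-mlir documentation but may differ between versions, so they should be treated as assumptions rather than part of the patented method.

```python
import subprocess

# Lower the ONNX model to the higher-level onnx-dialect IR ...
subprocess.run(["onnx-mlir", "--EmitONNXIR", "gnn_model.onnx"], check=True)

# ... or lower it all the way down to LLVM IR for the LLVM back end.
subprocess.run(["onnx-mlir", "--EmitLLVMIR", "gnn_model.onnx"], check=True)
```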
Step 4: perform IR compilation optimization. The IR compilation optimization includes graph-layer IR optimization and operator-layer IR optimization.
Graph-layer IR optimization includes operator decomposition, shape inference, graph rewriting, and constant propagation.
Operator decomposition: in ONNX, many operations can be replaced by other basic operations to achieve better performance. For example, the ReduceL1 operator is mathematically the sum of the absolute values of the elements of a vector X, i.e., ReduceL1 = ReduceSum(Abs(X)). To implement operator decomposition, the operations to be replaced can be identified with the pattern-rewriting module of the TableGen tool provided by MLIR, or the replacement can be implemented by hand as a C++ function.
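The decomposition can be checked numerically; the sketch below (plain NumPy, illustrative only) confirms that ReduceL1 over an axis equals ReduceSum applied to Abs.

```python
import numpy as np

x = np.random.randn(4, 5)

# ReduceL1: sum of absolute values along an axis.
reduce_l1 = np.sum(np.abs(x), axis=1)

# Decomposed form: ReduceSum(Abs(X)).
decomposed = np.abs(x).sum(axis=1)

assert np.allclose(reduce_l1, decomposed)
```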
Shape inference: type inference has already been completed during semantic analysis, but the graph neural network model is tensor-computation-intensive, and the shapes of the operands and results of each operation have not yet been inferred. Completing shape inference at compile time accelerates the runtime. The return type of an operation can generally be inferred from the derived return shape and element type. MLIR provides the InferShapedTypeOpInterface interface for implementing shape inference on tensor types.
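The idea of propagating shapes at compile time can be illustrated with a small Python sketch that infers the result shape of a matrix multiplication from its operand shapes; this is a conceptual analogue of what the shape-inference interface does, not MLIR code, and the function name and example shapes are assumptions.

```python
def infer_matmul_shape(a_shape, b_shape):
    """Infer the static result shape of a 2-D matrix multiplication."""
    (m, k1), (k2, n) = a_shape, b_shape
    if k1 != k2:
        raise ValueError(f"inner dimensions differ: {k1} vs {k2}")
    return (m, n)

# Known at compile time: feature matrix (2708, 16) times weights (16, 7).
print(infer_matmul_shape((2708, 16), (16, 7)))   # -> (2708, 7)
```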
Graph rewriting: since the computation in a neural network is represented by a dataflow graph, graph rewriting is widely used in neural network optimization. In MLIR, pattern matching and graph rewriting can be performed using the TableGen tool. For example, the following rule describes that an onnx.Add operator combined with an onnx.MatMul operator can be replaced by a single onnx.Gemm operation.
[The TableGen rewrite rule is shown as an image in the original publication.]
In the Pat rule, the first input is a nested operation definition, (AddOp (MatMulOp:$res $m1, $m2), $m3), which indicates that after MatMulOp finishes, its result is fed, together with another variable, as an input to the AddOp operator. The second input is the pattern with which the matched pattern is replaced.
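The effect of such a rewrite rule can be mimicked in a few lines of Python over a toy operator list; the tuple-based graph encoding and the function fuse_matmul_add below are purely illustrative assumptions and do not reproduce the representation used by MLIR.

```python
# Each node: (op_name, inputs, output); a toy straight-line graph.
graph = [
    ("MatMul", ("m1", "m2"), "t0"),
    ("Add",    ("t0", "m3"), "y"),
]

def fuse_matmul_add(nodes):
    """Replace Add(MatMul(a, b), c) with a single Gemm(a, b, c)."""
    out = []
    i = 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (cur[0] == "MatMul" and nxt is not None and nxt[0] == "Add"
                and cur[2] in nxt[1]):
            other = [v for v in nxt[1] if v != cur[2]][0]
            out.append(("Gemm", (*cur[1], other), nxt[2]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

print(fuse_matmul_add(graph))   # -> [('Gemm', ('m1', 'm2', 'm3'), 'y')]
```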
Constant propagation: constant propagation follows two principles: 1. if all inputs of an operation are constant, compute its output at compile time and delete the operation; 2. if the operation has a mixture of constant and non-constant inputs, normalize the operation, converting inputs to constant or non-constant form as appropriate. The rules of constant propagation differ for different operations, but writing a propagation rule requires the following three steps: 1. register the constant-propagation pass for the operation using the TableGen tool and declare its pattern; 2. allocate buffers for the input and output constants; 3. compute the result and write it into the allocated buffer.
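A minimal Python sketch of the first principle (fold an operation whose inputs are all constants, then delete it) is given below; the dictionary-based graph representation and function names are assumptions made only for illustration.

```python
def fold_constants(nodes, constants):
    """Fold operations whose inputs are all known constants at compile time."""
    folders = {"Add": lambda a, b: a + b, "Mul": lambda a, b: a * b}
    remaining = []
    for op, (a, b), out in nodes:
        if op in folders and a in constants and b in constants:
            # Principle 1: compute the output now and drop the operation.
            constants[out] = folders[op](constants[a], constants[b])
        else:
            remaining.append((op, (a, b), out))
    return remaining, constants

nodes = [("Add", ("c0", "c1"), "c2"), ("Mul", ("x", "c2"), "y")]
rest, consts = fold_constants(nodes, {"c0": 2.0, "c1": 3.0})
print(rest, consts)   # the Add is folded away; c2 becomes 5.0
```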
Operator-layer IR optimization includes loop unrolling, loop scheduling, loop blocking, and memory allocation.
In neural network workloads the computational kernels have a locally simple structure, especially a loop-nest structure. Computation over hyper-rectangles has very intuitive and regular arithmetic semantics, so the computation of neural network models is well suited to representation in the polyhedral optimization model. The krnl dialect aims to place loop optimization and scalar semantic optimization at the same IR level. It not only provides the readability of the polyhedral representation but also decouples program semantics (what is executed) from program scheduling (how and when it is executed). In short, the krnl dialect can both optimize the program and take on the scheduling-optimization role that is lacking in other existing optimization systems. The common optimizations in the krnl dialect are all loop-based. At this level, the krnl dialect cooperates with the built-in dialects of MLIR, e.g., memref, and memory management can also be performed by allocating memory for the output tensors.
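As a conceptual illustration of the loop blocking performed at this level, the following Python sketch tiles a matrix multiplication into fixed-size blocks to improve locality; it models only the transformation and is not krnl dialect code, and the tile size is an arbitrary assumption.

```python
import numpy as np

def blocked_matmul(a, b, tile=32):
    """Matrix multiplication with loop blocking (tiling) for cache locality."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Multiply one tile pair and accumulate into the output tile.
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c

a, b = np.random.randn(64, 64), np.random.randn(64, 64)
assert np.allclose(blocked_matmul(a, b), a @ b)
```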
Step 5: an executable program is generated.
For example, for the LLVM back end, after the IR has been optimized using the krnl dialect, the optimized IR can be converted into the target program: it can be lowered directly to LLVM IR using the built-in dialects of MLIR and then run directly on the LLVM back end.
Furthermore, FPGA accelerators are typically written using high-level synthesis (High Level Synthesis, HLS), which is C++-based. Traditional FPGA deployment requires a preliminary design verification in Python, after which equivalent C++ code is written by hand. However, implementing C++ code is more difficult than Python, and each simulation verification of C++ code in HLS takes a long time, so hardware verification has always been the bottleneck of the hardware design cycle; verification and testing typically take 60% to 70% of the total hardware design cycle. Automatically converting the Python model into C++ code by means of MLIR saves a great deal of time on model rewriting and simulation verification.
Therefore, for the C++ back end, the optimized IR can first be converted into C++ code, from which the executable program is then generated.
The process of converting the IR into C++ code is as follows (a minimal sketch is given after the list):
S1: determine the specific definition of each operation by consulting the official MLIR dialect documentation;
S2: traverse the whole IR and identify each operation, together with the variable types, variable names, and variable limits it contains, using regular expressions;
S3: substitute the C++ code for each operation from a template, according to that operation's definition.
The embodiment of the invention evaluates the effect of the operator and loop optimizations in the IR by using running speed as the metric, taking the time of a single model inference as the evaluation index. The models tested include: the PyTorch model, a model implemented in C++, and the model optimized with MLIR and converted into C++. The C++ model is compiled at optimization level O0. As can be seen from FIG. 2, the MLIR-optimized model is approximately 5.53 times faster than the model in PyTorch and approximately 2.14 times faster than the unoptimized C++ version. This patent attributes the performance improvement to the following two reasons:
1. After the code is converted into C++, the ahead-of-time compilation mechanism of the C++ compiler is faster than the interpret-while-executing mechanism of the Python interpreter.
2. The optimizations applied across the two ONNX-MLIR IR abstraction layers, such as redundant-operation elimination and polyhedral-model optimization, bring performance improvements.
The present invention has been described in detail by way of examples, but the description is merely exemplary and should not be construed as limiting the scope of the invention, which is defined by the claims. Similar technical schemes designed on the basis of, or inspired by, the technical scheme of the invention to achieve the same technical effects, as well as equivalent changes and improvements within its scope of application, still fall within the protection scope of this patent.

Claims (5)

1. A graph neural network compiling optimization method based on multi-level intermediate code, characterized by comprising the following steps:
step 1: preprocessing the graph neural network model, and then training the model;
step 2: converting the trained graph neural network model into ONNX format;
step 3: converting the ONNX-formatted graph neural network model into IR by using an ONNX-MLIR front end;
step 4: performing IR compiling optimization;
step 5: an executable program is generated.
2. The method of claim 1, wherein in step 1 the preprocessing includes rewriting the sparse tensor data types to dense tensor types and reconstructing the model using the torch.nn base class.
3. The method of claim 1, wherein in step 4 the IR compilation optimization comprises graph-layer IR optimization and operator-layer IR optimization,
graph-layer IR optimization includes operator decomposition, shape inference, graph rewriting, and constant propagation;
operator-layer IR optimization includes loop unrolling, loop scheduling, loop blocking, and memory allocation.
4. The graph neural network compiling optimization method based on multi-level intermediate code according to claim 1, wherein in step 5 the optimized IR is converted into C++ code and an executable program is then generated.
5. The graph neural network compiling optimization method based on multi-level intermediate code according to claim 4, wherein the process of converting the IR into C++ code is as follows:
S1: determine the specific definition of each operation by consulting the official MLIR dialect documentation;
S2: traverse the whole IR and identify each operation, together with the variable types, variable names, and variable limits it contains, using regular expressions;
S3: substitute the C++ code for each operation from a template, according to that operation's definition.
CN202310227947.5A 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method Pending CN116225452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310227947.5A CN116225452A (en) 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310227947.5A CN116225452A (en) 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method

Publications (1)

Publication Number Publication Date
CN116225452A true CN116225452A (en) 2023-06-06

Family

ID=86569263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310227947.5A Pending CN116225452A (en) 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method

Country Status (1)

Country Link
CN (1) CN116225452A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117877623A (en) * 2023-12-13 2024-04-12 重庆大学 Optimal molecular substructure selection method based on multi-level interpretability characterization


Similar Documents

Publication Publication Date Title
Huang et al. Gamepad: A learning environment for theorem proving
CN110187885B (en) Intermediate code generation method and device for quantum program compiling
Jin et al. Compiling onnx neural network models using mlir
Feldman et al. Translator writing systems
US8839212B2 (en) Method, apparatus and computer program product for automatically generating a computer program using consume, simplify and produce semantics with normalize, transpose and distribute operations
CN112100054A (en) Data management and control oriented program static analysis method and system
CN116225452A (en) Multi-level intermediate code-based graph neural network compiling optimization method
US11847436B2 (en) Machine learning (ML) model-based compiler
CN116560666B (en) AI front end unified computing method, device and medium based on multi-level code generation
CN112527304B (en) Self-adaptive node fusion compiling optimization method based on heterogeneous platform
CN116755667A (en) Low-code DSL language development method and device
Adams Modular grammars for programming language prototyping
KR102610431B1 (en) Apparatus and method for generating summary of program source code based on ai analysis
Rudi et al. CodeFlow: A code generation system for Flash-X orchestration runtime
CN1661552B (en) Process language for microprocessors with finite resources
Wang et al. Computation graph
Piñeiro et al. Perldoop2: A big data-oriented source-to-source Perl-Java compiler
Chang et al. Support NNEF execution model for NNAPI
Singh et al. Design and implementation of compiler
CN101957772A (en) Compiler for generating lightweight comparison instruction
Di Giacomo Metacasanova: a High-performance Meta-compiler for Domain-specific Languages
Urlea Optimal program variant generation for hybrid manycore systems
CN118312154A (en) Compiler generating method, compiler, and storage medium
Erbas A General-Purpose Machine Reasoning Engine
Feldman Edward W. Czeck

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination