CN116225452A - Multi-level intermediate code-based graph neural network compiling optimization method - Google Patents

Multi-level intermediate code-based graph neural network compiling optimization method

Info

Publication number
CN116225452A
CN116225452A CN202310227947.5A CN202310227947A CN116225452A CN 116225452 A CN116225452 A CN 116225452A CN 202310227947 A CN202310227947 A CN 202310227947A CN 116225452 A CN116225452 A CN 116225452A
Authority
CN
China
Prior art keywords
neural network
optimization
graph neural
model
mlir
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310227947.5A
Other languages
Chinese (zh)
Inventor
卢冶
仪德智
杨航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202310227947.5A priority Critical patent/CN116225452A/en
Publication of CN116225452A publication Critical patent/CN116225452A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/447 Target code generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a graph neural network compiling optimization method based on multi-level intermediate code, which belongs to the technical field of graph neural networks and comprises the following steps: step 1: preprocess the graph neural network model and then train the model; step 2: convert the trained graph neural network model into the ONNX format; step 3: convert the ONNX-format graph neural network model into IR using the ONNX-MLIR front end; step 4: perform IR compilation optimization; step 5: generate an executable program. The invention uses MLIR to perform fine-grained optimization on the intermediate code of GNNs models at different levels and improves the portability of GNNs model code across different hardware platforms.

Description

Multi-level intermediate code-based graph neural network compiling optimization method
Technical Field
The invention belongs to the technical field of graph neural networks, and in particular relates to a graph neural network compiling optimization method based on multi-level intermediate code.
Background
In recent years, neural networks have become increasingly popular in pattern recognition and data mining research. In the past, most machine learning tasks had to rely on manual feature extraction, which was time-consuming, labor-intensive, and of low accuracy. The work of feature extraction has now been taken over by various deep neural network (Deep Neural Networks, DNN) models. Many successful deep learning applications are built on data in Euclidean space, such as speech sequences in natural language processing and images in computer vision, which are characterized by fixed arrangement rules and orderings of the elements within the data. In practical application scenarios, however, most data does not come from Euclidean space, so conventional deep learning has difficulty extracting features from such data efficiently. For example, data from hardware circuit design, traffic networks, and biological molecular structures do not belong to Euclidean space and are instead stored in the form of graphs. Graph neural networks (Graph Neural Networks, GNNs) are studied in order to better learn and extract the structural features of graph data and to make more accurate predictions for downstream applications.
Although GNNs exhibit excellent learning ability on various tasks related to graph structures, they suffer, like DNNs, from poor portability, long inference time, and large model size. At the same time, GNNs have three characteristics relative to DNNs that prevent their optimization from fully following the DNN paradigm: 1. the message-passing mechanism between GNN nodes; 2. the widespread use of sparse matrices in GNNs; 3. the relative irregularity of data access in GNNs. Because of these characteristics, the compilation optimization of GNNs has a broad optimization space waiting to be explored.
The Multi-Level Intermediate Representation (MLIR) is a subproject of the Low Level Virtual Machine (LLVM) compiler infrastructure. MLIR provides a reusable, extensible infrastructure for the generation, conversion, and optimization of compiler intermediate representations (Intermediate Representation, IR). Compared with traditional compiler construction, using MLIR has several advantages: 1. Everything can be customized. The infrastructure provides a minimal set of built-in facilities, with which a user has maximum design freedom to define a customized IR. 2. Multi-stage, stepwise IR lowering. Traditional compilers lower from high-level semantics to low-level semantics in a single step, for example from C to assembly language; crossing such a large gap is not only difficult but also inconvenient for fine-grained compilation optimization. Stepwise lowering through multiple IR layers allows optimization at different levels, and even allows high-level semantic information to be combined with low-level semantics. 3. Reusable modules. Before MLIR, developing a new domain-specific language required re-developing every stage from the front end to the back end, even though some language characteristics are shared among different languages. With the MLIR infrastructure, code reuse becomes simpler and more efficient. 4. Better portability. Within the MLIR infrastructure, different IRs can be converted into one another through dialects. Code from different front ends can therefore run on different back ends more conveniently after passing through the IR, which improves the portability of front-end code. These characteristics are the main reasons why MLIR has received so much attention in the field of neural network compilers. The two major neural-network-related dialect projects, Torch-MLIR and ONNX-MLIR, currently offer only limited support for conventional DNNs; MLIR-based conversion and optimization of GNNs therefore remains a blank area.
Against this background, this patent explores a compilation optimization method for GNNs based on the multi-level intermediate code compiler infrastructure.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a graph neural network compiling optimization method based on multi-level intermediate code, which uses MLIR to perform fine-grained optimization on the intermediate code of GNNs models at different levels and improves the portability of GNNs model code across different hardware platforms.
The technical scheme adopted by the invention is as follows: a graph neural network compiling optimization method based on multi-level intermediate code comprises the following steps:
step 1: preprocessing the graph neural network model, and then training the model;
step 2: converting the trained graph neural network model into ONNX format;
step 3: converting the ONNX-formatted graph neural network model into IR by using an ONNX-MLIR front end;
step 4: performing IR compiling optimization;
step 5: an executable program is generated.
Further, in step 1, the preprocessing includes rewriting the sparse tensor data types to dense tensor types and reconstructing the model using the torch.nn base class.
Further, in step 4, the IR compilation optimization includes graph-layer IR optimization and operator-layer IR optimization;
graph-layer IR optimization includes operator decomposition, shape inference, graph rewriting, and constant propagation;
operator-layer IR optimization includes loop unrolling, loop scheduling, loop blocking, and memory allocation.
Further, in step 5, the optimized IR is converted into C++ code, and an executable program is then generated.
Further, the process of converting the IR into C++ code is:
S1: determine the specific definition of each operation by consulting the official MLIR dialect documentation;
S2: traverse the whole IR and identify each operation, together with the variable types, variable names, and variable limits it contains, using regular expressions;
S3: substitute the C++ code for each operation from a template, according to that operation's definition.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention optimizes GNNs models using MLIR, and the running speed of the MLIR-optimized model is greatly improved: it is approximately 5.53 times faster than the model in PyTorch and approximately 2.14 times faster than the unoptimized C++ version.
2. By preprocessing the GNNs model, the invention makes GNNs models compatible with the ONNX standard.
3. The method optimizes the sparse matrix: through optimizations such as shape inference, operations such as pre-accessing the sparse matrix are completed at compile time, which greatly improves runtime performance. A dialect for the message-passing mechanism is also developed, and the loops in the IR are custom-optimized according to the characteristics of message passing between nodes.
4. The invention extends the portability of GNNs models across different platforms using MLIR. By designing a conversion of the IR into C++ code, the path from Python to MLIR and finally to C++ code is realized; the resulting code can then be deployed onto an FPGA by means of HLS for further heterogeneous acceleration.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a comparison chart of running speeds in different environments according to an embodiment of the present invention.
Detailed Description
In order to make the technical scheme of the present invention better understood by those skilled in the art, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a graph neural network compiling optimization method based on multi-level intermediate code, as shown in FIG. 1, comprising the following steps:
Step 1: preprocess the graph neural network model, where the preprocessing includes rewriting sparse tensor data types into dense tensor types and reconstructing the model using the torch.nn base class, and then train the model.
The ONNX standard ecosystem is currently oriented toward conventional DNNs, and some data structures and operators used in GNNs are not yet supported, so the original GNNs model must be modified accordingly for the conversion to succeed. For the GNNs model to be compatible with the ONNX standard, the sparse tensor data types involved in the computation all need to be rewritten to dense tensor types. Because of compatibility issues, the classes and data types defined by the two common graph neural network libraries, PyTorch Geometric (PyG) and Deep Graph Library (DGL), cannot be used; the model should instead be reconstructed from the torch.nn base class before conversion.
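For ease of understanding, the following is a minimal sketch of this preprocessing step: a single graph-convolution layer rebuilt directly on the torch.nn base class, with the sparse adjacency tensor rewritten to a dense tensor before export. The class and variable names (DenseGCNLayer, adj, and so on) are illustrative assumptions and are not part of the original disclosure.

```python
import torch
import torch.nn as nn

class DenseGCNLayer(nn.Module):
    """A graph-convolution layer rebuilt on torch.nn only (no PyG/DGL types)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj is a dense (N, N) adjacency matrix; sparse tensors are avoided
        # so that the model stays within the operator set supported by ONNX.
        return torch.relu(adj @ self.linear(x))

# Rewrite a sparse adjacency tensor to a dense tensor before export.
indices = torch.tensor([[0, 1, 2], [1, 2, 0]])
values = torch.ones(3)
sparse_adj = torch.sparse_coo_tensor(indices, values, (3, 3))
dense_adj = sparse_adj.to_dense()

model = DenseGCNLayer(in_dim=8, out_dim=4)
out = model(torch.randn(3, 8), dense_adj)
```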
Step 2: and converting the trained graph neural network model into ONNX format.
Step 3: the ONNX format of the neural network model is converted to IR using an ONNX-MLIR front end.
Two layers of IR are provided in ONNX-MLIR, the first layer being a higher layer of IR defined by the ONNX-MLIR dialect corresponding to the layers in the TVM and the second layer being a lower level of IR defined by the krnl dialect corresponding to the operator layers in the TVM.
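To make the flow concrete, the sketch below drives the onnx-mlir front end from Python via a subprocess call. The emission flags shown (--EmitONNXIR, --EmitLLVMIR) follow the public onnx-mlir documentation but may differ between versions, so they should be treated as assumptions rather than part of the patented method.

```python
import subprocess

# Lower the ONNX model to the higher-level onnx-dialect IR ...
subprocess.run(["onnx-mlir", "--EmitONNXIR", "gnn_model.onnx"], check=True)

# ... or lower it all the way down to LLVM IR for the LLVM back end.
subprocess.run(["onnx-mlir", "--EmitLLVMIR", "gnn_model.onnx"], check=True)
```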
Step 4: perform IR compilation optimization. The IR compilation optimization includes graph-layer IR optimization and operator-layer IR optimization.
Graph-layer IR optimization includes operator decomposition, shape inference, graph rewriting, and constant propagation.
Operator decomposition: in ONNX, many operations can be replaced by other basic operations to achieve better performance. For example, the ReduceL1 operator is mathematically the sum of the absolute values of the elements of a vector X, i.e., ReduceL1 = ReduceSum(Abs(X)). To implement operator decomposition, the operations to be replaced can be identified with the pattern-rewriting module of the TableGen tool provided by MLIR, or the replacement can be implemented by hand as a C++ function.
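The decomposition can be checked numerically; the sketch below (plain NumPy, illustrative only) confirms that ReduceL1 over an axis equals ReduceSum applied to Abs.

```python
import numpy as np

x = np.random.randn(4, 5)

# ReduceL1: sum of absolute values along an axis.
reduce_l1 = np.sum(np.abs(x), axis=1)

# Decomposed form: ReduceSum(Abs(X)).
decomposed = np.abs(x).sum(axis=1)

assert np.allclose(reduce_l1, decomposed)
```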
Shape inference: type inference has already been completed during semantic analysis, but the graph neural network model is tensor-computation-intensive, and the shapes of the operands and results of each operation have not yet been inferred. Completing shape inference at compile time accelerates the runtime. The return type of an operation can generally be inferred from the derived return shape and element type. MLIR provides the InferShapedTypeOpInterface interface for implementing shape inference on tensor types.
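The idea of propagating shapes at compile time can be illustrated with a small Python sketch that infers the result shape of a matrix multiplication from its operand shapes; this is a conceptual analogue of what the shape-inference interface does, not MLIR code, and the function name and example shapes are assumptions.

```python
def infer_matmul_shape(a_shape, b_shape):
    """Infer the static result shape of a 2-D matrix multiplication."""
    (m, k1), (k2, n) = a_shape, b_shape
    if k1 != k2:
        raise ValueError(f"inner dimensions differ: {k1} vs {k2}")
    return (m, n)

# Known at compile time: feature matrix (2708, 16) times weights (16, 7).
print(infer_matmul_shape((2708, 16), (16, 7)))   # -> (2708, 7)
```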
Graph rewriting: since the computation in a neural network is represented by a dataflow graph, graph rewriting is widely used in neural network optimization. In MLIR, pattern matching and graph rewriting can be performed using the TableGen tool. For example, the following rule describes that an onnx.Add operator combined with an onnx.MatMul operator can be replaced by a single onnx.Gemm operation.
[The TableGen rewrite rule is shown as an image in the original publication.]
In the Pat rule, the first input is a nested operation definition, (AddOp (MatMulOp:$res $m1, $m2), $m3), which indicates that after MatMulOp finishes, its result is fed, together with another variable, as an input to the AddOp operator. The second input is the pattern with which the matched pattern is replaced.
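The effect of such a rewrite rule can be mimicked in a few lines of Python over a toy operator list; the tuple-based graph encoding and the function fuse_matmul_add below are purely illustrative assumptions and do not reproduce the representation used by MLIR.

```python
# Each node: (op_name, inputs, output); a toy straight-line graph.
graph = [
    ("MatMul", ("m1", "m2"), "t0"),
    ("Add",    ("t0", "m3"), "y"),
]

def fuse_matmul_add(nodes):
    """Replace Add(MatMul(a, b), c) with a single Gemm(a, b, c)."""
    out = []
    i = 0
    while i < len(nodes):
        cur = nodes[i]
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (cur[0] == "MatMul" and nxt is not None and nxt[0] == "Add"
                and cur[2] in nxt[1]):
            other = [v for v in nxt[1] if v != cur[2]][0]
            out.append(("Gemm", (*cur[1], other), nxt[2]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out

print(fuse_matmul_add(graph))   # -> [('Gemm', ('m1', 'm2', 'm3'), 'y')]
```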
Constant propagation: constant propagation follows two principles: 1. if all inputs of an operation are constant, compute its output at compile time and delete the operation; 2. if the operation has a mixture of constant and non-constant inputs, normalize the operation, converting inputs to constant or non-constant form as appropriate. The rules of constant propagation differ for different operations, but writing a propagation rule requires the following three steps: 1. register the constant-propagation pass for the operation using the TableGen tool and declare its pattern; 2. allocate buffers for the input and output constants; 3. compute the result and write it into the allocated buffer.
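A minimal Python sketch of the first principle (fold an operation whose inputs are all constants, then delete it) is given below; the dictionary-based graph representation and function names are assumptions made only for illustration.

```python
def fold_constants(nodes, constants):
    """Fold operations whose inputs are all known constants at compile time."""
    folders = {"Add": lambda a, b: a + b, "Mul": lambda a, b: a * b}
    remaining = []
    for op, (a, b), out in nodes:
        if op in folders and a in constants and b in constants:
            # Principle 1: compute the output now and drop the operation.
            constants[out] = folders[op](constants[a], constants[b])
        else:
            remaining.append((op, (a, b), out))
    return remaining, constants

nodes = [("Add", ("c0", "c1"), "c2"), ("Mul", ("x", "c2"), "y")]
rest, consts = fold_constants(nodes, {"c0": 2.0, "c1": 3.0})
print(rest, consts)   # the Add is folded away; c2 becomes 5.0
```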
Operator-layer IR optimization includes loop unrolling, loop scheduling, loop blocking, and memory allocation.
In neural network workloads the computational kernels have a locally simple structure, especially a loop-nest structure. Computation over hyper-rectangles has very intuitive and regular arithmetic semantics, so the computation of neural network models is well suited to representation in the polyhedral optimization model. The krnl dialect aims to place loop optimization and scalar semantic optimization at the same IR level. It not only provides the readability of the polyhedral representation but also decouples program semantics (what is executed) from program scheduling (how and when it is executed). In short, the krnl dialect can both optimize the program and take on the scheduling-optimization role that is lacking in other existing optimization systems. The common optimizations in the krnl dialect are all loop-based. At this level, the krnl dialect cooperates with the built-in dialects of MLIR, e.g., memref, and memory management can also be performed by allocating memory for the output tensors.
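As a conceptual illustration of the loop blocking performed at this level, the following Python sketch tiles a matrix multiplication into fixed-size blocks to improve locality; it models only the transformation and is not krnl dialect code, and the tile size is an arbitrary assumption.

```python
import numpy as np

def blocked_matmul(a, b, tile=32):
    """Matrix multiplication with loop blocking (tiling) for cache locality."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Multiply one tile pair and accumulate into the output tile.
                c[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile]
                )
    return c

a, b = np.random.randn(64, 64), np.random.randn(64, 64)
assert np.allclose(blocked_matmul(a, b), a @ b)
```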
Step 5: an executable program is generated.
For example, for the LLVM back end, after the IR has been optimized using the krnl dialect, the optimized IR can be converted into the target program: it can be lowered directly to LLVM IR using the built-in dialects of MLIR and then run directly on the LLVM back end.
Furthermore, FPGA accelerators are typically written using high-level synthesis (High Level Synthesis, HLS), which is C++-based. Traditional FPGA deployment requires a preliminary design verification in Python, after which equivalent C++ code is written by hand. However, implementing C++ code is more difficult than Python, and each simulation verification of C++ code in HLS takes a long time, so hardware verification has always been the bottleneck of the hardware design cycle; verification and testing typically take 60% to 70% of the total hardware design cycle. Automatically converting the Python model into C++ code by means of MLIR saves a great deal of time on model rewriting and simulation verification.
Therefore, for the C++ back end, the optimized IR can first be converted into C++ code, from which the executable program is then generated.
The process of converting the IR into C++ code is as follows (a minimal sketch is given after the list):
S1: determine the specific definition of each operation by consulting the official MLIR dialect documentation;
S2: traverse the whole IR and identify each operation, together with the variable types, variable names, and variable limits it contains, using regular expressions;
S3: substitute the C++ code for each operation from a template, according to that operation's definition.
The embodiment of the invention evaluates the effect of the operator and loop optimizations in the IR by using running speed as the metric, taking the time of a single model inference as the evaluation index. The models tested include: the PyTorch model, a model implemented in C++, and the model optimized with MLIR and converted into C++. The C++ model is compiled at optimization level O0. As can be seen from FIG. 2, the MLIR-optimized model is approximately 5.53 times faster than the model in PyTorch and approximately 2.14 times faster than the unoptimized C++ version. This patent attributes the performance improvement to the following two reasons:
1. After the code is converted into C++, the ahead-of-time compilation mechanism of the C++ compiler is faster than the interpret-while-executing mechanism of the Python interpreter.
2. The optimizations applied across the two ONNX-MLIR IR abstraction layers, such as redundant-operation elimination and polyhedral-model optimization, bring performance improvements.
The present invention has been described in detail by way of examples, but the description is merely exemplary and should not be construed as limiting the scope of the invention, which is defined by the claims. Similar technical schemes designed on the basis of, or inspired by, the technical scheme of the invention to achieve the same technical effects, as well as equivalent changes and improvements within its scope of application, still fall within the protection scope of this patent.

Claims (5)

1. A graph neural network compiling optimization method based on multi-level intermediate code, characterized by comprising the following steps:
step 1: preprocessing the graph neural network model, and then training the model;
step 2: converting the trained graph neural network model into ONNX format;
step 3: converting the ONNX-formatted graph neural network model into IR by using an ONNX-MLIR front end;
step 4: performing IR compiling optimization;
step 5: an executable program is generated.
2. The method of claim 1, wherein in step 1 the preprocessing includes rewriting the sparse tensor data types to dense tensor types and reconstructing the model using the torch.nn base class.
3. The method of claim 1, wherein in step 4 the IR compilation optimization comprises graph-layer IR optimization and operator-layer IR optimization,
graph-layer IR optimization includes operator decomposition, shape inference, graph rewriting, and constant propagation;
operator-layer IR optimization includes loop unrolling, loop scheduling, loop blocking, and memory allocation.
4. The graph neural network compiling optimization method based on multi-level intermediate code according to claim 1, wherein in step 5 the optimized IR is converted into C++ code and an executable program is then generated.
5. The graph neural network compiling optimization method based on multi-level intermediate code according to claim 4, wherein the process of converting the IR into C++ code is as follows:
S1: determine the specific definition of each operation by consulting the official MLIR dialect documentation;
S2: traverse the whole IR and identify each operation, together with the variable types, variable names, and variable limits it contains, using regular expressions;
S3: substitute the C++ code for each operation from a template, according to that operation's definition.
CN202310227947.5A 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method Pending CN116225452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310227947.5A CN116225452A (en) 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310227947.5A CN116225452A (en) 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method

Publications (1)

Publication Number Publication Date
CN116225452A true CN116225452A (en) 2023-06-06

Family

ID=86569263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310227947.5A Pending CN116225452A (en) 2023-03-10 2023-03-10 Multi-level intermediate code-based graph neural network compiling optimization method

Country Status (1)

Country Link
CN (1) CN116225452A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117877623A (en) * 2023-12-13 2024-04-12 重庆大学 Optimal molecular substructure selection method based on multi-level interpretability characterization


Similar Documents

Publication Publication Date Title
Huang et al. Gamepad: A learning environment for theorem proving
CN110187885B (en) Intermediate code generation method and device for quantum program compiling
Jin et al. Compiling onnx neural network models using mlir
Feldman et al. Translator writing systems
US8839212B2 (en) Method, apparatus and computer program product for automatically generating a computer program using consume, simplify and produce semantics with normalize, transpose and distribute operations
CN112100054A (en) Data management and control oriented program static analysis method and system
CN116225452A (en) Multi-level intermediate code-based graph neural network compiling optimization method
US11847436B2 (en) Machine learning (ML) model-based compiler
CN116560666B (en) AI front end unified computing method, device and medium based on multi-level code generation
CN112527304B (en) Self-adaptive node fusion compiling optimization method based on heterogeneous platform
CN116755667A (en) Low-code DSL language development method and device
Adams Modular grammars for programming language prototyping
KR102610431B1 (en) Apparatus and method for generating summary of program source code based on ai analysis
Rudi et al. CodeFlow: A code generation system for Flash-X orchestration runtime
CN1661552B (en) Process language for microprocessors with finite resources
Wang et al. Computation graph
Piñeiro et al. Perldoop2: A big data-oriented source-to-source Perl-Java compiler
Chang et al. Support NNEF execution model for NNAPI
Singh et al. Design and implementation of compiler
CN101957772A (en) Compiler for generating lightweight comparison instruction
Di Giacomo Metacasanova: a High-performance Meta-compiler for Domain-specific Languages
Urlea Optimal program variant generation for hybrid manycore systems
CN118312154A (en) Compiler generating method, compiler, and storage medium
Erbas A General-Purpose Machine Reasoning Engine
Feldman Edward W. Czeck

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination