CN112527304B - Self-adaptive node fusion compiling optimization method based on heterogeneous platform - Google Patents

Info

Publication number
CN112527304B
Authority
CN
China
Prior art keywords
dag
node
fusion
subgraph
heterogeneous platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910885756.1A
Other languages
Chinese (zh)
Other versions
CN112527304A (en)
Inventor
王飞
沈莉
吴伟
胡浩
钱宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910885756.1A
Publication of CN112527304A
Application granted
Publication of CN112527304B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a self-adaptive node fusion compiling optimization method based on a heterogeneous platform, comprising the following steps: S1, generating an intermediate representation; S2, identifying DAG fusion subgraphs; S3, applying the node fusion strategies; S4, evaluating the cost of each strategy; S5, adaptively selecting a node fusion strategy, that is, choosing the optimal strategy from the fusion strategy costs calculated in S4 together with the register, cache and memory usage of the target back end; S6, target-dependent node fusion, that is, transferring the control-flow and data-flow relations of the DAG subgraph matched in S23 to the fused DAG subgraph generated by the strategy selected in S5, replacing the pre-fusion DAG subgraph with the fused DAG subgraph, and returning to S22; and S7, generating object code, that is, compiling the DAG after lowering (degradation) is complete to generate heterogeneous platform code. The method provides accurate guidance for node fusion optimization on the heterogeneous platform, further exploits the potential of the platform's compound instructions, and improves the performance of the heterogeneous platform.

Description

Self-adaptive node fusion compiling optimization method based on heterogeneous platform
Technical Field
The invention relates to a self-adaptive node fusion compiling optimization method based on a heterogeneous platform, and belongs to the technical field of compiler optimization.
Background
Reduced instruction set computers (RISC) and complex instruction set computers (CISC) are the two architectures of current CPUs; they differ in design philosophy and method. Early CPUs all used complex instruction set architectures, which were designed to complete a computing task with as few machine instructions as possible. For a long time, computer performance was improved mainly by increasing hardware complexity: a typical complex instruction set computer has at least 300 instructions, and some have more than 500. Although a complex instruction set computer can achieve large performance gains, in a typical program about 80% of the instructions executed come from only about 20% of the processor's instruction set, so there is a large imbalance between the instructions provided and their cost. Furthermore, although very-large-scale integration (VLSI) technology has reached a high level, it is still difficult to implement all the hardware of a complex instruction set computer on a single chip, which hinders the development of single-chip computers. A reduced instruction set contains only the frequently used instructions plus the instructions necessary to support operating systems and high-level languages. Computers that use a reduced instruction set are simpler to manufacture and less expensive.
A compound instruction is a special instruction added on top of the basic reduced instruction set to improve program performance and increase instruction parallelism. The appearance of compound instructions can be said to mark the gradual convergence of reduced instruction set computers and complex instruction set computers. A common example is the multiply-add instruction, which uses a dedicated multiply-add unit to complete a multiply-add operation; in machine learning and scientific computing workloads, multiply-add instructions are used very frequently. The most common expression in neural networks, y = x × w + b, can be computed with multiply-add instructions, where x = [x_1, x_2, …, x_n], w = [w_1, w_2, …, w_n]^T, and b is a constant. Other compound instructions accelerate particular workloads in the same way, so using compound instructions can further release the potential of the CPU and improve its performance. A compound instruction implements a complex function in dedicated hardware logic, which is more efficient than a software implementation. Such instructions are widely used to improve the execution efficiency of applications and achieve good speedups.
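As a small illustration of why multiply-add matters for y = x × w + b, the sketch below accumulates the result one term at a time. Each loop iteration has the shape acc = x_i × w_i + acc, which is the operation a hardware multiply-add instruction performs in one step. The function name `dot_mac` is hypothetical, and ordinary Python arithmetic stands in for the hardware instruction.

```python
def dot_mac(x, w, b):
    """Compute y = x . w + b as a chain of multiply-accumulate steps."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    acc = b  # start from the constant b so the final add is part of the chain
    for xi, wi in zip(x, w):
        # One multiply-add per element: acc <- xi * wi + acc
        acc = xi * wi + acc
    return acc


print(dot_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 0.5))
```

With n elements the chain needs n multiply-add steps, where separate multiply and add instructions would need 2n.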
The node fusion optimization used by traditional compilers generates compound instructions mainly in two ways: by calling built-in function interfaces in the source code, or by template matching on the intermediate representation. The built-in function approach is tightly coupled to back-end instruction information, which restricts target-independent node optimization to some extent, hampers the development of compiler optimization techniques, and makes programs harder for programmers to write. The template matching approach matches a subgraph and replaces it with the corresponding compound instruction. It is simple and easy to implement, but it does not fully consider the influence of the instruction set, data flow and control flow on the compound instruction, nor back-end characteristics or the current data flow. As a result, the generated instruction sequence may not achieve the expected speedup and may even run slower, which greatly limits the benefit of the processor's compound instructions.
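One way such a template-only replacement can fail to pay off is sketched below. This is an illustrative assumption, not a case taken from the patent: the mini-IR, the `fma` opcode and the function names are all hypothetical. When the product feeding an add has a second user, rewriting the add into a fused multiply-add cannot remove the multiply, so the instruction count does not drop.

```python
# Hypothetical mini-IR: result name -> (opcode, operand names).
dag = {
    "t": ("mul", ("a", "b")),
    "u": ("add", ("t", "c")),  # candidate for fusion into fma(a, b, c)
    "v": ("sub", ("t", "d")),  # second user of the product t
}


def count_users(graph, name):
    """Count how many operand slots in the graph refer to `name`."""
    return sum(ops.count(name) for _, ops in graph.values())


def naive_fuse(graph):
    """Rewrite every add(mul(a, b), c) into fma(a, b, c), ignoring other users."""
    out = dict(graph)
    for name, (op, ops) in graph.items():
        if op == "add" and ops[0] in graph and graph[ops[0]][0] == "mul":
            a, b = graph[ops[0]][1]
            out[name] = ("fma", (a, b, ops[1]))
    # A mul that no longer has any user is dead and can be dropped.
    for name, (op, _) in list(out.items()):
        if op == "mul" and count_users(out, name) == 0:
            del out[name]
    return out


fused = naive_fuse(dag)
print(len(dag), len(fused))  # the multiply survives because v still uses it
```

A cost model that looks at the data flow can see that this rewrite saves nothing here, which is the kind of information plain template matching ignores.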
Disclosure of Invention
The invention aims to provide a self-adaptive node fusion compiling optimization method based on a heterogeneous platform that provides accurate guidance for node fusion optimization on the heterogeneous platform, further exploits the potential of the platform's compound instructions, and improves the performance of the heterogeneous platform.
To achieve this purpose, the invention adopts the following technical scheme: a self-adaptive node fusion compiling optimization method based on a heterogeneous platform, comprising the following steps:
S1, the compiler compiles the source program to generate its intermediate representation, a DAG, and performs lowering (degradation) on the DAG; the following operations are carried out on the DAG during the DAG lowering stage:
S2, DAG fusion subgraph recognition, which further comprises the following steps:
S21, topologically sort the DAG to obtain a topological sequence, and add the nodes of the DAG to a node fusion optimization worklist in the order of the topological sequence;
S22, the compiler takes the first node out of the worklist generated in S21, deletes it from the worklist, and checks the node's operation code, operand value types and result value type; if all of them are legal, the node is eligible for node fusion and S23 is executed; otherwise S22 is repeated; once the worklist is empty, go to S7;
S23, taking the node extracted in S22 as the root node, use a graph matching algorithm with the DAG subgraph matching templates of the compiler back end to find all n DAG subgraphs that are rooted at that node and can undergo node fusion, then go to S24;
S24, if no DAG subgraph that can undergo node fusion was found in S23, go to S22; otherwise go to S3;
S3, the n fusible DAG subgraphs found in S23 correspond one-to-one to n node fusion strategies; node fusion is performed on the k-th fusible DAG subgraph found in S23 according to the k-th node fusion strategy, where k = 1, 2, 3, 4, ...;
S4, fusion strategy cost evaluation: according to the data references of all nodes of the fused DAG subgraph from S3 and the instruction set information of the heterogeneous platform, calculate the cost of running the instruction sequence obtained by converting the k-th fused DAG subgraph generated in S3 into instructions, where the cost includes the number of clock cycles, the number of registers used and the amount of memory occupied, then go to S5;
S5, adaptive selection of the node fusion strategy: based on the fusion strategy costs calculated in S4 and the register, cache and memory usage of the target back end, adaptively select the optimal node fusion strategy, that is, the one that improves performance on the target back end the most, then go to S6;
S6, target-dependent node fusion: according to the node fusion strategy selected in S5, transfer the control-flow and data-flow relations of the DAG subgraph matched in S23 to the fused DAG subgraph generated by that strategy, replace the pre-fusion DAG subgraph with the fused DAG subgraph, and go to S22;
S7, object code generation: after lowering is complete, the compiler compiles the DAG to generate heterogeneous platform code.
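Steps S21 and S22 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the `Node` record, the `FUSIBLE` legality table and both function names are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for whatever sorting routine the compiler actually uses. Operands are treated as predecessors, so they are queued before the nodes that use them.

```python
from collections import deque
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Node:
    name: str
    opcode: str
    operand_types: tuple
    result_type: str
    operands: tuple = ()  # names of the operand nodes


# Hypothetical legality table: opcode -> value types the back end can fuse.
FUSIBLE = {"add": {"f32", "f64"}, "sub": {"f32", "f64"}}


def build_worklist(nodes):
    """S21: topologically sort the DAG and queue the nodes in that order."""
    by_name = {n.name: n for n in nodes}
    deps = {n.name: set(n.operands) for n in nodes}
    order = TopologicalSorter(deps).static_order()
    return deque(by_name[name] for name in order)


def is_fusion_candidate(node):
    """S22: check the opcode, operand value types and result value type."""
    legal = FUSIBLE.get(node.opcode)
    if legal is None:
        return False
    return node.result_type in legal and all(t in legal for t in node.operand_types)


nodes = [
    Node("a", "input", (), "f32"),
    Node("b", "input", (), "f32"),
    Node("c", "input", (), "f32"),
    Node("t", "mul", ("f32", "f32"), "f32", ("a", "b")),
    Node("u", "add", ("f32", "f32"), "f32", ("t", "c")),
]

worklist = build_worklist(nodes)
candidates = []
while worklist:                  # S22 repeats until the worklist is empty
    node = worklist.popleft()    # take the first node and delete it
    if is_fusion_candidate(node):
        candidates.append(node.name)  # these roots would go on to S23
print(candidates)
```

A `deque` is used because the worklist is only ever consumed from the front, which matches its description as a linear data structure.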
Further improvements to the above technical scheme are as follows:
1. In the above scheme, the worklist is a linear data structure that contains all nodes to be processed.
2. In the above scheme, different root nodes correspond to different DAG subgraph matching templates, and each DAG subgraph matching template is itself a DAG subgraph.
3. In the above scheme, one node in the DAG corresponds to one instruction in the instruction set of the heterogeneous platform.
4. In the above scheme, the DAG subgraph matched in S23 is the pre-fusion DAG subgraph, that is, the subgraph before node fusion optimization that corresponds to the fused DAG subgraph.
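The matching in S23, with templates selected by the root node and each template being a small DAG itself, might look like the sketch below. The IR encoding, the `TEMPLATES` table, the `?x` wildcard convention and the `fma` opcode are assumptions made for illustration; the patent does not specify them.

```python
# Hypothetical IR: name -> (opcode, operand names). Names absent from the
# dict (a, b, c) are leaf values.
dag = {
    "t": ("mul", ("a", "b")),
    "u": ("add", ("t", "c")),
}

# Templates are keyed by the root opcode. Each entry pairs the fused opcode
# with a pattern (opcode, child patterns); a plain string is a wildcard.
TEMPLATES = {
    "add": [
        ("fma", ("add", (("mul", ("?x", "?y")), "?z"))),
        ("fma", ("add", ("?z", ("mul", ("?x", "?y"))))),
    ],
}


def match(graph, name, pattern, bindings):
    """Match `pattern` at node `name`; return wildcard bindings or None."""
    if isinstance(pattern, str):  # wildcard: binds any operand consistently
        if pattern in bindings and bindings[pattern] != name:
            return None
        return {**bindings, pattern: name}
    opcode, children = pattern
    if name not in graph or graph[name][0] != opcode:
        return None
    operands = graph[name][1]
    if len(operands) != len(children):
        return None
    for operand, child in zip(operands, children):
        bindings = match(graph, operand, child, bindings)
        if bindings is None:
            return None
    return bindings


def find_fusible_subgraphs(graph, root):
    """Return every (fused opcode, bindings) whose template matches at `root`."""
    results = []
    for fused_opcode, pattern in TEMPLATES.get(graph[root][0], []):
        bindings = match(graph, root, pattern, {})
        if bindings is not None:
            results.append((fused_opcode, bindings))
    return results


print(find_fusible_subgraphs(dag, "u"))
```

Each returned entry corresponds to one candidate fusion strategy for the root, so the length of the list plays the role of n.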
By applying the above technical scheme, the invention has the following advantages over the prior art:
The self-adaptive node fusion compiling optimization method based on a heterogeneous platform provides an adaptive node fusion compiling optimization interface and algorithm on the heterogeneous platform. During the DAG lowering (degradation) stage, it uses the data-flow and control-flow information of the DAG together with the instruction set information of the target back end to evaluate the cost of the subgraphs before and after fusion, and adaptively selects the optimal node fusion strategy according to the evaluation result. This generates more efficient program code, simplifies the DAG, reduces the complexity of other optimizations and opens up more opportunities for them. It also provides accurate guidance for node fusion optimization on the heterogeneous platform, further exploits the potential of the platform's compound instructions, and improves the performance of the heterogeneous platform.
Drawings
FIG. 1 is a flow chart of a self-adaptive node fusion compiling and optimizing method based on a heterogeneous platform.
Detailed Description
Example (b): a self-adaptive node fusion compiling optimization method based on a heterogeneous platform is based on a large-scale heterogeneous system and comprises the following steps:
s1, the source program generates an intermediate representation DAG of the compiler through the compiling processing of the compiler, the DAG is subjected to degradation processing, and the following operations are carried out on the DAG at a DAG degradation stage:
s2, performing DAG fusion subgraph recognition, and further comprising the following steps:
s21, carrying out topological sequencing on the DAG to obtain a topological sequence, and adding nodes in the DAG into a node fusion optimization work list according to the sequence of the topological sequence;
s22, the compiler sequentially takes out a node of the work list from the first node of the work list generated in the S21, deletes the node from the work list, checks the operation code, the operand value type and the result value type of the node, if the operation code, the operand value type and the structure value type of the node are legal, the node can perform node fusion, and performs S23, otherwise, the compiler continues to perform S22 until the work list is empty, and then goes to S71;
s23, taking the node taken out from S22 as a root node, matching a template according to a DAG subgraph at the rear end of a compiler, wherein the matched template refers to a Pattern template, the Pattern is a data structure of the compiler and is used for template matching, the input of the Pattern template is a DAG subgraph, the output of the Pattern template is also a DAG subgraph, the work done by the Pattern is to convert the input DAG subgraph into the output DAG subgraph, and n DAG subgraphs which can be subjected to node fusion and take all the nodes taken out from S22 as the root node are found by using a graph matching algorithm and then the operation is switched to S24;
s24, if the DAG subgraph capable of carrying out node fusion is not found in the S23, turning to S22, otherwise, turning to S31;
the n DAG subgraphs which can be subjected to node fusion and are found in S3 and S23 correspond to n node fusion strategies one by one, the k DAG subgraph which can be subjected to node fusion and is found in S23 is subjected to node fusion according to the k node fusion strategy, wherein k =1,2,3,4.
S4, evaluating fusion strategy cost, namely calculating the cost spent on running the instructions in the instruction sequence after converting the fused kth DAG sub-graph generated in the S3 into the instruction sequence according to the data reference of all nodes of the fused DAG sub-graph and the instruction set information of the heterogeneous platform in the S3, wherein the cost comprises the spent clock cycle number, the spent register number and the occupied memory size, and turning to S51;
s5, adaptively selecting a node fusion strategy, namely adaptively selecting an optimal node fusion strategy according to the k-th fusion strategy cost obtained by calculation in S4 by combining the use conditions of a register, a cache and a memory at the rear end of the target, namely selecting the node fusion strategy with the best performance improving effect on the rear end of the target, and if the cache has less residual resources, selecting the fusion strategy with lower access and storage costs and turning to S6;
s6, fusing target related nodes, namely transferring the control flow and data flow relation of the DAG subgraph obtained by matching in the S23 to the fused DAG subgraph generated by the node fusion strategy selected in the S5 according to the node fusion strategy selected in the S5, replacing the DAG subgraph before fusion by using the fused DAG subgraph, and transferring to the S22;
and S7, generating an object code, namely compiling the DAG after the degradation is finished by the compiler to generate a heterogeneous platform code.
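The cost evaluation of S4 and the adaptive selection of S5 can be illustrated with a small sketch. The three cost components (clock cycles, registers, memory) come from the text; everything else, including the `Cost` and `BackendState` records, the weights and the example numbers, is a hypothetical stand-in chosen only to show how the same candidate costs can lead to different choices under different resource pressure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    cycles: int
    registers: int
    memory_bytes: int


@dataclass(frozen=True)
class BackendState:
    free_registers: int
    free_cache_bytes: int


def weighted_cost(cost, state):
    """Combine the cost components, penalizing whichever resource is scarce."""
    # Hypothetical weights: a resource costs more once demand exceeds what is free.
    reg_weight = 4.0 if cost.registers > state.free_registers else 1.0
    mem_weight = 0.5 if cost.memory_bytes > state.free_cache_bytes else 0.01
    return cost.cycles + reg_weight * cost.registers + mem_weight * cost.memory_bytes


def select_strategy(costs, state):
    """S5: return the index of the cheapest strategy for this back-end state."""
    return min(range(len(costs)), key=lambda k: weighted_cost(costs[k], state))


costs = [
    Cost(cycles=6, registers=2, memory_bytes=256),  # faster, touches more memory
    Cost(cycles=9, registers=3, memory_bytes=64),   # slower, touches less memory
]
roomy = BackendState(free_registers=16, free_cache_bytes=4096)
tight = BackendState(free_registers=16, free_cache_bytes=128)

print(select_strategy(costs, roomy), select_strategy(costs, tight))
```

With plenty of cache the faster strategy wins; when little cache remains, the strategy with the lower memory footprint is chosen instead, mirroring the cache example in S5.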
The worklist is a linear data structure that contains all nodes to be processed.
Different root nodes correspond to different DAG subgraph matching templates, and each DAG subgraph matching template is itself a DAG subgraph.
One node in the DAG corresponds to one instruction in the instruction set of the heterogeneous platform.
The DAG subgraph matched in S23 is the pre-fusion DAG subgraph, that is, the subgraph before node fusion optimization that corresponds to the fused DAG subgraph.
The embodiment is further explained below:
The specific flow of the invention is shown in FIG. 1. While the compiler optimizes and lowers the DAG, it traverses the DAG from the root node in topological order, identifies DAG fusion subgraphs with each node as the root node, evaluates the cost of the candidate node fusion strategies according to the DAG's control-flow and data-flow information and the instruction set information of the back-end target platform, and adaptively selects the optimal node fusion optimization strategy according to those costs.
The specific process is as follows:
1) Generating an intermediate representation
a) The compiler compiles the source program to generate the compiler's intermediate representation, a DAG; go to 2a);
2) DAG fusion subgraph recognition
a) In the DAG lowering (degradation) stage, topologically sort the DAG to obtain a topological sequence, add the nodes of the DAG to a worklist (a linear data structure that contains all nodes to be processed) in the order of the topological sequence, and go to 2b);
b) Take the first node out of the worklist and delete it from the worklist; check the node's operation code, operand value types and result value type; if the node is eligible for node fusion, go to 2c); otherwise repeat 2b); once the worklist is empty, go to 7a);
c) Taking the node found in 2b) as the root node, use a graph matching algorithm with the back-end DAG subgraph matching templates (different root nodes correspond to different templates, and a template is itself a DAG subgraph) to find all n DAG subgraphs that are rooted at that node and can undergo node fusion; go to 2d);
d) If 2c) found no DAG subgraph that can undergo node fusion, go to 2b); otherwise go to 3a);
3) Node fusion strategy n
a) According to node fusion strategy n, perform node fusion (merging several nodes into one) on the n-th DAG subgraph found in 2c) to generate the fused DAG subgraph (template matching matches one subgraph and replaces it with another), record all nodes of the fused DAG subgraph, and go to 4a);
4) Cost evaluation
a) According to the data references of the nodes and the instruction set information of the heterogeneous platform (one node in the DAG corresponds to one instruction in the instruction set), evaluate the cost of running the instruction sequence obtained from the fused DAG subgraph produced by node fusion strategy n in 3a); the cost includes the number of clock cycles, the number of registers, the amount of memory occupied, and so on; go to 5a);
5) Adaptive selection of the node fusion strategy
a) Based on the n fusion strategy costs calculated in 4a) and the register, cache and memory usage of the target back end, adaptively select the optimal node fusion strategy, that is, the one that improves performance on the target back end the most (for example, if little cache capacity remains, a fusion strategy with a lower memory access cost can be selected); go to 6a);
6) Target-dependent node fusion
a) According to the node fusion strategy selected in 5a), transfer the control-flow and data-flow relations of the DAG subgraph matched in 2c) (the pre-fusion subgraph that corresponds to the fused DAG subgraph) to the fused DAG subgraph generated by that strategy, replace the pre-fusion DAG subgraph with the fused DAG subgraph, and go to 2b);
7) Generating object code
a) After DAG lowering is complete, the compiler compiles the DAG to generate heterogeneous platform code.
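The replacement in step 6 a) can be sketched for the data-flow part as follows. This is an assumption-laden illustration: the dictionary IR, the `apply_fusion` function and the `fma` and `store` opcodes are hypothetical, and control-flow relations are not modelled. The fused node takes over the name of the old root, so every user of the root keeps its edge, and interior nodes of the matched subgraph are removed only when nothing else still uses them.

```python
def apply_fusion(graph, root, fused_opcode, fused_operands, interior):
    """Replace the subgraph rooted at `root` with a single fused node.

    `interior` lists the non-root nodes of the matched subgraph. The input
    graph is left untouched; a new graph is returned.
    """
    new_graph = dict(graph)
    # Reusing the root's name keeps all edges from its users intact.
    new_graph[root] = (fused_opcode, tuple(fused_operands))
    for name in interior:
        still_used = any(name in ops for _, ops in new_graph.values())
        if not still_used:
            del new_graph[name]
    return new_graph


# Hypothetical IR: name -> (opcode, operand names).
dag = {
    "t": ("mul", ("a", "b")),
    "u": ("add", ("t", "c")),
    "r": ("store", ("u", "p")),
}

fused = apply_fusion(dag, "u", "fma", ("a", "b", "c"), interior=["t"])
print(fused)
```

After this rewrite the process would return to the worklist loop, where later roots are matched against the already simplified graph.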
When the self-adaptive node fusion compiling optimization method based on a heterogeneous platform is adopted, an adaptive node fusion compiling optimization interface and algorithm are provided on the heterogeneous platform. During the DAG lowering (degradation) stage, the data-flow and control-flow information of the DAG is combined with the instruction set information of the target back end to evaluate the cost of the subgraphs before and after fusion, and the optimal node fusion optimization strategy is selected adaptively according to the evaluation result. This generates more efficient program code, simplifies the DAG, reduces the complexity of other optimizations and opens up more opportunities for them. It also provides accurate guidance for node fusion optimization on the heterogeneous platform, further exploits the potential of the platform's compound instructions, and improves the performance of the heterogeneous platform.
To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:
DAG (directed acyclic graph): an intermediate representation used in compilation optimization, on which lowering (degradation) and optimization of the intermediate representation are performed.
Topological sorting: topologically sorting a directed acyclic graph G means arranging all vertices of G in a linear sequence such that, for any pair of vertices u and v, if the edge <u, v> belongs to E(G), then u appears before v in the sequence.
Topological sequence: the linear sequence obtained by topologically sorting a directed acyclic graph.
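To make the definition concrete, the sketch below computes a topological sequence with Kahn's algorithm, a standard technique that the patent does not name; the function name and the example graph are illustrative. It repeatedly emits a vertex with no remaining incoming edges, so for every edge <u, v> the vertex u is emitted before v.

```python
from collections import deque


def topological_sequence(vertices, edges):
    """Return a topological sequence of a DAG given as vertices and <u, v> edges."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:  # all predecessors of v have been emitted
                ready.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so no topological sequence exists")
    return order


vertices = ["a", "b", "t", "c", "u"]
edges = [("a", "t"), ("b", "t"), ("t", "u"), ("c", "u")]
order = topological_sequence(vertices, edges)
position = {v: i for i, v in enumerate(order)}
# The defining property: u comes before v for every edge <u, v>.
print(all(position[u] < position[v] for u, v in edges))
```

A DAG usually has several valid topological sequences; any of them satisfies the definition.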
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (5)

1. A self-adaptive node fusion compiling optimization method based on a heterogeneous platform, characterized by comprising the following steps:
S1, the compiler compiles the source program to generate its intermediate representation, a DAG, and performs lowering (degradation) on the DAG; the following operations are carried out on the DAG during the DAG lowering stage:
S2, DAG fusion subgraph recognition, which further comprises the following steps:
S21, topologically sort the DAG to obtain a topological sequence, and add the nodes of the DAG to a node fusion optimization worklist in the order of the topological sequence;
S22, the compiler takes the first node out of the worklist generated in S21, deletes it from the worklist, and checks the node's operation code, operand value types and result value type; if all of them are legal, the node is eligible for node fusion and S23 is executed; otherwise S22 is repeated; once the worklist is empty, go to S7;
S23, taking the node extracted in S22 as the root node, use a graph matching algorithm with the DAG subgraph matching templates of the compiler back end to find all n DAG subgraphs that are rooted at that node and can undergo node fusion, then go to S24;
S24, if no DAG subgraph that can undergo node fusion was found in S23, go to S22; otherwise go to S3;
S3, the n fusible DAG subgraphs found in S23 correspond one-to-one to n node fusion strategies; node fusion is performed on the k-th fusible DAG subgraph found in S23 according to the k-th node fusion strategy, where k = 1, 2, 3, 4, ...;
S4, fusion strategy cost evaluation: according to the data references of all nodes of the fused DAG subgraph from S3 and the instruction set information of the heterogeneous platform, calculate the cost of running the instruction sequence obtained by converting the k-th fused DAG subgraph generated in S3 into instructions, where the cost includes the number of clock cycles, the number of registers used and the amount of memory occupied, then go to S5;
S5, adaptive selection of the node fusion strategy: based on the fusion strategy costs calculated in S4 and the register, cache and memory usage of the target back end, adaptively select the optimal node fusion strategy, that is, the one that improves performance on the target back end the most, then go to S6;
S6, target-dependent node fusion: according to the node fusion strategy selected in S5, transfer the control-flow and data-flow relations of the DAG subgraph matched in S23 to the fused DAG subgraph generated by that strategy, replace the pre-fusion DAG subgraph with the fused DAG subgraph, and go to S22;
S7, object code generation: after lowering is complete, the compiler compiles the DAG to generate heterogeneous platform code.
2. The self-adaptive node fusion compiling optimization method based on a heterogeneous platform according to claim 1, characterized in that: the worklist is a linear data structure that contains all nodes to be processed.
3. The self-adaptive node fusion compiling optimization method based on a heterogeneous platform according to claim 1, characterized in that: different root nodes correspond to different DAG subgraph matching templates, and each DAG subgraph matching template is itself a DAG subgraph.
4. The self-adaptive node fusion compiling optimization method based on a heterogeneous platform according to claim 1, characterized in that: one node in the DAG corresponds to one instruction in the instruction set of the heterogeneous platform.
5. The self-adaptive node fusion compiling optimization method based on a heterogeneous platform according to claim 1, characterized in that: the DAG subgraph matched in S23 is the pre-fusion DAG subgraph, that is, the subgraph before node fusion optimization that corresponds to the fused DAG subgraph.
CN201910885756.1A 2019-09-19 2019-09-19 Self-adaptive node fusion compiling optimization method based on heterogeneous platform Active CN112527304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910885756.1A CN112527304B (en) 2019-09-19 2019-09-19 Self-adaptive node fusion compiling optimization method based on heterogeneous platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910885756.1A CN112527304B (en) 2019-09-19 2019-09-19 Self-adaptive node fusion compiling optimization method based on heterogeneous platform

Publications (2)

Publication Number Publication Date
CN112527304A CN112527304A (en) 2021-03-19
CN112527304B true CN112527304B (en) 2022-10-04

Family

ID=74974025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885756.1A Active CN112527304B (en) 2019-09-19 2019-09-19 Self-adaptive node fusion compiling optimization method based on heterogeneous platform

Country Status (1)

Country Link
CN (1) CN112527304B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240231910A9 (en) * 2022-10-19 2024-07-11 Mediatek Inc. Optimization of Scratchpad Memory Allocation for Heterogeneous Devices Using A Cooperative Compiler Framework
CN116302114B (en) * 2023-02-24 2024-01-23 进迭时空(珠海)科技有限公司 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150089484A1 (en) * 2013-09-24 2015-03-26 Qualcomm Incorporated Fast, Combined Forwards-Backwards Pass Global Optimization Framework for Dynamic Compilers
CN109933327A (en) * 2019-02-02 2019-06-25 中国科学院计算技术研究所 OpenCL compiler method and system based on code fusion compiler framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A novel scheme for Compiler Optimization Framework"; N.A.B. Sankar Chebolu et al.; 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI); 2015-12-31; full text *
"Zero-overhead loop compilation optimization for DSP" (面向DSP的零开销循环编译优化); 项利萍 et al.; 《电脑知识与技术》; 2015-04-30; full text *

Also Published As

Publication number Publication date
CN112527304A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN110187885B (en) Intermediate code generation method and device for quantum program compiling
CN103858099B (en) The method and system applied for execution, the circuit with machine instruction
US8645935B2 (en) Automatic parallelization using binary rewriting
CN110633248A (en) Sequence optimization in a high performance computing environment
JP2007528059A (en) Systems and methods for software modeling, abstraction, and analysis
WO2017205118A1 (en) Sample driven profile guided optimization with precise correlation
WO2021000971A1 (en) Method and device for generating operation data and related product
CN108197027B (en) Software performance optimization method, storable medium, computer program
WO2021258692A1 (en) Multi-chip compatible compiling method and device
KR102013582B1 (en) Apparatus and method for detecting error and determining corresponding position in source code of mixed mode application program source code thereof
US9256437B2 (en) Code generation method, and information processing apparatus
JP2015207318A (en) Method and system for parallelization of sequential computer program codes
CN112527304B (en) Self-adaptive node fusion compiling optimization method based on heterogeneous platform
US20100250564A1 (en) Translating a comprehension into code for execution on a single instruction, multiple data (simd) execution
WO2024065867A1 (en) Memory optimization method and apparatus used for neural network compilation
CN115809063A (en) Storage process compiling method, system, electronic equipment and storage medium
US8117604B2 (en) Architecture cloning for power PC processors
CN104750533B (en) C program Compilation Method and compiler
CN103942082A (en) Complier optimization method for eliminating redundant storage access operations
US20170269931A1 (en) Method and Computing System for Handling Instruction Execution Using Affine Register File on Graphic Processing Unit
CN105447285A (en) Method for improving OpenCL hardware execution efficiency
CN116560666B (en) AI front end unified computing method, device and medium based on multi-level code generation
CN112558977B (en) Polyhedron optimization method oriented to heterogeneous many-core rear end based cost model
Sbirlea et al. Dfgr an intermediate graph representation for macro-dataflow programs
CN112416313B (en) Compiling method supporting large integer data type and operator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant