CN105700933A - Parallelization and loop optimization method and system for a high-level language of a reconfigurable processor

Publication number
CN105700933A
CN105700933A CN201610018726.7A CN201610018726A CN105700933A CN 105700933 A CN105700933 A CN 105700933A CN 201610018726 A CN201610018726 A CN 201610018726A CN 105700933 A CN105700933 A CN 105700933A
Authority
CN
China
Prior art keywords
parallelization
language
kernel function
function part
polyhedral model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610018726.7A
Other languages
Chinese (zh)
Inventor
田***
绳伟光
何卫锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN201610018726.7A
Publication of CN105700933A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/451 Code distribution
    • G06F8/452 Loops
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/456 Parallelism detection

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a parallelization and loop optimization method and system for a high-level language of a reconfigurable processor, and proposes an end-to-end language conversion system for general-purpose reconfigurable processors. On a reconfigurable processor, the core loops of compute-intensive applications must be computed in parallel on the reconfigurable part, and C cannot express the parallel characteristics of the reconfigurable processor. The serial and parallel parts of an application are therefore packaged separately and optimized according to the characteristics of the system, and a new language is finally generated. The data types and lengths of a kernel function's inputs and outputs are determined by having the programmer write a decls.h file, which reduces the complexity of the system and greatly improves its applicability. Loop optimization is handled with a polyhedral model, which makes the system more widely applicable and easier to port to different architectures.

Description

Parallelization and loop optimization method and system for a high-level language of a reconfigurable processor
Technical field
The invention relates to the implementation of a method and system for automatic parallelization and loop optimization of a high-level language on a general-purpose reconfigurable processor.
Background
As the number of operands and operators computed on a single chip keeps growing, parallel multi-core processor architectures have become mainstream. Conventional processor computing falls into two categories. Traditional general-purpose computing based on von Neumann processors is highly flexible, but its instruction-stream-driven execution, limited arithmetic units, and limited memory bandwidth leave its overall performance and power consumption unsatisfactory. Dedicated computing can optimize structure and circuits for a specific application and needs no instruction set; it executes quickly and consumes little power. Dedicated computing systems, however, have a fatal flaw: their flexibility and scalability are poor, and they often cannot be extended in a simple way to handle ever more complex applications. Reconfigurable computing emerged against this background as a way of computing that combines the flexibility of software with the efficiency of hardware.
A reconfigurable processor comprises a general-purpose processor and several reconfigurable computing units. The general-purpose processor controls the computation of the reconfigurable computing units, schedules parallel tasks, and executes serial tasks. The reconfigurable computing units reconfigure themselves according to configuration information and perform the computation. A reconfigurable architecture can thus be adapted to different algorithms and applications. Although many reconfigurable computing architectures now exist, plain C cannot meet the needs of parallel computing, and how to program more efficiently so as to exploit the efficiency of the PEA as fully as possible has become an evident difficulty and challenge.
A reconfigurable architecture needs a matching task compiler and a parallel extension of the C language. CUDA lets developers program graphics processing units (GPUs) for special purposes, but its programming process is cumbersome: programmers must handle memory allocation and loop optimization themselves. OpenMP supports multi-core programming in C, C++, and Fortran. Although it adds many parallelization annotations to a program, it can only optimize and compute with multiple threads. Neither of these parallel languages can perform task scheduling on a reconfigurable architecture.
Summary of the invention
The object of the invention is to provide a parallelization and loop optimization method and system for a high-level language of a reconfigurable processor that automatically converts C into the GR-C language and performs loop optimization targeted at a specific reconfigurable architecture, thereby improving the performance of the reconfigurable architecture.
To solve the above problem, the invention provides a parallelization and loop optimization method for a high-level language of a reconfigurable processor, comprising:
Obtaining a decls.h file, deriving input and output intermediate files from the decls.h file, and obtaining the parameter information of the task function and the pea function from the input and output intermediate files;
Extracting the kernel function part from the C code and optimizing the extracted kernel function part with a polyhedral model to generate the GR-C language of the kernel function part;
Writing the GR-C language of the kernel function part back into the kernel function part of the C code according to the parameter information, to generate the final GR-C code.
Further, in the above method, extracting the kernel function part from the C code and optimizing the extracted kernel function part with a polyhedral model to generate the GR-C language of the kernel function part comprises:
Performing static dependence analysis on the input C code, and converting the result of the static dependence analysis into a polyhedral model;
Optimizing the polyhedral model to obtain a parallelized polyhedral model;
Generating the GR-C language of the kernel function part from the parallelized polyhedral model.
Further, in the above method, generating the GR-C language of the kernel function part from the parallelized polyhedral model comprises:
Using the CLooG tool to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
Further, in the above method, the static dependence analysis of the input C code comprises:
Using the LooPo framework to perform code scanning and dependence analysis.
Further, in the above method, the step of optimizing the polyhedral model to obtain a parallelized polyhedral model comprises:
Rewriting on the basis of the PLUTO model to obtain an affine transformation framework;
Using PipLib as the ILP solver;
Optimizing the polyhedral model according to the framework and the ILP solver to obtain the parallelized polyhedral model.
Further, in the above method, in the step of optimizing the polyhedral model, the extracted kernel function part is optimized in the following loop optimization order:
First, all states are separated and grouped according to their dependences and loop bounds;
Then, loop fusion is applied to each group, fusing suitable states together;
Then, the loop unrolling parameters are determined from the operands of each group and the execution cycles they require;
Finally, loop unrolling is performed with the computed parameters to obtain a suitable loop length and representation.
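The fusion and unrolling steps of this order can be illustrated with a minimal C sketch. It is not taken from the patent: the arrays, the statements S1 and S2, the trip count N, and the unroll factor UNROLL are all assumptions chosen for illustration, and the sketch only shows that the fused and unrolled form computes the same values as the original loops.

```c
#include <assert.h>

#define N 16      /* illustrative trip count; must be a multiple of UNROLL */
#define UNROLL 4  /* assumed unroll factor, not a value from the patent */

/* Before optimization: two states (statements) in separate loops
 * that share the same loop bounds. */
void separate_loops(const int *a, int *b, int *c)
{
    for (int i = 0; i < N; i++)
        b[i] = a[i] * 2;        /* S1 */
    for (int i = 0; i < N; i++)
        c[i] = b[i] + 1;        /* S2 reads what S1 wrote at the same i */
}

/* After fusion and unrolling by 4: S1 and S2 share one loop body. */
void fused_unrolled(const int *a, int *b, int *c)
{
    assert(N % UNROLL == 0);
    for (int i = 0; i < N; i += UNROLL) {
        b[i]     = a[i]     * 2;  c[i]     = b[i]     + 1;
        b[i + 1] = a[i + 1] * 2;  c[i + 1] = b[i + 1] + 1;
        b[i + 2] = a[i + 2] * 2;  c[i + 2] = b[i + 2] + 1;
        b[i + 3] = a[i + 3] * 2;  c[i + 3] = b[i + 3] + 1;
    }
}

/* Returns 1 when both versions compute identical results. */
int versions_agree(void)
{
    int a[N], b1[N], c1[N], b2[N], c2[N];
    for (int i = 0; i < N; i++)
        a[i] = i;
    separate_loops(a, b1, c1);
    fused_unrolled(a, b2, c2);
    for (int i = 0; i < N; i++)
        if (b1[i] != b2[i] || c1[i] != c2[i])
            return 0;
    return 1;
}
```

Fusion is legal here because S2 at iteration i depends only on S1 at the same iteration; a dependence that crossed iterations in the wrong direction would keep the two states in separate groups.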
According to another aspect of the invention, a parallelization and loop optimization system for a high-level language of a reconfigurable processor is provided, comprising:
An acquisition module, configured to obtain a decls.h file, derive input and output intermediate files from the decls.h file, and obtain the parameter information of the task function and the pea function from the input and output intermediate files;
An optimization module, configured to extract the kernel function part from the C code and optimize the extracted kernel function part with a polyhedral model to generate the GR-C language of the kernel function part;
A generation module, configured to write the GR-C language of the kernel function part back into the kernel function part of the C code according to the parameter information, to generate the final GR-C code.
Further, in the above system, the optimization module comprises:
A conversion unit, configured to perform static dependence analysis on the input C code and convert the result of the static dependence analysis into a polyhedral model;
An optimization unit, configured to optimize the polyhedral model to obtain a parallelized polyhedral model;
A generation unit, configured to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
Further, in the above system, the generation unit is configured to use the CLooG tool to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
Further, in the above system, the conversion unit is configured to use the LooPo framework to perform code scanning and dependence analysis.
Further, in the above system, the optimization unit is configured to rewrite on the basis of the PLUTO model to obtain an affine transformation framework; use PipLib as the ILP solver; and optimize the polyhedral model according to the framework and the ILP solver to obtain the parallelized polyhedral model.
Further, in the above system, the optimization unit optimizes the extracted kernel function part in the following loop optimization order:
First, all states are separated and grouped according to their dependences and loop bounds;
Then, loop fusion is applied to each group, fusing suitable states together;
Then, the loop unrolling parameters are determined from the operands of each group and the execution cycles they require;
Finally, loop unrolling is performed with the computed parameters to obtain a suitable loop length and representation.
Compared with the prior art, the invention proposes an end-to-end language conversion system for general-purpose reconfigurable processors. On a reconfigurable processor, the core loops of compute-intensive applications must be computed in parallel on the reconfigurable part, and C cannot express these parallel characteristics. The serial and parallel parts of an application therefore need to be packaged separately and optimized according to the characteristics of the system, finally producing a new language; the system of the invention does this and is widely applicable. In addition, the data types and lengths of a kernel function's inputs and outputs are determined by having the programmer write decls.h, which greatly reduces the complexity of the system and greatly improves its applicability. Furthermore, loop optimization is handled with a polyhedral model, which also widens the applicability of the system: whatever the reconfigurable architecture, only the order of the loop transformations in the loop optimization part needs to be changed to obtain a general solution, so porting the system to a different architecture is simpler.
Brief description of the drawings
Fig. 1 is a flowchart of the parallelization and loop optimization method for a high-level language of a reconfigurable processor according to an embodiment of the invention;
Fig. 2 is a flowchart of the parallelization and loop optimization method for a high-level language of a reconfigurable processor according to an embodiment of the invention;
Fig. 3 is a schematic diagram of the kernel in the original C code according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the function declarations and definitions in the GR-C code according to an embodiment of the invention;
Fig. 5 is a schematic diagram of the kernel part of the main function in the GR-C code according to an embodiment of the invention.
Detailed description
To make the above objects, features, and advantages of the invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
As shown in Fig. 1, the invention provides a parallelization and loop optimization method for a high-level language of a reconfigurable processor, comprising:
Step S1: obtain a decls.h file, derive input and output intermediate files from the decls.h file, and obtain the parameter information of the task function and the pea function from the input and output intermediate files. Specifically, in a GR-C application a task function (__GR-C_task void kernel_name()) represents the kernel function, and the core loop code inside it is represented by a pea function (__GR-C_pea void kernel_name_PEA()). The task function produces executable code for the ARM7 and is responsible for moving data and scheduling the PEA; the pea function produces the reconfiguration information the PEA needs. The task function contains the declarations and definitions of the variables it uses, the data to be moved into and out of main memory and PEA memory, and the addresses of the data in main memory and PEA memory. Within the task function, __gr_MemcpyStoG() and __gr_MemcpyGtoS() carry out the data transfers. The programmer marks the kernel position with #pragma scop and #pragma endscop and supplies a decls.h file, which lists the names and lengths of the variables whose data must be copied between main memory and PEA memory. The format of the decls.h file is as follows:
#pragma array_decls_in/out kernel_number
type variable_name[length] memory_length
type variable_name[length][length] memory_length
#pragma end_array_decls_in/out kernel_number
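As a hedged illustration of how a tool might read one declaration line of this format, the following C sketch parses the one-dimensional and two-dimensional forms with sscanf. The ArrayDecl structure, the function name parse_array_decl, and the sample lines are hypothetical; the patent does not show the actual implementation of GR-C_pre, and the meaning and unit of memory_length are not stated in the text, so the field is only stored here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one decls.h declaration line. */
typedef struct {
    char type[32];      /* element type, e.g. "int" */
    char name[64];      /* variable name */
    int  dims[2];       /* array lengths; dims[1] is unused for 1-D arrays */
    int  ndims;         /* 1 or 2 */
    int  memory_length; /* trailing memory_length field, stored as given */
} ArrayDecl;

/* Parses "type name[len] mem" or "type name[len][len] mem".
 * Returns 1 on success and 0 if the line matches neither form. */
int parse_array_decl(const char *line, ArrayDecl *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    /* Try the two-dimensional form first, since it is the longer pattern. */
    if (sscanf(line, "%31s %63[^[][%d][%d] %d", out->type, out->name,
               &out->dims[0], &out->dims[1], &out->memory_length) == 5) {
        out->ndims = 2;
        return 1;
    }
    /* Fall back to the one-dimensional form. */
    if (sscanf(line, "%31s %63[^[][%d] %d", out->type, out->name,
               &out->dims[0], &out->memory_length) == 4) {
        out->dims[1] = 0;
        out->ndims = 1;
        return 1;
    }
    return 0;
}
```

Lines that begin with #pragma fail both patterns and return 0, so a caller could use them to switch between the in and out sections for a given kernel number.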
Once the data lengths are known, the system allocates storage addresses in PEA memory; allocation generally starts from the first address of the memory, and addresses are also reserved for the computed result data;
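Allocation that starts at the first address and hands out consecutive regions can be sketched as a simple bump allocator. This is an assumption about one straightforward way to do it, not the patent's allocator; the PeaMemory structure, the pea_reserve function, and the capacity used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of PEA memory as a linear region of `capacity` units. */
typedef struct {
    size_t next;      /* first free offset; starts at 0, the first address */
    size_t capacity;  /* total size of the region */
} PeaMemory;

/* Reserves `length` consecutive units and returns their starting offset,
 * or (size_t)-1 when the region does not have enough free space. */
size_t pea_reserve(PeaMemory *mem, size_t length)
{
    assert(mem != NULL && mem->next <= mem->capacity);
    if (length > mem->capacity - mem->next)
        return (size_t)-1;
    size_t offset = mem->next;
    mem->next += length;
    return offset;
}
```

With a 64-unit region, reserving 16 units for an input array and then 32 units for the result array yields offsets 0 and 16, which mirrors the description of allocating from the first address and reserving space for the results.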
Step S2: extract the kernel function part from the C code and optimize the extracted kernel function part with a polyhedral model to generate the GR-C language of the kernel function part. In detail, a compute-intensive application always has some hotspots that account for most of the program's running time, especially nested loops; we define these hotspots as kernel functions. Running a compute-intensive application on a reconfigurable architecture means computing the kernel function part in parallel on the reconfigurable array and computing the non-kernel part on the main processor, so the core of the invention is to package the kernel function part into a form that the compiler can recognize;
Step S3: write the GR-C language of the kernel function part back into the kernel function part of the C code according to the parameter information, to generate the final GR-C code. This embodiment converts a C program into a GR-C program in a well-defined way, and the same effect can be achieved for other parallel languages with similar characteristics. For complex programs with multiple kernels, the -N parameter can be added to process them repeatedly. The invention proposes an end-to-end language conversion system for general-purpose reconfigurable processors. On a reconfigurable processor, the core loops of compute-intensive applications must be computed in parallel on the reconfigurable part, and C cannot express these parallel characteristics. The serial and parallel parts of an application therefore need to be packaged separately and optimized according to the characteristics of the system, finally producing a new language; the system of the invention does this and is widely applicable. In addition, the data types and lengths of a kernel function's inputs and outputs are determined by having the programmer write decls.h, which greatly reduces the complexity of the system and greatly improves its applicability. Furthermore, loop optimization is handled with a polyhedral model, which also widens the applicability of the system: whatever the reconfigurable architecture, only the order of the loop transformations in the loop optimization part needs to be changed to obtain a general solution, so porting the system to a different architecture is simpler.
Preferably, as shown in Fig. 2, step S2, extracting the kernel function part from the C code and optimizing the extracted kernel function part with a polyhedral model to generate the GR-C language of the kernel function part, comprises:
Step S21: perform static dependence analysis on the input C code, and convert the result of the static dependence analysis into a polyhedral model;
Step S22: optimize the polyhedral model to obtain a parallelized polyhedral model;
Step S23: generate the GR-C language of the kernel function part from the parallelized polyhedral model.
Preferably, step S23, generating the GR-C language of the kernel function part from the parallelized polyhedral model, comprises:
Using the CLooG tool to generate the GR-C language of the kernel function part from the parallelized polyhedral model. Here, CLooG serves as the basis of the code generator.
Preferably, the static dependence analysis of the input C code comprises:
Using the LooPo framework to perform code scanning and dependence analysis. LooPo is a source-to-source compiler based on the polyhedral model with several polyhedral analyses built in. Because the first step of the invention is to convert C code into a polyhedral model, this tool is well suited to the task.
Preferably, step S22, optimizing the polyhedral model to obtain a parallelized polyhedral model, comprises:
Rewriting on the basis of the PLUTO model to obtain an affine transformation framework;
Using PipLib as the ILP solver;
Optimizing the polyhedral model according to the framework and the ILP solver to obtain the parallelized polyhedral model. Specifically, PLUTO is an automatic parallelization and locality optimization tool for multi-core systems whose core transformation is an affine transformation carried out through loop tiling and fusion; the invention rewrites its optimization process to meet the computing needs of our reconfigurable system.
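For orientation, the affine transformation framework of the published PLUTO algorithm can be summarized as follows. This is a sketch of PLUTO as it is generally described in the literature, not of the rewritten version used in the invention, whose details the text does not give. The symbols \phi, \vec{h}, \vec{u}, w, \vec{n}, and P_e are notation introduced here.

```latex
% For each statement S with iteration vector \vec{i}, seek an affine hyperplane:
\[
  \phi_S(\vec{i}) = \vec{h}_S \cdot \vec{i} + h_{S,0}
\]
% Legality: for every dependence edge e from statement S to statement T,
% with dependence polyhedron P_e, the transformed distance is non-negative:
\[
  \phi_T(\vec{t}) - \phi_S(\vec{s}) \ge 0
  \qquad \forall\, (\vec{s}, \vec{t}) \in P_e
\]
% Cost bound: the same distance is bounded by an affine function of the
% structure parameters \vec{n} (for example, the loop bounds):
\[
  \phi_T(\vec{t}) - \phi_S(\vec{s}) \le \vec{u} \cdot \vec{n} + w
  \qquad \forall\, (\vec{s}, \vec{t}) \in P_e
\]
% Objective handed to the ILP solver: the lexicographic minimum
\[
  \operatorname{lexmin}\; \bigl( \vec{u},\, w,\, \ldots,\, \vec{h}_S,\, h_{S,0},\, \ldots \bigr)
\]
```

The constraints are linear in the unknown coefficients once the quantification over P_e is eliminated, which is why an integer linear programming solver such as PipLib can be used to find the hyperplanes.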
Preferably, in the step of optimizing the polyhedral model, the extracted kernel function part is optimized in the following loop optimization order:
First, all states are separated and grouped according to their dependences and loop bounds;
Then, loop fusion is applied to each group, fusing suitable states together;
Then, the loop unrolling parameters are determined from the operands of each group and the execution cycles they require;
Finally, loop unrolling is performed with the computed parameters to obtain a suitable loop length and representation. Specifically, a pea function contains one or more SCoPs (static control parts), that is, the longest loop sequences that contain no while loop and whose loop bounds and control conditions are all affine functions of the loop variables. In the polyhedral model these statements are represented as a series of states, each with its own loop bounds and operations; the dependences of each state need to be analyzed and its optimal execution order found.
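A minimal C sketch of what qualifies as a SCoP under this definition follows. The function, the array, and the size N are illustrative assumptions and are not the code of Fig. 3. An ordinary C compiler ignores the two pragmas (possibly with a warning), so the function runs as plain C.

```c
#include <assert.h>

#define N 8  /* illustrative size, not a value from the patent */

/* Sums the lower triangle of m, including the diagonal. */
int lower_triangle_sum(int m[N][N])
{
    assert(m != 0);
    int sum = 0;
#pragma scop
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= i; j++)  /* inner bound is an affine function of i */
            sum += m[i][j];
#pragma endscop
    return sum;
}

/* Helper: applies the function to an N x N matrix filled with ones. */
int sum_of_all_ones(void)
{
    int m[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = 1;
    return lower_triangle_sum(m);
}
```

By contrast, a while loop whose exit depends on computed data, or a bound such as j < m[i][0], would fall outside the definition, because the iteration space could no longer be described by affine constraints on the loop variables.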
The invention is illustrated below with a parallelizable program.
Fig. 3 shows the parallelizable loop part of a parallelizable program, marked with #pragma scop and #pragma endscop. The decls.h file shown in Fig. 4 is also given.
First, GR-C_pre processes decls.h and splits it into two files, .array_decls_in and .array_decls_out; GR-C_pre_help then preprocesses these, extracting the names and lengths of the data to be operated on and writing them to two intermediate files, Arrays_in.cfg and Arrays_out.cfg;
Next, the C code is parsed according to the kernel number given in the input parameters. Following the order of the kernels, the kernel code section marked by #pragma scop and #pragma endscop is extracted, converted into the data structures of the polyhedral model, and then analyzed and optimized. For the example given here, the analysis shows that every statement in the loop depends on its context, so the statements cannot be reordered; no deeper optimization is therefore performed, and the loop is written in a form the compiler can handle easily and code is generated from it;
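Why such dependences block reordering can be seen in a small C sketch. It is an illustration only and not the kernel of Fig. 3: a running sum in which each iteration reads the value written by the previous one gives a different result when the iterations are executed in reverse. The names and the length LEN are assumptions.

```c
#include <assert.h>

#define LEN 5  /* illustrative length */

/* Forward order: iteration i reads x[i-1], written by iteration i-1. */
void running_sum_forward(int *x)
{
    assert(x != 0);
    for (int i = 1; i < LEN; i++)
        x[i] = x[i] + x[i - 1];
}

/* Same statement with the iterations reversed: x[i-1] is read before
 * it has been updated, so the dependence is violated. */
void running_sum_reversed(int *x)
{
    assert(x != 0);
    for (int i = LEN - 1; i >= 1; i--)
        x[i] = x[i] + x[i - 1];
}

/* Last element after the forward version on an all-ones input. */
int last_after_forward(void)
{
    int x[LEN] = {1, 1, 1, 1, 1};
    running_sum_forward(x);
    return x[LEN - 1];
}

/* Last element after the reversed version on the same input. */
int last_after_reversed(void)
{
    int x[LEN] = {1, 1, 1, 1, 1};
    running_sum_reversed(x);
    return x[LEN - 1];
}
```

The forward version turns the all-ones input into 1, 2, 3, 4, 5, while the reversed version produces 1, 2, 2, 2, 2. A dependence analysis that detects this kind of loop-carried dependence must keep the original order, which matches the decision described above not to reorder the example's statements.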
Finally, the final GR-C code is generated. The parameter information of the GR-C_task and GR-C_kernel functions is extracted from the two intermediate files produced earlier, Arrays_in.cfg and Arrays_out.cfg, and the inscop_GR-C tool writes the code generated for the kernel part back into the original C code at the kernel code section marked by #pragma scop and #pragma endscop, producing the final GR-C code in Fig. 4. Fig. 5 shows the position in the main function where the original #pragma scop and #pragma endscop marks were located.
To verify the performance of a reconfigurable architecture using this automatic conversion system, tests were run with EEMBC test cases. Table 1 shows the performance of the C code on the Atom230 platform and of our automatically generated GR-C on the GReP platform. For the given parallelizable test cases, the invention improves performance by roughly a factor of 10. The automatically generated GR-C was also compared experimentally with hand-written GR-C, using the encoding procedure of a convolutional code for the analysis. For the hand-written GR-C code, the reconfigurable architecture spends 286 cycles reading configuration information, 673 cycles reading data, 2041 cycles writing data, and 4016 cycles computing, of which the computation on the ALU takes 512 cycles. For the GR-C code generated automatically by the invention, the system divides the 16 PEs into two groups, and each group processes two data items every four cycles, so the computation on the ALU takes 1024 cycles, twice that of the manual configuration. We ran other algorithms as well and found an average gap of 2 to 3 times.
Table 1: Performance comparison of GReP and Atom230
The invention proposes GR-C, a parallel extension of the C language for general-purpose reconfigurable processors, together with a method and system for automatic parallelization and loop optimization that converts C into GR-C automatically; the invention is an automatic source-to-source conversion tool based on the polyhedral model. In existing work, many feasible methods for static dependence analysis are available, but automatic parallelization and automatic language generation still have many open problems. The invention addresses these by proposing an automatic parallelization system for a general-purpose reconfigurable processor: with our system, C can be converted into GR-C automatically and loop optimization targeted at a specific reconfigurable architecture can be performed, thereby improving the performance of the reconfigurable architecture.
Embodiment two
The invention also provides a parallelization and loop optimization system for a high-level language of a reconfigurable processor, comprising:
An acquisition module, configured to obtain a decls.h file, derive input and output intermediate files from the decls.h file, and obtain the parameter information of the task function and the pea function from the input and output intermediate files;
An optimization module, configured to extract the kernel function part from the C code and optimize the extracted kernel function part with a polyhedral model to generate the GR-C language of the kernel function part;
A generation module, configured to write the GR-C language of the kernel function part back into the kernel function part of the C code according to the parameter information, to generate the final GR-C code.
Preferably, the optimization module comprises:
A conversion unit, configured to perform static dependence analysis on the input C code and convert the result of the static dependence analysis into a polyhedral model;
An optimization unit, configured to optimize the polyhedral model to obtain a parallelized polyhedral model;
A generation unit, configured to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
Preferably, the generation unit is configured to use the CLooG tool to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
Preferably, the conversion unit is configured to use the LooPo framework to perform code scanning and dependence analysis.
Preferably, the optimization unit is configured to rewrite on the basis of the PLUTO model to obtain an affine transformation framework; use PipLib as the ILP solver; and optimize the polyhedral model according to the framework and the ILP solver to obtain the parallelized polyhedral model.
Preferably, the optimization unit optimizes the extracted kernel function part in the following loop optimization order:
First, all states are separated and grouped according to their dependences and loop bounds;
Then, loop fusion is applied to each group, fusing suitable states together;
Then, the loop unrolling parameters are determined from the operands of each group and the execution cycles they require;
Finally, loop unrolling is performed with the computed parameters to obtain a suitable loop length and representation.
For other details of Embodiment two, refer to the corresponding parts of Embodiment one; they are not repeated here.
In summary, the invention proposes an end-to-end language conversion system for general-purpose reconfigurable processors. On a reconfigurable processor, the core loops of compute-intensive applications must be computed in parallel on the reconfigurable part, and C cannot express these parallel characteristics. The serial and parallel parts of an application therefore need to be packaged separately and optimized according to the characteristics of the system, finally producing a new language; the system of the invention does this and is widely applicable. In addition, the data types and lengths of a kernel function's inputs and outputs are determined by having the programmer write decls.h, which greatly reduces the complexity of the system and greatly improves its applicability. Furthermore, loop optimization is handled with a polyhedral model, which also widens the applicability of the system: whatever the reconfigurable architecture, only the order of the loop transformations in the loop optimization part needs to be changed to obtain a general solution, so porting the system to a different architecture is simpler.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the invention.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (12)

1. A parallelization and loop optimization method for a high-level language of a reconfigurable processor, characterized by comprising:
obtaining a decls.h file, generating an input/output intermediate file from the decls.h file, and obtaining the parameter information of the task function and the PEA function from the input/output intermediate file;
extracting the kernel function part from the C code, and optimizing the extracted kernel function part using a polyhedral model, so as to generate the GR-C language of the kernel function part;
writing the GR-C language of the kernel function part back to the kernel function part of the C code according to the parameter information, so as to generate the final GR-C code.
2. The parallelization and loop optimization method for a high-level language of a reconfigurable processor as claimed in claim 1, characterized in that extracting the kernel function part from the C code and optimizing the extracted kernel function part using a polyhedral model, so as to generate the GR-C language of the kernel function part, comprises:
performing static dependence analysis on the input C code, and converting the result of the static dependence analysis of the C code into a polyhedral model;
optimizing the polyhedral model to obtain a parallelized polyhedral model;
generating the GR-C language of the kernel function part from the parallelized polyhedral model.
3. The parallelization and loop optimization method for a high-level language of a reconfigurable processor as claimed in claim 2, characterized in that generating the GR-C language of the kernel function part from the parallelized polyhedral model comprises:
using the CLooG tool to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
4. The parallelization and loop optimization method for a high-level language of a reconfigurable processor as claimed in claim 2, characterized in that performing static dependence analysis on the input C code comprises:
using the LooPo framework to perform code scanning and dependence analysis.
5. The parallelization and loop optimization method for a high-level language of a reconfigurable processor as claimed in claim 2, characterized in that the step of optimizing the polyhedral model to obtain a parallelized polyhedral model comprises:
obtaining an affine transformation framework by rewriting on the basis of the PLUTO model;
using PipLib as the ILP solver;
optimizing the polyhedral model according to the framework and the ILP solver, to obtain the parallelized polyhedral model.
6. The parallelization and loop optimization method for a high-level language of a reconfigurable processor as claimed in any one of claims 2 to 5, characterized in that, in the step of optimizing the polyhedral model, the extracted kernel function part is optimized in the following loop optimization order:
first, all statements are separated and grouped according to their dependences and loop bounds;
then, loop fusion is applied to each group, fusing suitable statements together;
then, the loop unrolling parameter is determined from the execution cycles required by the operations of each group;
finally, loop unrolling is performed with the computed parameter, to obtain a suitable loop length and representation.
7. A parallelization and loop optimization system for a high-level language of a reconfigurable processor, characterized by comprising:
an acquisition module, configured to obtain a decls.h file, generate an input/output intermediate file from the decls.h file, and obtain the parameter information of the task function and the PEA function from the input/output intermediate file;
an optimization module, configured to extract the kernel function part from the C code and optimize the extracted kernel function part using a polyhedral model, so as to generate the GR-C language of the kernel function part;
a generation module, configured to write the GR-C language of the kernel function part back to the kernel function part of the C code according to the parameter information, so as to generate the final GR-C code.
8. The parallelization and loop optimization system for a high-level language of a reconfigurable processor as claimed in claim 7, characterized in that the optimization module comprises:
a conversion unit, configured to perform static dependence analysis on the input C code and convert the result of the static dependence analysis of the C code into a polyhedral model;
an optimization unit, configured to optimize the polyhedral model to obtain a parallelized polyhedral model;
a generation unit, configured to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
9. The parallelization and loop optimization system for a high-level language of a reconfigurable processor as claimed in claim 7, characterized in that the generation unit is configured to use the CLooG tool to generate the GR-C language of the kernel function part from the parallelized polyhedral model.
10. The parallelization and loop optimization system for a high-level language of a reconfigurable processor as claimed in claim 7, characterized in that the conversion unit is configured to use the LooPo framework to perform code scanning and dependence analysis.
11. The parallelization and loop optimization system for a high-level language of a reconfigurable processor as claimed in claim 7, characterized in that the optimization unit is configured to: obtain an affine transformation framework by rewriting on the basis of the PLUTO model; use PipLib as the ILP solver; and optimize the polyhedral model according to the framework and the ILP solver, to obtain the parallelized polyhedral model.
12. The parallelization and loop optimization system for a high-level language of a reconfigurable processor as claimed in any one of claims 7 to 11, characterized in that the optimization unit optimizes the extracted kernel function part in the following loop optimization order:
first, all statements are separated and grouped according to their dependences and loop bounds;
then, loop fusion is applied to each group, fusing suitable statements together;
then, the loop unrolling parameter is determined from the execution cycles required by the operations of each group;
finally, loop unrolling is performed with the computed parameter, to obtain a suitable loop length and representation.
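The four-step loop optimization order recited in the claims can be illustrated with a hand-written C sketch. It is not output of the claimed system: the two example statements, the array names, the fixed unroll factor of 4, and the heuristic in `choose_unroll_factor` (a cycle budget divided by the cycles per iteration) are all assumptions made for this example.

```c
#include <assert.h>
#include <string.h>

#define N 18   /* deliberately not a multiple of 4, so the remainder loop runs */

/* Step 1: two statements, each in its own loop. They share the same loop
 * bounds, and S2 at iteration i reads only the t[i] that S1 wrote at the
 * same iteration, so they fall into one group. */
static void separate_loops(const int *a, const int *b, const int *d,
                           int *t, int *c)
{
    for (int i = 0; i < N; ++i) t[i] = a[i] * b[i];   /* S1 */
    for (int i = 0; i < N; ++i) c[i] = t[i] + d[i];   /* S2 */
}

/* Step 3 (hypothetical heuristic): pick the unroll factor from the cycles
 * one fused iteration needs and an assumed cycle budget per loop body. */
static int choose_unroll_factor(int cycles_per_iter, int cycle_budget)
{
    assert(cycles_per_iter > 0 && cycle_budget > 0);
    int factor = cycle_budget / cycles_per_iter;
    return factor < 1 ? 1 : factor;
}

/* Steps 2 and 4: S1 and S2 fused into one loop body, then unrolled by 4,
 * with a remainder loop for the iterations left over. */
static void fused_unrolled(const int *a, const int *b, const int *d,
                           int *t, int *c)
{
    int i = 0;
    for (; i + 4 <= N; i += 4) {
        t[i]     = a[i]     * b[i];      c[i]     = t[i]     + d[i];
        t[i + 1] = a[i + 1] * b[i + 1];  c[i + 1] = t[i + 1] + d[i + 1];
        t[i + 2] = a[i + 2] * b[i + 2];  c[i + 2] = t[i + 2] + d[i + 2];
        t[i + 3] = a[i + 3] * b[i + 3];  c[i + 3] = t[i + 3] + d[i + 3];
    }
    for (; i < N; ++i) {
        t[i] = a[i] * b[i];
        c[i] = t[i] + d[i];
    }
}

/* Returns 1 when the transformed loops compute the same arrays as the
 * original separate loops. */
static int versions_agree(void)
{
    int a[N], b[N], d[N], t1[N], t2[N], c1[N], c2[N];
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 2 * i + 1;
        d[i] = 7 - i;
    }
    separate_loops(a, b, d, t1, c1);
    fused_unrolled(a, b, d, t2, c2);
    return memcmp(t1, t2, sizeof t1) == 0 && memcmp(c1, c2, sizeof c1) == 0;
}
```

Fusion is legal here because the only dependence runs from S1 to S2 within the same iteration, and the fused body keeps S1 before S2. Unrolling leaves the order of the iterations unchanged.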
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610018726.7A CN105700933A (en) 2016-01-12 2016-01-12 Parallelization and loop optimization method and system for a high-level language of reconfigurable processor

Publications (1)

Publication Number Publication Date
CN105700933A true CN105700933A (en) 2016-06-22

Family

ID=56226246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610018726.7A Pending CN105700933A (en) 2016-01-12 2016-01-12 Parallelization and loop optimization method and system for a high-level language of reconfigurable processor

Country Status (1)

Country Link
CN (1) CN105700933A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127933A1 (en) * 2013-11-01 2015-05-07 Samsung Electronics Co., Ltd. Reconfigurable processor and method for optimizing configuration memory
CN104615474A (en) * 2014-09-02 2015-05-13 清华大学 Compiler optimization method for coarse-grained reconfigurable processor
CN104375805A (en) * 2014-11-17 2015-02-25 天津大学 Method for simulating parallel computation process of reconfigurable processor through multi-core processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENGSHUO TIAN等: "An Automatic Translation and Parallelization System for General Purpose Reconfigurable Processor", 《ASIC (ASICON), 2015 IEEE 11TH INTERNATIONAL CONFERENCE ON》 *
楼杰超等: "异构粗粒度可重构处理器的自动任务编译器框架设计", 《微电子学与计算机》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445666A (en) * 2016-09-26 2017-02-22 西安交通大学 Parallel optimization method of DOACROSS cycle
CN106445666B (en) * 2016-09-26 2019-10-11 西安交通大学 A kind of parallel optimization method of DOACROSS circulation
CN109597622A (en) * 2018-11-02 2019-04-09 广东工业大学 A kind of concurrency optimization method based on MIC architecture processor
CN112631610A (en) * 2020-11-30 2021-04-09 上海交通大学 Method for eliminating memory access conflict for data reuse of coarse-grained reconfigurable structure
CN112631610B (en) * 2020-11-30 2022-04-26 上海交通大学 Method for eliminating memory access conflict for data reuse of coarse-grained reconfigurable structure
CN112559033A (en) * 2020-12-25 2021-03-26 山东高云半导体科技有限公司 Code processing method, device, storage medium and processor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160622

WD01 Invention patent application deemed withdrawn after publication