CN112559031B - Many-core program reconstruction method based on data structure - Google Patents


Info

Publication number
CN112559031B
Authority
CN
China
Prior art keywords
time
data structure
array
core
reconstruction method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910910099.1A
Other languages
Chinese (zh)
Other versions
CN112559031A (en)
Inventor
徐金秀
何香
陈鑫
徐占
刘鑫
李芳�
孙唯哲
郭恒
赵朋朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-25
Filing date
2019-09-25
Publication date
2022-10-04
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910910099.1A
Publication of CN112559031A
Application granted
Publication of CN112559031B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/72 Code refactoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4441 Reducing the execution time required by the program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30065 Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4482 Procedural

Abstract

The invention discloses a many-core program reconstruction method based on data structures, comprising a reconstruction method based on extracting basic-type data structures, a reconstruction method based on space compression through array dimension reduction, and a reconstruction method based on space compression through increased transmission word length. Aimed at the diverse data structures found in multi-level heterogeneous many-core parallel computing, the invention provides an efficient data structure reconstruction method and improves the computing efficiency of heterogeneous parallel programs.

Description

Many-core program reconstruction method based on data structure
Technical Field
The invention relates to a many-core program reconstruction method based on a data structure, and belongs to the technical field of computers.
Background
In recent years, high-performance computing technology has developed rapidly. Numerical simulation software not only pursues ever higher computing performance from the computer system but also places higher demands on its storage capacity. The data structure is a key factor in the performance of computing software, and a data structure designed for a multi-core computing system often becomes the weak point that limits the system's computing capability when the software runs on a heterogeneous many-core system.
Any computing software that is to produce correct results with good performance must have a well-designed data structure for its data objects. Research on data structures generally considers the relationships among data elements and how they are implemented, together with the algorithm implementation and execution efficiency. A common problem in many-core parallelization of computing software is that discrete data structures, or data structures with intricate relationships, cause the slave cores to access irregular storage addresses of the master core frequently, which greatly reduces the speedup obtained from many-core parallelization.
Most application software involves complex data structures: a large number of physical quantities are stored as multidimensional arrays of basic data types, while data elements with complex logical relationships are stored as complex data types. To achieve many-core parallelization, a slave core must access the master core's storage addresses and copy data of a certain length into the slave core's storage space, so that the slave core can access and compute on it efficiently. At present, heterogeneous many-core coprocessors have a simple structure, strong computing capability and limited storage capacity, so memory access exceptions on the coprocessor commonly occur when a traditional multi-core program is parallelized directly on heterogeneous many-core hardware.
Disclosure of Invention
The invention aims to provide a many-core program reconstruction method based on data structures. Aimed at the diverse data structures found in multi-level heterogeneous many-core parallel computing, it provides an efficient data structure reconstruction method and improves the computing efficiency of heterogeneous parallel programs.
To achieve this purpose, the invention adopts the following technical solution: a many-core program reconstruction method based on data structures, comprising a reconstruction method based on extracting basic-type data structures, a reconstruction method based on space compression through array dimension reduction, and a reconstruction method based on space compression through increased transmission word length;
the reconstruction method based on extracting basic-type data structures comprises the following steps:
S11, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S12, analyzing the most time-consuming loop segment of each time hotspot function one by one, starting with its data structures; if the loop segment contains data variables declared with a complex data type, executing S13; if the loop segment contains only data variables declared with basic data types, assigning the loop segment to the slave cores and executing S16;
S13, from the complex-data-type variables in the loop segment, extracting the basic-data-type member variables used by the loop segment's task, called the original variables, and declaring a corresponding basic-data-type alias for each, called the new variables;
S14, adding the declarations of the new variables from S13 to the variable declaration part of the time hotspot function, and, at the start of the execution part of the time hotspot function, matching the upper and lower address bounds of each new variable's memory address to those of the corresponding original variable;
S15, replacing the original variable names in the loop segment with the new variable names, assigning the loop segment's task to the slave cores, and executing S16;
and S16, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives.
The reconstruction method based on space compression through array dimension reduction comprises the following steps:
S21, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S22, analyzing the most time-consuming loop segment of each time hotspot function one by one; if the loop segment's data structures include multidimensional arrays that have no data dependency, executing S23; if the loop segment contains no multidimensional array, or its multidimensional arrays have data dependencies, executing S25;
S23, in the declaration part of the time hotspot function, declaring a corresponding reduced-dimension array for each multidimensional array without dependencies;
S24, finding the execution statements that involve the multidimensional arrays without dependencies and rewriting them as execution statements that involve the reduced-dimension arrays, forming a new loop segment with the reconstructed data structure, which is assigned to the slave cores;
and S25, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives.
The reconstruction method based on space compression through increased transmission word length comprises the following steps:
S31, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S32, analyzing the most time-consuming loop segment of each time hotspot function one by one and observing the characteristics of its computation; if several execution statements repeatedly read the same one or more array variables and write to different array variables, executing S33; otherwise, assigning the loop segment to the slave cores and executing S35;
S33, optimizing the data structure of the repeatedly read array variables by declaring corresponding alias arrays in the declaration part of the time hotspot function, and merging the different written array variables into one expanded-dimension array, which is also declared in the declaration part of the time hotspot function;
S34, in the execution part of the time hotspot function, for the reconstructed data structure, converting the discrete memory accesses of the alias arrays into contiguous memory accesses and modifying the write operations to target the expanded-dimension array, forming a loop segment with the reconstructed data structure, which is assigned to the slave cores;
and S35, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives.
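The first step shared by all three methods (S11, S21, S31) can be illustrated with a minimal sketch. It assumes Python's standard `cProfile` module as a stand-in for the unnamed performance analysis tool; `cheap_setup`, `hot_kernel`, `run_program` and `find_hotspots` are hypothetical names introduced only for this illustration and do not come from the patent.

```python
import cProfile
import pstats


def cheap_setup(n):
    # Hypothetical inexpensive routine.
    return list(range(n))


def hot_kernel(values):
    # Hypothetical time hotspot: a doubly nested loop.
    total = 0
    for a in values:
        for b in values:
            total += a * b
    return total


def run_program():
    values = cheap_setup(300)
    return hot_kernel(values)


def find_hotspots(entry, top=3):
    """Profile `entry` and return function names ranked by self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    entry()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    ranked = sorted(stats.stats.items(), key=lambda kv: kv[1][2], reverse=True)
    return [key[2] for key, _ in ranked[:top]]


if __name__ == "__main__":
    print(find_hotspots(run_program))
```

Ranking by self time rather than cumulative time points at the function whose own loop body dominates, which is the segment the later steps would inspect.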
By applying the above technical solution, the invention has the following advantages over the prior art:
The invention discloses a many-core program reconstruction method based on data structures. Aimed at the diverse data structures found in multi-level heterogeneous many-core parallel computing, it provides an efficient data structure reconstruction method and improves the computing efficiency of heterogeneous parallel programs. During heterogeneous many-core parallelization, the complex data structures of the many-core parallel block are analyzed in advance, the data structures are optimized and redundant data structures are reduced, and many-core parallelization of the application software is achieved by exploiting the performance advantages of the heterogeneous many-core system. The optimized data structures greatly reduce the discrete memory access overhead between the master core and the slave cores and increase the program's running speed. The method suits most high-performance scientific computing software, including ocean numerical simulation and aerodynamic numerical simulation; it optimizes the application's data structures, improves the utilization of the computing system's storage space, and improves the computing performance of heterogeneous many-core parallel programs.
Drawings
FIG. 1 is a code example of the reconstruction method based on extracting basic-type data structures;
FIG. 2 is pseudocode of the data structure reconstruction method based on space compression through array dimension reduction;
FIG. 3 is a code example of the data structure reconstruction method based on space compression through increased transmission word length;
FIG. 4 is a flow chart of the reconstruction method based on extracting basic-type data structures according to the present invention;
FIG. 5 is a flow chart of the reconstruction method based on space compression through array dimension reduction according to the present invention;
FIG. 6 is a flow chart of the reconstruction method based on space compression through increased transmission word length according to the present invention.
Detailed Description
Embodiment: a many-core program reconstruction method based on data structures, comprising a reconstruction method based on extracting basic-type data structures, a reconstruction method based on space compression through array dimension reduction, and a reconstruction method based on space compression through increased transmission word length;
the reconstruction method based on extracting basic-type data structures comprises the following steps:
S11, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S12, analyzing the most time-consuming loop segment of each time hotspot function one by one, starting with its data structures; if the loop segment contains data variables declared with a complex data type, executing S13; if the loop segment contains only data variables declared with basic data types, assigning the loop segment to the slave cores and executing S16;
S13, from the complex-data-type variables in the loop segment, extracting the basic-data-type member variables used by the loop segment's task, called the original variables, and declaring a corresponding basic-data-type alias for each, called the new variables;
S14, adding the declarations of the new variables from S13 to the variable declaration part of the time hotspot function, and, at the start of the execution part of the time hotspot function, matching the upper and lower address bounds of each new variable's memory address to those of the corresponding original variable;
S15, replacing the original variable names in the loop segment with the new variable names, assigning the loop segment's task to the slave cores, and executing S16;
and S16, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives.
Because redundant storage in the slave core's LDM space is eliminated and the slave core's memory access performance improves, many-core optimization performance is greatly improved.
The reconstruction method based on space compression through array dimension reduction comprises the following steps:
S21, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S22, analyzing the most time-consuming loop segment of each time hotspot function one by one; if the loop segment's data structures include multidimensional arrays that have no data dependency, executing S23; if the loop segment contains no multidimensional array, or its multidimensional arrays have data dependencies, executing S25;
taking the actual program shown in FIG. 2 as an example: grad_p is a four-dimensional array that can be regarded as 2 np nlev three-dimensional arrays, and the third dimension of each three-dimensional array has no dependency in use; analysis of the gradient_sphere function shows that no dependency exists in the computation along the third dimension;
S23, in the declaration part of the time hotspot function, declaring a corresponding reduced-dimension array for each multidimensional array without dependencies;
as in the example of FIG. 2, the analysis above shows that the gradient_sphere computation can be split, so the gradient_sphere function is re-implemented to compute only half of grad_p at a time, and this reconstruction saves half of the DMA space;
S24, finding the execution statements that involve the multidimensional arrays without dependencies and rewriting them as execution statements that involve the reduced-dimension arrays, forming a new loop segment with the reconstructed data structure, which is assigned to the slave cores;
and S25, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives.
The reconstruction method based on space compression through increased transmission word length comprises the following steps:
S31, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S32, analyzing the most time-consuming loop segment of each time hotspot function one by one and observing the characteristics of its computation; if several execution statements repeatedly read the same one or more array variables and write to different array variables, executing S33; otherwise, assigning the loop segment to the slave cores and executing S35;
S33, optimizing the data structure of the repeatedly read array variables by declaring corresponding alias arrays in the declaration part of the time hotspot function, and merging the different written array variables into one expanded-dimension array, which is also declared in the declaration part of the time hotspot function;
S34, in the execution part of the time hotspot function, for the reconstructed data structure, converting the discrete memory accesses of the alias arrays into contiguous memory accesses and modifying the write operations to target the expanded-dimension array, forming a loop segment with the reconstructed data structure, which is assigned to the slave cores;
making data accesses more contiguous increases the word length transferred by a single DMA access;
and S35, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives.
The values of the 5 groups of variables in the upper box of FIG. 3 are each computed from the same two variables; because of how the data are stored, the 5 arrays on the left require 5 DMA write operations;
the data structure is optimized as follows: the 5 arrays on the left are merged into one expanded-dimension array, and the array structure on the right is adjusted so that it is read contiguously starting from the first dimension. The slave core's DMA write operations thus drop to 1 and the length of a single DMA transfer increases by 4 times, so the number of DMA operations falls, bandwidth utilization rises, and the computation-to-memory-access ratio improves greatly.
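The DMA figures quoted above can be worked through as a short calculation. It assumes, which the text does not state, that the five arrays each hold $n$ elements of $w$ bytes; the symbols $n$, $w$, $N_{\mathrm{DMA}}$ and $L_{\mathrm{single}}$ are introduced here only for illustration.

```latex
\begin{aligned}
\text{before: } & N_{\mathrm{DMA}} = 5, & L_{\mathrm{single}} &= n\,w \\
\text{after: }  & N_{\mathrm{DMA}} = 1, & L_{\mathrm{single}} &= 5\,n\,w = n\,w + 4\,n\,w
\end{aligned}
```

Under that assumption the total volume written, $5\,n\,w$, is unchanged; only the number of transfers falls, and "increases by 4 times" reads as an increase of $4\,n\,w$ over the original single-transfer length.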
The examples are further explained below:
The invention provides several data structure optimization methods.
The most commonly used is the reconstruction method based on extracting basic-type data structures, which simplifies the complex data types in computing software. Its purpose is to extract the basic-type data structures involved in the data operations of the many-core parallel block; a concrete implementation is shown in FIG. 1.
In the figure, the left box contains 3 complex data types: type_h contains a 3-dimensional array of real type, type_dish contains a 4-dimensional array of real type, and type_dh contains several basic-data-type elements. The variables type_h, type_dish and type_dh are used in the many-core parallel block, and parallelizing directly across many cores causes the data variables not to be recognized, so the corresponding basic-data-type variables t_h_h, t_dish_h and area must be declared. After declaration, the addresses of the new variables must be matched to those of the original variables, so that slave-core accesses to the master core's variable region become contiguous memory accesses. This reduces redundant storage in LDM space and improves the performance of slave-core access to main memory.
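The figure itself is not reproduced in this text, so the following is only a minimal sketch of the idea in S13 to S15. It assumes Python object references as an analogue of the alias declaration and address matching; the classes `TypeH` and `TypeDh`, their fields, and the loop body are hypothetical, while the alias names `t_h_h` and `area` are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class TypeH:
    # Hypothetical complex type wrapping one basic-type array (flattened here).
    h: list = field(default_factory=list)
    label: str = "unused in the loop"


@dataclass
class TypeDh:
    # Hypothetical complex type with several basic-type members.
    area: list = field(default_factory=list)
    name: str = "unused in the loop"


def hotspot_original(type_h, type_dh):
    # Original loop: every access goes through the complex-type variables.
    for i in range(len(type_h.h)):
        type_h.h[i] = type_h.h[i] * type_dh.area[i]


def hotspot_reconstructed(type_h, type_dh):
    # S13/S14: declare basic-type aliases and bind them to the original members.
    t_h_h = type_h.h      # alias: same underlying storage, no copy
    area = type_dh.area
    # S15: the loop body now names only basic-type variables.
    for i in range(len(t_h_h)):
        t_h_h[i] = t_h_h[i] * area[i]
```

Because the aliases refer to the same storage as the original members, the reconstructed loop updates the original data in place and no copy-back step is needed in this sketch.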
The second method is the data structure reconstruction method based on space compression through array dimension reduction. In the core computation of some applications, the multidimensional arrays used as local variables have no dependencies along certain dimensions, so dimension reduction allows the same space to be reused; the design idea is shown in FIG. 2.
In FIG. 2, the upper box is the original code. grad_p is a four-dimensional array that can be regarded as 2 np nlev three-dimensional arrays, and the third dimension of each three-dimensional array has no dependency in use. The gradient_sphere function is therefore re-implemented so that only half of grad_p is computed at a time, and this reconstruction saves half of the DMA space.
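A minimal sketch of the dimension reduction follows. It does not reproduce the code in FIG. 2: `component` is a hypothetical stand-in for one entry of the gradient_sphere result, the sizes `NP` and `NLEV` and the array layout are assumptions, and the final accumulation is invented so that the two versions can be compared.

```python
NP = 4      # hypothetical points per level
NLEV = 3    # hypothetical number of levels


def component(p, k, c, i):
    # Stand-in for one entry of gradient_sphere's result: component c (0 or 1)
    # at level k, point i. It depends only on the input p, never on the other
    # component, which is the independence the method requires.
    return (c + 1) * p[k][i]


def original(p):
    # Full buffer: both components for every level are held at the same time.
    grad_p = [[[component(p, k, c, i) for i in range(NP)]
               for c in range(2)] for k in range(NLEV)]
    out = [[0.0] * NP for _ in range(NLEV)]
    for k in range(NLEV):
        for c in range(2):
            for i in range(NP):
                out[k][i] += grad_p[k][c][i]
    return out, NLEV * 2 * NP          # result and buffer size in elements


def reconstructed(p):
    # Reduced buffer: one component at a time, reused for the second pass.
    grad_half = [[0.0] * NP for _ in range(NLEV)]
    out = [[0.0] * NP for _ in range(NLEV)]
    for c in range(2):
        for k in range(NLEV):
            for i in range(NP):
                grad_half[k][i] = component(p, k, c, i)
        for k in range(NLEV):
            for i in range(NP):
                out[k][i] += grad_half[k][i]
    return out, NLEV * NP
```

The reduced buffer is half the size of the full one, and the result is the same only because neither component reads the other; with a dependency between them, step S22 would route the loop segment to S25 instead.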
The third method is the data structure reconstruction method based on space compression through increased transmission word length. The core idea is to make the word length of each transfer as large as possible, which yields higher DMA bandwidth, reduces the number of DMA operations and avoids DMA congestion.
In the example in the upper box of FIG. 3, the values of 5 array variables are each computed from the same two variables. With direct many-core parallelization, the 5 arrays on the left require 5 DMA write operations. After the data structure is optimized, the 5 arrays are merged into one expanded-dimension array, the slave core's DMA write operations drop to 1, and the length of a single DMA transfer increases by 4 times, so bandwidth utilization rises while the number of DMA operations falls, and the computation-to-memory-access ratio improves greatly.
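The effect on the number of DMA writes can be sketched with a toy model. Everything here is an assumption made for illustration: `MainMemory.dma_put` only mimics a DMA write by copying a block and recording its length, the five formulas in `results` are invented, and `N` is an arbitrary size; none of it is taken from FIG. 3.

```python
N = 8  # hypothetical number of elements handled by one slave core


class MainMemory:
    """Toy stand-in for main memory that counts DMA write transfers."""

    def __init__(self):
        self.transfers = []          # length (in elements) of each transfer

    def dma_put(self, dest, offset, block):
        dest[offset:offset + len(block)] = block
        self.transfers.append(len(block))


def results(a, b):
    # Five hypothetical outputs, all computed from the same two inputs.
    return [[x + y for x, y in zip(a, b)],
            [x - y for x, y in zip(a, b)],
            [x * y for x, y in zip(a, b)],
            [x + 2 * y for x, y in zip(a, b)],
            [2 * x - y for x, y in zip(a, b)]]


def write_separate(mem, a, b):
    # Original layout: five separate arrays, one DMA write each.
    outs = [[0] * N for _ in range(5)]
    for dest, block in zip(outs, results(a, b)):
        mem.dma_put(dest, 0, block)
    return outs


def write_merged(mem, a, b):
    # Reconstructed layout: one expanded-dimension array stored contiguously,
    # so the five results leave the slave core in a single DMA write.
    merged = [0] * (5 * N)
    local = [v for block in results(a, b) for v in block]
    mem.dma_put(merged, 0, local)
    return merged
```

In this model the separate layout records five transfers of `N` elements and the merged layout records one transfer of `5 * N` elements, with identical data written in both cases.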
The many-core program reconstruction method based on data structures described above is aimed at the diverse data structures found in multi-level heterogeneous many-core parallel computing; it provides an efficient data structure reconstruction method and improves the computing efficiency of heterogeneous parallel programs. During heterogeneous many-core parallelization, the complex data structures of the many-core parallel block are analyzed in advance, the data structures are optimized and redundant data structures are reduced, and many-core parallelization of the application software is achieved by exploiting the performance advantages of the heterogeneous many-core system. The optimized data structures greatly reduce the discrete memory access overhead between the master core and the slave cores and increase the program's running speed. The method suits most high-performance scientific computing software, including ocean numerical simulation and aerodynamic numerical simulation; it optimizes the application's data structures, improves the utilization of the computing system's storage space, and improves the computing performance of heterogeneous many-core parallel programs.
To facilitate a better understanding of the invention, a term used herein is briefly explained as follows:
Data structure: a set of data elements that have specific relationships with one another, organized according to their logical relationships. It is the way a computer stores data, and a good data structure can improve the computer's data processing capability.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (1)

1. A many-core program reconstruction method based on data structures, characterized in that: the method comprises a reconstruction method based on extracting basic-type data structures, a reconstruction method based on space compression through array dimension reduction, and a reconstruction method based on space compression through increased transmission word length;
the reconstruction method based on extracting basic-type data structures comprises the following steps:
S11, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S12, analyzing the most time-consuming loop segment of each time hotspot function one by one, starting with its data structures; if the loop segment contains data variables declared with a complex data type, executing S13; if the loop segment contains only data variables declared with basic data types, assigning the loop segment to the slave cores and executing S16;
S13, from the complex-data-type variables in the loop segment, extracting the basic-data-type member variables used by the loop segment's task, called the original variables, and declaring a corresponding basic-data-type alias for each, called the new variables;
S14, adding the declarations of the new variables from S13 to the variable declaration part of the time hotspot function, and, at the start of the execution part of the time hotspot function, matching the upper and lower address bounds of each new variable's memory address to those of the corresponding original variable;
S15, replacing the original variable names in the loop segment with the new variable names, assigning the loop segment's task to the slave cores, and executing S16;
S16, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives;
the reconstruction method based on space compression through array dimension reduction comprises the following steps:
S21, using a performance analysis tool or printed timing output, identifying the time hotspot functions of the running program and finding the most time-consuming program segment in each time hotspot function;
S22, analyzing the most time-consuming loop segment of each time hotspot function one by one; if the loop segment's data structures include multidimensional arrays that have no data dependency, executing S23; if the loop segment contains no multidimensional array, or its multidimensional arrays have data dependencies, executing S25;
S23, in the declaration part of the time hotspot function, declaring a corresponding reduced-dimension array for each multidimensional array without dependencies;
S24, finding the execution statements that involve the multidimensional arrays without dependencies and rewriting them as execution statements that involve the reduced-dimension arrays, forming a new loop segment with the reconstructed data structure, which is assigned to the slave cores;
S25, for the loop segment task assigned to the slave cores, applying many-core accelerated parallelization directly using compiler directives;
the reconstruction method based on the space compression of the increased transmission word length comprises the following steps:
S31, analyzing a plurality of time hotspot functions during program execution by using a performance analysis tool or printed output information, and finding the most time-consuming program segment in each time hotspot function;
S32, analyzing the most time-consuming loop segments in each time hotspot function one by one and observing the characteristics of the calculation tasks in each loop segment; if multiple execution statements read one or more of the same array variables multiple times and write different array variables, executing S33; otherwise, the loop segment is executed by the slave cores and S35 is executed;
S33, optimizing the data structure of the array variables that are read multiple times by declaring corresponding alias arrays in the declaration part of the time hotspot function, and combining the plurality of different written array variables into one expanded-dimension array, which is likewise declared in the declaration part of the time hotspot function;
S34, in the execution part of the time hotspot function, for the reconstructed data structure, converting the discrete memory accesses to the alias arrays into continuous memory accesses and modifying the write operations to target the expanded-dimension array, thereby forming a loop segment with the reconstructed data structure, which is executed by the slave cores;
S35, directly using compiler directives to apply many-core accelerated parallelization to the loop segment task executed by the slave cores.
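Steps S33 and S34 can be illustrated with a small sketch. It is written in Python under stated assumptions and is not the patented implementation: the names (`update_original`, `update_merged`, `NFIELD`, `src_alias`, `merged`), the three arithmetic statements, and the interleaved layout of the expanded-dimension array are hypothetical, since the claim text does not specify the layouts. The alias is modeled simply as a second name bound to the same storage.

```python
NFIELD = 3  # number of separate output arrays merged into one (assumed)


def update_original(src, n):
    """Original form: three statements re-read src and write three arrays."""
    u, v, w = [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        u[i] = src[i] + 1.0
        v[i] = src[i] * 2.0
        w[i] = src[i] * src[i]
    return u, v, w


def update_merged(src, n):
    """Reconstructed form: alias for the read array, one merged write array."""
    src_alias = src                 # alias: same storage, no copy is made
    merged = [0.0] * (n * NFIELD)   # expanded-dimension array, merged[i*NFIELD + f]
    for i in range(n):
        s = src_alias[i]            # read the shared value once per element
        base = i * NFIELD
        merged[base + 0] = s + 1.0  # formerly u[i]
        merged[base + 1] = s * 2.0  # formerly v[i]
        merged[base + 2] = s * s    # formerly w[i]
    return merged
```

In the merged form, the three results for element `i` sit next to each other, so they could be moved as one contiguous block rather than as three scattered writes. That larger contiguous unit is presumably what the method's name means by an increased transmission word length, although this reading is an interpretation of the claim rather than something it states.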
CN201910910099.1A 2019-09-25 2019-09-25 Many-core program reconstruction method based on data structure Active CN112559031B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910910099.1A CN112559031B (en) 2019-09-25 2019-09-25 Many-core program reconstruction method based on data structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910910099.1A CN112559031B (en) 2019-09-25 2019-09-25 Many-core program reconstruction method based on data structure

Publications (2)

Publication Number Publication Date
CN112559031A CN112559031A (en) 2021-03-26
CN112559031B CN112559031B (en) 2022-10-04

Family

ID=75029110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910910099.1A Active CN112559031B (en) 2019-09-25 2019-09-25 Many-core program reconstruction method based on data structure

Country Status (1)

Country Link
CN (1) CN112559031B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929724A (en) * 2012-11-06 2013-02-13 Wuxi Jiangnan Computing Technology Institute Multistage memory access method and discrete memory access method based on heterogeneous multi-core processor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012050519A (en) * 2010-08-31 2012-03-15 Fujifilm Corp Mammographic apparatus

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929724A (en) * 2012-11-06 2013-02-13 Wuxi Jiangnan Computing Technology Institute Multistage memory access method and discrete memory access method based on heterogeneous multi-core processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
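"Compiling Dynamic Data Structures in Python to Enable the Use of Multi-core and Many-core Libraries"; Bin Ren et al.; 2011 International Conference on Parallel Architectures and Compilation Techniques; 2011-12-31; full text *
"Research on High-Throughput Many-Core Parallel Simulation Acceleration Technology" (《高通量众核并行模拟加速技术研究》); Fang Guoqing et al.; Computer Engineering (《计算机工程》); 2017-04-30; full text *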

Also Published As

Publication number Publication date
CN112559031A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
Nukada et al. Bandwidth intensive 3-D FFT kernel for GPUs using CUDA
CN109002659B (en) Fluid machinery simulation program optimization method based on super computer
US8738349B2 (en) Gate-level logic simulator using multiple processor architectures
EP3396542B1 (en) Database operating method and device
Verma et al. Accelerating workloads on fpgas via opencl: A case study with opendwarfs
CN105511867A (en) Optimization mode automatic generation method and optimization device
CN113553057B (en) Optimization system for parallel computing of GPUs with different architectures
Lyuh et al. High-level synthesis for low power based on network flow method
CN112446471A (en) Convolution acceleration method based on heterogeneous many-core processor
CN112559031B (en) Many-core program reconstruction method based on data structure
CN109522127B (en) Fluid machinery simulation program heterogeneous acceleration method based on GPU
Panda et al. Incorporating DRAM access modes into high-level synthesis
Breß et al. Exploring the design space of a GPU-aware database architecture
CN111475205A (en) Coarse-grained reconfigurable array structure design method based on data flow decoupling
Wang et al. An automatic-addressing architecture with fully serialized access in racetrack memory for energy-efficient CNNs
WO2022078400A1 (en) Device and method for processing multi-dimensional data, and computer program product
CN116185377A (en) Optimization method and device for calculation graph and related product
JP2663893B2 (en) Architecture simulator
CN110851178B (en) Inter-process program static analysis method based on distributed graph reachable computation
Gavriilidis Computation offloading in jvm-based dataflow engines
CN109271344B (en) Data preprocessing method based on parallel file reading of Shenwei chip architecture
Ande et al. tachyon: Efficient Shared Memory Parallel Computation of Extremum Graphs
Qiao et al. A customizable MapReduce framework for complex data-intensive workflows on GPUs
Adoni et al. Hgraph: Parallel and distributed tool for large-scale graph processing
Yan et al. Optimizing algorithm of sparse linear systems on gpu

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant