CN110377525A - Parallel program performance prediction system based on runtime characteristics and machine learning - Google Patents

Parallel program performance prediction system based on runtime characteristics and machine learning

Info

Publication number
CN110377525A
Authority
CN
China
Prior art keywords
program
basic block
instrumentation
feature
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910680598.6A
Other languages
Chinese (zh)
Other versions
CN110377525B (en)
Inventor
张伟哲 (Zhang Weizhe)
何慧 (He Hui)
王一名 (Wang Yiming)
郝萌 (Hao Meng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201910680598.6A priority Critical patent/CN110377525B/en
Publication of CN110377525A publication Critical patent/CN110377525A/en
Application granted granted Critical
Publication of CN110377525B publication Critical patent/CN110377525B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A parallel program performance prediction system based on runtime features and machine learning, belonging to the technical field of parallel program performance prediction. The present invention addresses the problems that existing machine-learning-based parallel program performance prediction systems incur high overhead, long prediction times, and low accuracy. Mixed instrumentation is applied to the original program to reduce the number of basic block counters, and the program is then pruned into a serial program that produces no output, which reduces the program's running time while preserving its execution flow, so the basic block frequencies are obtained quickly and accurately. These data are preprocessed and fed into the prediction model, which finally outputs the execution time of the large-scale parallel program. The model generated by the present invention has strong generalization ability, accurately predicts the execution time of large-scale parallel programs, and incurs very low prediction overhead.

Description

Parallel program performance prediction system based on runtime characteristics and machine learning
Technical field
The present invention relates to a parallel program performance prediction system based on runtime features and machine learning, and belongs to the technical field of parallel program performance prediction.
Background technique
With the rapid growth in the scale and complexity of high-performance computing systems (e.g., node counts and storage), the cost of executing parallel applications on them has increased accordingly. Many parallel programs run on high-performance computing systems with low efficiency and waste system resources, so the efficiency and scalability problems of high-performance systems and their applications have become increasingly prominent. Therefore, before executing a parallel program at large scale on a high-performance computing system, it is very important to predict the performance of the large-scale parallel program on the target system by running small-scale parallel programs. Furthermore, optimizing the parallel program according to the prediction results can effectively reduce execution cost and avoid wasting resources.
The prior art document CN101650687B discloses a large-scale parallel program performance prediction method comprising: collecting the communication sequences and sequential computation vectors of a parallel program; analyzing the similarity of the computation performed by each process and selecting representative processes; recording the communication content of the representative processes; replaying the representative processes on compute nodes of the target platform to obtain their sequential computation times, and substituting these for the computation times of the other processes; obtaining the communication records of the parallel program; and automatically predicting the final program performance with a network simulator. With this method, accurate parallel program performance estimates can be obtained using very few hardware resources.
Existing machine-learning-based parallel program performance prediction systems suffer from high overhead, long prediction times, and low accuracy, and the prior art contains no parallel program performance prediction system that reaches an optimal trade-off among overhead, prediction time, and accuracy.
Summary of the invention
The technical problem to be solved by the present invention is:
The present invention aims to solve the problems that machine-learning-based parallel program performance prediction systems incur high overhead, long prediction times, and low accuracy.
The technical solution adopted by the present invention to solve the above technical problem is as follows:
A parallel program performance prediction system based on runtime features and machine learning, the system comprising a feature acquisition module, a performance modeling module, and a performance prediction module, wherein
the feature acquisition module converts the parallel program under test to LLVM IR form and then applies edge-profiling instrumentation to it, generating an instrumented parallel program (an executable program); the instrumented parallel program is executed with different input sizes and process counts, generating the total running time, the process count, and the basic block frequencies, and these three kinds of parameters are preprocessed;
the performance modeling module takes the preprocessed process counts and basic block frequencies as input and the preprocessed execution times as output, performs machine learning, and obtains a performance prediction model;
the performance prediction module converts the above parallel program under test to LLVM IR form, then applies mixed basic-block instrumentation to it, and afterwards prunes the instrumented program to obtain an executable serial program; the serial program is executed with input sizes and process counts larger than those used in the feature acquisition module, generating process counts and basic block frequencies, which are then preprocessed; the preprocessed process counts and basic block frequencies serve as the input of the performance prediction model, and the predicted execution time of the parallel program is obtained as its output.
Further, the edge-profiling instrumentation algorithm proceeds as follows.
Input: the LLVM IR of the parallel program.
Output: the IR after edge-profiling instrumentation.
1) Create a counter array C in the parallel program under test and initialize it to zero;
2) for each edge in the control-flow graph corresponding to the LLVM IR of the parallel program, judge whether it is a critical edge; if so, insert a new basic block newbb between the source basic block and the target basic block of the critical edge e, and add the code { C[index]++ } before the terminator instruction of newbb; otherwise, add the code { C[index]++ } before the terminator instruction of the source basic block or target basic block of edge e. This completes the instrumentation.
Further, the mixed instrumentation algorithm proceeds as follows.
Input: the LLVM IR of the parallel program.
Output: the IR after mixed instrumentation.
1) Obtain the basic block set selected during processing in the feature acquisition module;
2) create a counter array C in the target program and initialize it to zero;
3) for each loop l in the parallel program under test that contains a basic block selected in step 1), judge whether l is a natural loop and whether the header block h of the loop's back edge is dominated by that basic block; if so, create a preheader block p before the head node header, and then execute the following steps:
obtain the values relevant to the loop trip count (LTC): %start, %end, %stride;
add code before the terminator instruction of p that computes the LTC Γ of l;
add the code { C[index] += Γ } before the terminator instruction of p, so that the counter is updated once each time p executes;
otherwise, add the code { C[index]++ } in the selected basic block itself.
Further, the program pruning algorithm proceeds as follows.
Input: the IR of the parallel program after mixed instrumentation.
Output: the pruned IR.
1) First delete the output-related code of the parallel program from the IR produced by mixed instrumentation;
2) then delete the MPI function calls in the parallel program;
3) finally eliminate dead code.
The present invention has the following advantageous effects:
Accurately predicting the performance of large-scale parallel programs not only lets users analyze program performance so that applications run efficiently on high-performance computing systems, but also helps users manage and schedule jobs, allocate scheduling strategies sensibly, and reduce job waiting time; it further enables resource assessment and guides users in requesting resources. The present invention therefore proposes a parallel program performance prediction system whose generated model has strong generalization ability, accurately predicts the execution time of large-scale parallel programs, and incurs very low prediction overhead, giving it strong practical value.
In the parallel program performance prediction system based on runtime features and machine learning according to the present invention, the runtime features are the basic block frequencies, and parallel program performance refers to the execution time of the program.
Brief description of the drawings
The accompanying drawings are provided for further understanding of the present invention and constitute a part of the specification; together with the specific embodiments of the invention they serve to explain the invention, and they do not limit the invention. In the drawings:
Fig. 1 is a structural diagram of the parallel program performance prediction framework of the present invention;
Fig. 2 compares the predicted and actual execution times of six parallel programs using basic blocks as features, where a) Sweep3D, b) LULESH, c) NPB SP, d) NPB BT, e) NPB LU, and f) NPB EP denote well-known parallel program names; the ordinate indicates the execution time and the abscissa indicates the sample number;
Fig. 3 is a box plot of the MAPE of six parallel programs using basic block frequency as the feature, where SVR, RF, and Ridge denote the three machine learning methods;
Fig. 4 compares the errors of the three methods;
Fig. 5 compares the prediction overhead of the six parallel programs with their original overhead.
Specific embodiments
With reference to Figs. 1 to 5, the implementation of a parallel program performance prediction system based on runtime features and machine learning according to the present invention is described as follows:
1 Parallel program performance prediction system
As shown in Fig. 1, the parallel program performance prediction system is divided into three parts: feature acquisition, performance modeling, and performance prediction. The first part is feature acquisition, which obtains the training data features mainly by applying edge-profiling instrumentation to small-scale parallel programs; all program instrumentation in the present invention is based on the LLVM compiler framework. The instrumented program is executed several times and the results are averaged to obtain the process count and basic block frequencies, which serve as the features of the training data, while the total running time of the program serves as the parallel program performance metric of the present invention. Feature preprocessing is then carried out to improve the generalization ability of the model. The second part is performance modeling: supervised machine learning regression algorithms are used to build the performance model, with hyperparameters tuned continually to arrive at the optimal performance prediction model. The third part performs large-scale parallel program performance prediction with this model, which requires quickly obtaining the runtime basic block frequencies of the large-scale program as the model input. To this end, mixed instrumentation is applied to the original program to reduce the number of basic block counters, and the program is then pruned into a serial program that produces no output, which reduces the running time of the program while preserving its execution flow, so the basic block frequencies are obtained quickly and accurately. These data are preprocessed and fed into the prediction model, which finally outputs the execution time of the large-scale parallel program.
2 Acquisition of performance model features
The small-scale parallel program is first translated into LLVM intermediate representation using the LLVM compiler front end; an LLVM Pass implementing edge-profiling instrumentation is then written and executed, instrumenting the program automatically. The instrumented program is then run, generating a file containing the basic block frequencies. Finally, the file is read and the data are assembled into a data set comprising process counts and basic block frequencies. The edge-profiling instrumentation algorithm is as given above.
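As an illustration of the edge-profiling placement rule, the following is a minimal Python sketch that models the control-flow graph as plain dictionaries. It is a sketch only: the actual pass operates on LLVM IR in C++, and every name here (instrument_edges, is_critical, the toy CFG) is an illustrative assumption, not the patent's code.

```python
# Sketch of edge-profiling counter placement over a toy CFG {block: [succs]}.
def instrument_edges(cfg):
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)

    def is_critical(src, dst):
        # Critical edge: the source has several successors AND the target has
        # several predecessors, so neither endpoint can safely host the counter.
        return len(cfg[src]) > 1 and len(preds[dst]) > 1

    counters, placement = {}, {}
    for src in list(cfg):
        for dst in list(cfg[src]):
            index = len(counters)              # slot in the counter array C
            counters[(src, dst)] = index
            if is_critical(src, dst):
                newbb = f"newbb_{src}_{dst}"   # split the critical edge
                cfg[src] = [newbb if s == dst else s for s in cfg[src]]
                cfg[newbb] = [dst]
                placement[(src, dst)] = newbb  # C[index]++ before terminator
            elif len(cfg[src]) == 1:
                placement[(src, dst)] = src
            else:
                placement[(src, dst)] = dst
    return counters, placement

# entry -> exit is critical (entry has 2 successors, exit has 2 predecessors).
cfg = {"entry": ["a", "exit"], "a": ["exit"], "exit": []}
print(instrument_edges(cfg)[1])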
3 Performance modeling based on machine learning
Feature preprocessing is carried out first: the data are normalized nonlinearly, and suitable features are selected by removing duplicated features and applying variance-based selection and the Pearson correlation coefficient. Performance modeling is then performed with three machine learning algorithms, SVR, Ridge regression, and RF. The data are divided into a training set, a test set, and a validation set: the training set is used to fit the model, the test set to tune parameters, and the validation set to assess the model. Grid search is combined with k-fold cross-validation, so hyperparameters are tuned continually while the model is evaluated and the optimal configuration parameters are selected automatically. The mean absolute percentage error (MAPE) is used to assess the generalization ability of the model.
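As a concrete illustration of this modeling step, here is a hedged scikit-learn sketch on synthetic data. The hyperparameter grids, the log-based normalization, the variance threshold, and the Pearson cutoff are illustrative assumptions, not values disclosed in the patent; for brevity the sketch also applies feature selection before the train/test split, which a real pipeline would fit on the training set only.

```python
# Hedged sketch of performance modeling with SVR, Ridge, and RF on synthetic
# data; grids and preprocessing constants are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(1, 1e6, size=(200, 7))     # process count + basic block freqs
y = 10 + 1e-4 * X[:, 1] + 5e-5 * X[:, 3] + rng.normal(0, 2, 200)  # exec time

X = np.log1p(X)                                # nonlinear normalization
X = VarianceThreshold(1e-3).fit_transform(X)   # drop near-constant features
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
X = X[:, corr > 0.05]                          # Pearson-based feature selection

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "SVR":   (SVR(), {"C": [1, 10, 100], "gamma": ["scale", 0.1]}),
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "RF":    (RandomForestRegressor(random_state=0),
              {"n_estimators": [100, 300]}),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)   # k-fold CV
for name, (est, grid) in models.items():
    search = GridSearchCV(est, grid, cv=cv,            # grid search + CV
                          scoring="neg_mean_absolute_percentage_error")
    search.fit(X_train, y_train)
    mape = mean_absolute_percentage_error(y_test, search.predict(X_test))
    print(f"{name}: best={search.best_params_}, test MAPE={mape:.2%}")
```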
4 Performance prediction for large-scale parallel programs
To predict large-scale parallel program performance, the runtime features of the large-scale program must be obtained as the input of the prediction model. Although the overhead of obtaining small-scale program performance with edge-profiling instrumentation is very small, using it to obtain large-scale program performance is very expensive, so the overhead of the instrumented large-scale program must be reduced. To reduce the overhead introduced by instrumentation, a mixed instrumentation algorithm is proposed; in addition, to reduce the execution overhead of the large-scale program itself, a program pruning algorithm is also proposed.
The mixed instrumentation algorithm combines dynamic and static instrumentation. A trip-count recognition method is used to estimate the number of times a loop executes: the trip count can be obtained directly at run time, without inserting a counter and accumulating increments. Suppose the loop induction variable is initialized to %start, the loop-exit condition compares against %end, and the loop stride is %stride; the trip count Γ is then computed as Γ = (%end − %start) / %stride.
A new basic block called the preheader is added before the header of the loop, the basic block counter in the header is moved into the preheader, and the formula computing the trip count is inserted into the preheader, so no per-iteration counter needs to be inserted. This method further reduces the number of counter accesses and updates. However, not every basic block counter in a natural loop can be moved into the preheader. Next, for natural loops containing branches, a criterion is given for deciding whether the basic block frequency inside such a loop can be moved into the preheader basic block; the following definition is used to judge whether a block's counter can be moved to the preheader node.
Definition 1. In a control-flow graph with entry node b0, if every path from b0 to bj must pass through bi, then node bi is said to dominate node bj, written bi >> bj. By this definition, every node dominates itself; for example, bi >> bi.
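Definition 1 can be made concrete with the standard iterative dominator computation. The sketch below runs on a toy CFG and is an illustration of the textbook algorithm, not code from the patent.

```python
# Iterative dominator sets over a toy CFG: bi >> bj iff every path from the
# entry b0 to bj passes through bi (Definition 1).
def dominators(cfg, entry):
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)
    dom = {b: set(cfg) for b in cfg}   # start from "dominated by everything"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in cfg:
            if b == entry:
                continue
            new = {b} | set.intersection(*(dom[p] for p in preds[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    return dom

# b0 -> b1 -> b2 -> b1 is the back edge; b1 -> b3 exits the loop.
cfg = {"b0": ["b1"], "b1": ["b2", "b3"], "b2": ["b1"], "b3": []}
print(dominators(cfg, "b0")["b2"])   # {'b0', 'b1', 'b2'}: b1 >> b2
```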
The mixed instrumentation algorithm is as given above.
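The core decision of the mixed instrumentation algorithm can be illustrated as follows. This is a sketch over toy loop descriptors standing in for LLVM's loop analysis; the Loop fields and the instrument function are illustrative assumptions, not the patent's code.

```python
# For a natural loop whose selected block dominates the back edge's header,
# the counter is hoisted: the preheader adds Γ = (end - start) / stride once.
# Otherwise the block keeps a plain per-execution increment C[index]++.
from dataclasses import dataclass

@dataclass
class Loop:
    block: str             # selected basic block inside the loop
    natural: bool          # is the loop a natural loop?
    dominates_h: bool      # does the block dominate the back edge's header h?
    start: int             # %start: induction variable initialization
    end: int               # %end: loop bound
    stride: int            # %stride: induction variable step

def instrument(loops):
    plan = []
    for i, l in enumerate(loops):
        if l.natural and l.dominates_h:
            trip = (l.end - l.start) // l.stride   # Γ, computed once in p
            plan.append(f"preheader p_{i}: C[{i}] += {trip}")
        else:
            plan.append(f"block {l.block}: C[{i}]++ on every execution")
    return plan

loops = [Loop("bb4", True, True, 0, 1000, 2),   # hoisted: one add of Γ = 500
         Loop("bb9", True, False, 0, 10, 1)]    # branchy path: count per pass
print("\n".join(instrument(loops)))
```

In the hoisted case the static trip count shown here would, in the real pass, be replaced by code that computes Γ from %start, %end, and %stride at run time.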
Because the calculation results are not needed to obtain the selected basic block frequencies, the program pruning algorithm first retains the original code and the instrumentation-related code, guaranteeing that the pruned program still runs normally and records the basic block frequencies accurately, and then deletes the useless and output-related code from the IR. In addition, to produce a serial program, the parallel function-call parts of the program must also be deleted. After the output-related code and the MPI function-call code are deleted, much dead code appears; since this code contributes to no other computation, running dead code elimination removes it from the IR. The IR thus shrinks, yielding a smaller executable program that runs faster.
The program pruning algorithm is as given above.
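The three pruning steps can be illustrated on a toy pseudo-IR; the instruction format and the prune function are invented for this sketch and do not come from the patent.

```python
# Pruning sketch: 1) drop output code, 2) drop MPI calls, 3) dead code
# elimination via a backward liveness pass that keeps only what feeds the
# instrumentation counters.
def prune(instrs):
    kept = [i for i in instrs
            if not i["op"].startswith(("print", "write", "MPI_"))]
    live = set()
    for i in reversed(kept):               # backward pass: mark live values
        if i["op"] == "count" or i.get("dst") in live:
            live.update(i.get("srcs", []))
    return [i for i in kept
            if i["op"] == "count" or i.get("dst") in live]

instrs = [
    {"op": "load",  "dst": "x", "srcs": []},
    {"op": "add",   "dst": "y", "srcs": ["x"]},
    {"op": "count", "dst": None, "srcs": []},      # C[index]++ must survive
    {"op": "MPI_Send", "dst": None, "srcs": ["y"]},
    {"op": "printf",   "dst": None, "srcs": ["y"]},
]
print(prune(instrs))   # only the counter update remains
```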
The technical effects of the invention are described below:
1 Prediction results
Table 1 shows two feature sets. The first is the common method (INPUT), whose selected features are the input parameters and the process count; the second is the method proposed by the present invention (RUNTIME), whose selected features are the basic block frequencies and the process count. Table 1 shows that the method using basic block frequency as the feature is clearly better than the method using the input parameters as features. The MAPE of the present method is below 20%, and the average MAPE over the six parallel applications is 8.41%.
Table 1: feature sets and MAPE of the parallel programs
Table 2 gives the standard deviation of the prediction errors of the six parallel programs, from which the dispersion of the prediction error, and hence the stability of the model, can be seen clearly. With input parameters as features, RF is the most stable; with basic block frequency as the feature, SVR is better than RF. Overall, the SVR model using basic block frequency as the feature is the most stable.
These results indicate that, compared with conventional machine learning methods that use only input-parameter features, automatic performance modeling based on runtime features builds better performance models, significantly improving prediction accuracy and stability.
Table 2: standard deviation of the parallel program errors
Fig. 2 compares, for the Sweep3D, LULESH, and NPB parallel applications, the times predicted by the SVR, RF, and Ridge regression algorithms with basic blocks as features against the programs' true running times. In these figures the test set samples are sorted in increasing order of actual running time; the darkest points are the true program execution times, and the lighter points are the times predicted by the machine learning models.
Fig. 3 is a box plot of the MAPE of the six parallel applications with basic block frequency as the feature; box plots avoid the influence of outliers and accurately show the discrete distribution of the data. These figures show clearly that SVR has the smallest prediction error.
2 Comparative experiments
The method proposed by the present invention is compared with two other classical performance prediction models based on input parameters: the method of Barnes and the method of Hoefler. The errors of the three methods are compared in Fig. 4.
Table 3: MAPE of the three methods
3 Performance prediction overhead
When predicting the performance of a parallel application, only the corresponding pruned serial program needs to be executed to collect the basic block frequencies; the original parallel application need not be run. The generated data comprise only the frequencies of a few basic blocks (six in the present invention), so the storage overhead is negligible. The assessment of prediction overhead therefore focuses on the execution overhead of the pruned serial program. Computing resources on a supercomputer are billed in core-hours, so in this experiment the prediction overhead is also expressed in core-hours.
Table 4 compares, for the six selected applications, the core-hours consumed by the method of the invention with the core-hours consumed by executing the original parallel application. The table shows that, for all six applications, the total overhead of executing the present method is far below the execution overhead of the original application: the average prediction overhead is only 0.1219% of the original execution cost. This means the method can help HPC users predict the performance of parallel applications effectively. The reason is that the pruned program is a standalone serial program that can execute on a single node, or even a single core; moreover, the serial program is further optimized by reducing the number of inserted counters and eliminating much dead code, improving its performance further.
Table 4: average overhead of the method versus original execution
Fig. 5 compares the prediction overhead of the six parallel programs with their original overhead. In these figures the test set samples are sorted in increasing order of actual running time, and the y-axis is core-hours; the line close to the x-axis is the prediction overhead, and the line far from the x-axis is the original overhead. These figures show clearly that the prediction overhead is far smaller than the overhead of executing the original program.

Claims (4)

1. A parallel program performance prediction system based on runtime features and machine learning, characterized in that the system comprises a feature acquisition module, a performance modeling module, and a performance prediction module, wherein
the feature acquisition module converts the parallel program under test to LLVM IR form and then applies edge-profiling instrumentation to it, generating an instrumented parallel program; the instrumented parallel program is executed with different input sizes and process counts, generating the total running time, the process count, and the basic block frequencies, and these three kinds of parameters are preprocessed;
the performance modeling module takes the preprocessed process counts and basic block frequencies as input and the preprocessed execution times as output, performs machine learning, and obtains a performance prediction model;
the performance prediction module converts the above parallel program under test to LLVM IR form, then applies mixed basic-block instrumentation to it, and afterwards prunes the instrumented program to obtain an executable serial program; the serial program is executed with input sizes and process counts larger than those used in the feature acquisition module, generating process counts and basic block frequencies, which are then preprocessed; the preprocessed process counts and basic block frequencies serve as the input of the performance prediction model, and the predicted execution time of the parallel program is obtained as its output.
2. The parallel program performance prediction system based on runtime features and machine learning according to claim 1, characterized in that the edge-profiling instrumentation algorithm proceeds as follows.
Input: the LLVM IR of the parallel program.
Output: the IR after edge-profiling instrumentation.
1) Create a counter array C in the parallel program under test and initialize it to zero;
2) for each edge in the control-flow graph corresponding to the LLVM IR of the parallel program, judge whether it is a critical edge; if so, insert a new basic block newbb between the source basic block and the target basic block of the critical edge e, and add the code { C[index]++ } before the terminator instruction of newbb; otherwise, add the code { C[index]++ } before the terminator instruction of the source basic block or target basic block of edge e. This completes the instrumentation.
3. The parallel program performance prediction system based on runtime features and machine learning according to claim 1 or 2, characterized in that the mixed instrumentation algorithm proceeds as follows.
Input: the LLVM IR of the parallel program.
Output: the IR after mixed instrumentation.
1) Obtain the basic block set selected during processing in the feature acquisition module;
2) create a counter array C in the target program and initialize it to zero;
3) for each loop l in the parallel program under test that contains a basic block selected in step 1), judge whether l is a natural loop and whether the header block h of the loop's back edge is dominated by that basic block; if so, create a preheader block p before the head node header, and then execute the following steps:
obtain the values relevant to the loop trip count (LTC): %start, %end, %stride;
add code before the terminator instruction of p that computes the LTC Γ of l;
add the code { C[index] += Γ } before the terminator instruction of p, so that the counter is updated once each time p executes;
otherwise, add the code { C[index]++ } in the selected basic block itself.
4. The parallel program performance prediction system based on runtime features and machine learning according to claim 3, characterized in that the program pruning algorithm proceeds as follows.
Input: the IR of the parallel program after mixed instrumentation.
Output: the pruned IR.
1) First delete the output-related code of the parallel program from the IR produced by mixed instrumentation;
2) then delete the MPI function calls in the parallel program;
3) finally eliminate dead code.
CN201910680598.6A 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning Active CN110377525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910680598.6A CN110377525B (en) 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910680598.6A CN110377525B (en) 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning

Publications (2)

Publication Number Publication Date
CN110377525A true CN110377525A (en) 2019-10-25
CN110377525B CN110377525B (en) 2022-11-15

Family

ID=68256290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910680598.6A Active CN110377525B (en) 2019-07-25 2019-07-25 Parallel program performance prediction system based on runtime characteristics and machine learning

Country Status (1)

Country Link
CN (1) CN110377525B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522644A (en) * 2020-04-22 2020-08-11 中国科学技术大学 Method for predicting running time of parallel program based on historical running data
CN113553266A (en) * 2021-07-23 2021-10-26 湖南大学 Parallelism detection method, system, terminal and readable storage medium of serial program based on parallelism detection model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063373A (en) * 2011-01-06 2011-05-18 北京航空航天大学 Method for positioning performance problems of large-scale parallel program
US20150032971A1 (en) * 2013-07-26 2015-01-29 Futurewei Technologies, Inc. System and Method for Predicting False Sharing
CN105183650A (en) * 2015-09-11 2015-12-23 哈尔滨工业大学 LLVM-based automatic performance prediction method for scientific calculation program
CN105183651A (en) * 2015-09-11 2015-12-23 哈尔滨工业大学 Viewpoint increase method for automatic performance prediction of program
CN105224452A (en) * 2015-09-11 2016-01-06 哈尔滨工业大学 A kind of prediction cost optimization method for scientific program static analysis performance
US20190213706A1 (en) * 2018-12-28 2019-07-11 Intel Corporation Techniques for graphics processing unit profiling using binary instrumentation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063373A (en) * 2011-01-06 2011-05-18 北京航空航天大学 Method for positioning performance problems of large-scale parallel program
US20150032971A1 (en) * 2013-07-26 2015-01-29 Futurewei Technologies, Inc. System and Method for Predicting False Sharing
CN105183650A (en) * 2015-09-11 2015-12-23 哈尔滨工业大学 LLVM-based automatic performance prediction method for scientific calculation program
CN105183651A (en) * 2015-09-11 2015-12-23 哈尔滨工业大学 Viewpoint increase method for automatic performance prediction of program
CN105224452A (en) * 2015-09-11 2016-01-06 哈尔滨工业大学 A kind of prediction cost optimization method for scientific program static analysis performance
US20190213706A1 (en) * 2018-12-28 2019-07-11 Intel Corporation Techniques for graphics processing unit profiling using binary instrumentation

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
GIOVANNI MARIANI et al., "Predicting cloud performance for HPC applications before deployment", Future Generation Computer Systems *
MARTIN SCHOEBERL et al., "T-CREST: Time-predictable multi-core architecture for embedded systems", Journal of Systems Architecture *
WEIZHE ZHANG et al., "Predicting HPC parallel program performance based on LLVM compiler", Cluster Computing *
NIU Xiaoxia et al., "Loop runtime information analysis method based on edge profiling", Computer Engineering and Applications *
XIE Hucheng, "Research on LLVM-based automatic performance prediction of scientific computing programs", China Master's Theses Full-text Database, Information Science and Technology *
CHEN Li, "Parallel code optimization techniques on SMP clusters", China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522644A (en) * 2020-04-22 2020-08-11 中国科学技术大学 Method for predicting running time of parallel program based on historical running data
CN111522644B (en) * 2020-04-22 2023-04-07 中国科学技术大学 Method for predicting running time of parallel program based on historical running data
CN113553266A (en) * 2021-07-23 2021-10-26 湖南大学 Parallelism detection method, system, terminal and readable storage medium of serial program based on parallelism detection model

Also Published As

Publication number Publication date
CN110377525B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
CN103235974B (en) A kind of method improving massive spatial data treatment effeciency
US20170330078A1 (en) Method and system for automated model building
CN104750780B (en) A kind of Hadoop configuration parameter optimization methods based on statistical analysis
Nunamaker Jr A methodology for the design and optimization of information processing systems
Kamthe et al. A stochastic approach to estimating earliest start times of nodes for scheduling DAGs on heterogeneous distributed computing systems
CN110377525A (en) A kind of parallel program property-predication system based on feature and machine learning when running
CN105607952A (en) Virtual resource scheduling method and apparatus
CN113822173A (en) Pedestrian attribute recognition training acceleration method based on node merging and path prediction
CN112948123A (en) Spark-based grid hydrological model distributed computing method
CN110516884A (en) A kind of short-term load forecasting method based on big data platform
CN112148942A (en) Business index data classification method and device based on data clustering
CN111444635A (en) XM L language-based system dynamics simulation modeling method and engine
CN113762514A (en) Data processing method, device, equipment and computer readable storage medium
CN108647135B (en) Hadoop parameter automatic tuning method based on micro-operation
Wang et al. FineQuery: Fine-grained query processing on CPU-GPU integrated architectures
CN114401496A (en) Video information rapid processing method based on 5G edge calculation
CN110928705B (en) Communication characteristic analysis method and system for high-performance computing application
CN113610225A (en) Quality evaluation model training method and device, electronic equipment and storage medium
CN109190160B (en) Matrixing simulation method of distributed hydrological model
CN108280574B (en) Evaluation method and device for structural maturity of power distribution network
CN110262891A (en) Across virtual platform automatic multifunctional resource cyclic utilization system
Chen et al. A deep learning-based approach with PSO for workload prediction of containers in the cloud
Jie A performance modeling-based HADOOP configuration tuning strategy
CN105512401A (en) Make-to-order based worker shift arrangement simulation method
Pham et al. Machine learning approach to generate pareto front for list-scheduling algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant