CN106250686B - Collective communication function modeling method for parallel programs - Google Patents

Collective communication function modeling method for parallel programs

Info

Publication number
CN106250686B
CN106250686B (application CN201610599836.7A)
Authority
CN
China
Prior art keywords
layer
neuron
communication time
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610599836.7A
Other languages
Chinese (zh)
Other versions
CN106250686A (en)
Inventor
张伟哲
何慧
郝萌
韩硕
鲁刚钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN201610599836.7A priority Critical patent/CN106250686B/en
Publication of CN106250686A publication Critical patent/CN106250686A/en
Application granted granted Critical
Publication of CN106250686B publication Critical patent/CN106250686B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00 Subject matter not provided for in other main groups of this subclass

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)
  • Feedback Control In General (AREA)

Abstract

A collective communication function modeling method for parallel programs; the present invention relates to methods for modeling the collective communication functions of parallel programs. The purpose of the present invention is to overcome the drawbacks of the prior art, namely inaccurate acquisition of communication-time data and the large amounts of time and money consumed. The detailed process of the collective communication function modeling method for parallel programs is: Step 1: measure the collective communication functions N times on an experimental platform to obtain the communication-time data of the collective communication functions under different degrees of parallelism and data volumes; Step 2: fit the communication-time data of the collective communication functions under different degrees of parallelism and data volumes with an artificial neural network based on the BP back-propagation algorithm to obtain a neural network model of the corresponding communication function. The present invention is applicable to the field of communication technology.

Description

Collective communication function modeling method for parallel programs
Technical field
The present invention relates to methods for modeling the collective communication functions of parallel programs.
Background technology
The execution time of a parallel program is divided into computation time and communication time. The computation time is the instruction execution time, which can be obtained by dynamically counting the number of executed instructions and the execution time of each class of machine instruction; the communication time is the time spent in calls to communication functions and is the emphasis of this research. Parallel scientific programs are normally based on the MPI (Message-Passing Interface) interface; MPI defines a function library that can be called from programming languages. Information about the communication functions is obtained by instrumentation, a time model of the communication functions is established, and the communication time is finally obtained by statistics. Communication functions are divided into point-to-point communication and collective communication. From the LogGP model it is known that the point-to-point communication time is piecewise linear in the traffic volume, so the communication-time model of collective communication is the emphasis of research.
Invention content
The purpose of the present invention is to overcome the drawbacks of the prior art, namely inaccurate acquisition of communication-time data and the large amounts of time and money consumed, and to propose a collective communication function modeling method for parallel programs.
The detailed process of the collective communication function modeling method for parallel programs is:
Step 1: measure the collective communication functions N times on the experimental platform to obtain the communication-time data of the collective communication functions under different degrees of parallelism and data volumes;
The value of N ranges from 1000 to 10000;
Step 2: fit the communication-time data of the collective communication functions under different degrees of parallelism and data volumes with an artificial neural network based on the BP back-propagation algorithm to obtain a neural network model of the corresponding communication function.
The beneficial effects of the present invention are:
The experimental data are divided into a training set and a test set, with 70% used as the training set and 30% as the test set. Multiple cross-validation experiments are performed; in each experiment the input data are shuffled and the training and test sets are chosen anew. The fitting effect of the above neural network structure on the communication data is shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9; the four examples of Fig. 6 to Fig. 9 verify the fitting effect of the artificial neural network on different collective communications.
The fitting result is quantified by the correlation coefficient, a statistical indicator that reflects the closeness of the relationship between variables. The Pearson correlation coefficient r describes the strength of the linear correlation between two variables; r ranges over [-1, 1], and when r > 0 the two variables are positively correlated. The larger the absolute value of r, the stronger the correlation. See Table 2.
Table 2: Correlation coefficients
The correlation coefficient reflects the correlation between data; the precision is quantified with the root-mean-square error (RMSE), as shown in Table 3.
Table 3: RMSE
The present invention overcomes the drawbacks of the prior art, namely low accuracy in acquiring communication-time data and large consumption of time and money, by proposing a collective communication function modeling method for parallel programs that improves the accuracy of communication-time data acquisition and saves time and money.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is a schematic diagram of the Sigmoid activation function; Sigmoid is an S-shaped function with an S-shaped growth curve;
Fig. 3 is a schematic diagram of the Tanh activation function; Tanh is the hyperbolic tangent function;
Fig. 4 is a schematic diagram of the Relu activation function; Relu is a popular activation function in artificial neural networks that does not tend to saturate as the input grows;
Fig. 5 is a schematic diagram of the Softplus activation function; Softplus is a smooth approximation of the Relu function;
Fig. 6 is the fitting effect diagram of the MPI_Allreduce model; the abscissa (true value) is the actual value and the ordinate (predicted value) is the predicted value; MPI_Allreduce is the global reduction function;
Fig. 7 is the fitting effect diagram of the MPI_Reduce model; MPI_Reduce is the reduction function;
Fig. 8 is the fitting effect diagram of the MPI_Bcast model; MPI_Bcast is the broadcast function;
Fig. 9 is the fitting effect diagram of the MPI_Gather model; MPI_Gather is the gather function.
Specific implementation mode
Specific implementation mode one: This embodiment is described with reference to Fig. 1. The detailed process of the collective communication function modeling method for parallel programs of this embodiment is:
Step 1: measure the collective communication functions N times on the experimental platform to obtain the communication-time data of the collective communication functions under different degrees of parallelism and data volumes;
The value of N ranges from 1000 to 10000;
Step 2: fit the communication-time data of the collective communication functions under different degrees of parallelism and data volumes with an artificial neural network based on the BP (Error Back Propagation) algorithm to obtain a neural network model of the corresponding communication function.
Specific implementation mode two: This embodiment differs from specific implementation mode one in that in step 2 the communication-time data of the collective communication functions under different degrees of parallelism and data volumes are fitted with an artificial neural network based on the BP (Error Back Propagation) algorithm to obtain the neural network model of the corresponding communication function; the specific process is:
The flow of the BP back-propagation algorithm is:
In the forward-propagation process, the input signal is received first and passes layer by layer through the weights and activation functions between the neurons until it reaches the output layer, yielding the output value of the current iteration;
The error of the current iteration is calculated according to the error definition;
The error is propagated backward from the output layer to the input layer according to a certain rule, and the weights are adjusted layer by layer to reduce the error. The above process is repeated until the number of iterations is reached or the error is smaller than the specified accuracy, at which point training ends;
The BP algorithm is the error back-propagation (Error Back Propagation, BP) algorithm. Its basic idea is that the learning process consists of two phases: forward propagation of the signal and backward propagation of the error.
Given a sample set {(x_1, r_1), (x_2, r_2), ..., (x_q, r_q)}, the output of the neural network model is y = (y_1, y_2, ..., y_q) and the parameters of the neural network model are W and b, where W are the weights and b are the thresholds.
x_1, x_2, ..., x_q are feature values and q is a positive integer; r_1, r_2, ..., r_q are the actual values, and y_1, y_2, ..., y_q are the output values of the neural network model;
A common expression for the error, i.e. how far the output value of the neural network model deviates from the actual value, is the squared error
J(W,b) = (1/2) Σ_{i=1}^{q} (r_i - y_i)^2
The layer is denoted by l, ranging over [1, n_l], where n_l denotes the final output layer; S_l denotes the number of neurons in layer l, a_i^(l) denotes the forward output value of the i-th neuron of layer l, and z_i^(l) is the weighted input of the i-th neuron of layer l, where
z_i^(l) = Σ_{j=1}^{S_(l-1)} w_ij a_j^(l-1),   a_i^(l) = f(z_i^(l))
with w_ij the weight value and a_j^(l-1) the forward output value of the j-th neuron of layer l-1;
n_l is a positive integer; 1 ≤ i ≤ S_l; w ∈ W.
The residual δ_i^(l) denotes the influence of that node on the final output value; f(z) denotes the activation function of the neuron;
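As an illustration only, the forward propagation defined above can be sketched in NumPy; the helper names and the use of Relu on every layer are assumptions, not part of the patent text:

```python
import numpy as np

def relu(z):
    """Activation function f(z) = max(0, z)."""
    return np.maximum(0.0, z)

def forward(x, weights, biases, f=relu):
    """Forward propagation: a^(1) = x, z^(l) = W a^(l-1) + b, a^(l) = f(z^(l))."""
    a = x
    zs, activations = [], [a]
    for W, b in zip(weights, biases):
        z = W @ a + b          # weighted input of the layer
        a = f(z)               # forward output value of the layer
        zs.append(z)
        activations.append(a)
    return zs, activations     # activations[-1] is the model output
```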
Step 2.1: data preprocessing;
Step 2.2: selection of the activation function;
Step 2.3: updating the weights and thresholds with the BP back-propagation algorithm according to steps 2.1 and 2.2.
Other steps and parameters are the same as in specific implementation mode one.
Specific implementation mode three: This embodiment differs from modes one and two in the data preprocessing of step 2.1; the detailed process is:
The communication-time data under different degrees of parallelism and data volumes are either the communication times measured under a fixed degree of parallelism while varying the data volume, or the communication times measured under a fixed data volume while varying the degree of parallelism;
The communication-time data under different degrees of parallelism and data volumes are clustered and non-uniform, which strongly affects training; therefore, before the communication-time data under different degrees of parallelism and data volumes are passed to network training, they are first shuffled;
The parameter update of a neuron in the first hidden layer is proportional to the input value: if one dimension is too large its parameter update is very large, and conversely the update is small, so different features are implicitly given different "importance". Because the ranges of the two features of a communication function differ greatly (for example the degree of parallelism ranges over (4, 64) while the data volume ranges over (100, 1000000)), the communication-time data under different degrees of parallelism and data volumes are normalized with the following formula:
x' = (x - min) / (max - min)
where max and min are the maximum and minimum of the given dimension; after normalization, the range of the input features of the artificial neural network is (0, 1); x' is the given dimension after normalization and x is the given dimension before normalization;
The given dimension is the degree of parallelism, the data volume, or the communication time.
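A minimal sketch of this preprocessing (shuffling followed by min-max normalization of each dimension); the array layout and function name are assumptions made for illustration:

```python
import numpy as np

def preprocess(data, rng=np.random.default_rng(0)):
    """data: rows of (degree of parallelism, data volume, communication time).
    Shuffle the rows, then normalize each dimension with x' = (x - min) / (max - min)."""
    data = np.asarray(data, dtype=float)
    rng.shuffle(data)                                  # break up the clustered measurements
    mins, maxs = data.min(axis=0), data.max(axis=0)
    normalized = (data - mins) / (maxs - mins)         # each column now lies in [0, 1]
    return normalized, mins, maxs                      # keep mins/maxs to undo the scaling later
```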
Other steps and parameters are the same as in specific implementation mode one or two.
Specific implementation mode four: This embodiment differs from one of specific implementation modes one to three in the selection of the activation function in step 2.2; the detailed process is: in the initial stage of the experiments the common Sigmoid activation function was selected, but the effect was very poor and in some cases the network did not converge. From the derivation of the BP algorithm it is known that when the error is back-propagated, each layer is multiplied by the first derivative of the activation function and the input values of the neurons of the current layer. The derivative of the Sigmoid function is f'(z) = f(z)(1 - f(z)), with f'(z) ∈ (0, 1), and the range of the neuron input values is also (0, 1) or (-1, 1). The error is therefore attenuated multiplicatively at each layer, and the gradient keeps decaying until it disappears, so the network cannot converge. After many experiments the Relu activation function was finally selected, so that the gradient can flow well during back-propagation.
The Relu activation function has the form:
f(z) = max(0, z)
In the formula, z is the input value of the Relu activation function; the gradient of the Relu activation function is 1 and only one side is saturated.
Softplus can be regarded as a smoothed version of Relu. Besides implementing one-sided suppression, the Relu function gives the neurons sparse activity. This property means that the function shows its effect only in deep network structures with many nodes. Different linear combinations of nodes can be selected, which reduces the dependence on a nonlinear mapping mechanism and makes the network structure more flexible.
The activation function nonlinearly transforms the data range from one space to another. Series and parallel structures of activation functions approximate arbitrary nonlinear functions through changes of their parameters. Common activation functions include Sigmoid, Tanh, Relu and Softplus.
Fig. 2 to Fig. 5 are schematic diagrams of these activation functions, from left to right and from top to bottom: Sigmoid, Tanh, Relu and Softplus.
The Sigmoid function has the form:
f(z) = 1 / (1 + exp(-z))
Its output range is [0, 1]. From a mathematical point of view, when the data are too large or too small the function value tends to a constant and the derivative of the function is then 0; the part where the function is really effective is the middle region. By analogy with the structure of the human brain, people often retain only the features they are interested in and ignore most other features, so the emphasized features are pushed toward the center.
The Tanh function has the form:
f(z) = (exp(z) - exp(-z)) / (exp(z) + exp(-z))
Its output range is [-1, 1]. The Tanh function is continuously differentiable, and mathematically it acts on the data similarly to the Sigmoid function; the really effective part is the middle region.
The Softplus function has the form:
f(z) = log(1 + exp(z))
Using the exponential function alone as the activation function would make the gradient too large, so the logarithm is taken to slow the growth, and 1 is added to guarantee non-negativity. From a biological point of view, neuroscientists consider that the way the human brain processes received signals is closer to the Softplus function. Unlike the two functions above, which confine the output value to a fixed range, this function retains the excited side of the data and suppresses the other side.
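For reference, the four activation functions compared above can be written as follows (a sketch; the derivative helpers are added for later use in back-propagation and are not part of the patent text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))            # output range [0, 1]

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)                        # f'(z) = f(z)(1 - f(z)) in (0, 1), source of vanishing gradients

def tanh(z):
    return np.tanh(z)                           # output range [-1, 1]

def relu(z):
    return np.maximum(0.0, z)                   # gradient 1 on the positive side, saturated only on one side

def relu_prime(z):
    return (z > 0).astype(float)

def softplus(z):
    return np.log1p(np.exp(z))                  # smooth approximation of Relu: log(1 + exp(z))
```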
Other steps and parameters are the same as one of specific implementation modes one to three.
Specific implementation mode five: This embodiment differs from one of specific implementation modes one to four in that in step 2.3 the weights and thresholds are updated with the BP back-propagation algorithm according to steps 2.1 and 2.2; the specific steps are as follows:
(1) Forward propagation is straightforward. The layer is denoted by l, ranging over [1, 2, 3, ..., n_l], where n_l denotes the final output layer of the neural network model; the activation values of layers 2, 3, ..., n_l are computed in turn, and finally the output of the neural network model is obtained, denoted h_{W,b}(x); n_l is a positive integer;
(2) When l is the n_l-th layer, for a given neuron i among all the neurons of layer n_l, the error is first obtained according to the error formula J(W,b) = (1/2) Σ_{i=1}^{q} (r_i - y_i)^2, and the partial derivative of the error with respect to the weighted input z_i^(n_l) of the i-th neuron of layer n_l is taken, giving the residual
δ_i^(n_l) = -(r_i - y_i) f'(z_i^(n_l))
In the formula, W is the weight vector, b is the threshold, and q, the number of samples in the communication-time data of the communication functions under different degrees of parallelism and data volumes, is a positive integer; z_i^(n_l) = Σ_{j=1}^{S_(n_l - 1)} w_ij a_j^(n_l - 1) is the weighted input of the i-th neuron of layer n_l, where w_ij denotes the weight between the j-th neuron of layer l-1 and the i-th neuron of layer l and a_j^(l-1) is the forward output value of the j-th neuron of layer l-1; a_i^(l) denotes the forward output value of the i-th neuron of layer l; 1 ≤ i ≤ S_l; w_ij ∈ W; S_l, the number of neurons in layer l, is a positive integer; l is a positive integer; S_(l-1), the number of neurons in layer l-1, is a positive integer; r_i is the actual value; y_i is the output value of the neural network model; the residual δ_i^(l) denotes the influence of the neurons of layer l on the final output value; f'(·) denotes the derivative of the activation function of the neuron;
(3) When l is any of the layers n_l-1, n_l-2, n_l-3, ..., 2, the residual of the i-th node of layer l is computed as
δ_i^(l) = (Σ_{j=1}^{S_(l+1)} w_ji^(l) δ_j^(l+1)) f'(z_i^(l))
In the formula, S_(l+1), the number of neurons in layer l+1, is a positive integer; W^(l) is the weight vector of layer l; the residual δ_j^(l+1) denotes the influence of the neurons of layer l+1 on the final output value; 1 ≤ j ≤ S_(l+1);
(4) The partial derivatives needed for updating the weights and thresholds are computed, i.e.
∂J(W,b)/∂w_ij^(l) = a_j^(l) δ_i^(l+1),   ∂J(W,b)/∂b_i^(l) = δ_i^(l+1)
In the formula, a_j^(l) is the forward output value of the j-th neuron of layer l; b_i^(l) is the threshold of the i-th neuron of layer l of the neural network model;
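Steps (1) to (4) can be sketched as follows, assuming the quadratic error and the Relu activation; the helper names and the list-of-matrices representation are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(float)

def backprop(x, r, weights, biases):
    """Return the partial derivatives dJ/dW^(l) and dJ/db^(l) for one sample (x, r)."""
    # (1) forward propagation
    a = x
    zs, activations = [], [a]
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = relu(z)
        zs.append(z)
        activations.append(a)
    # (2) residual of the output layer: delta^(n_l) = -(r - y) * f'(z^(n_l))
    delta = -(r - activations[-1]) * relu_prime(zs[-1])
    grads_W = [None] * len(weights)
    grads_b = [None] * len(biases)
    grads_W[-1] = np.outer(delta, activations[-2])
    grads_b[-1] = delta
    # (3) propagate the residual backward through layers n_l-1, ..., 2
    for l in range(len(weights) - 2, -1, -1):
        delta = (weights[l + 1].T @ delta) * relu_prime(zs[l])
        # (4) partial derivatives used for the weight and threshold updates
        grads_W[l] = np.outer(delta, activations[l])
        grads_b[l] = delta
    return grads_W, grads_b
```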
During parameter adjustment it also happened that the input samples no longer affected the output, i.e. whatever sample was input, the output value stayed the same. The reason is that the weight parameters W had become too large, so an L2 regularization term is added to the error formula:
J(W,b) + (λ/2) Σ_w w^2
where w denotes a parameter included in the regularization term, λ is the regularization coefficient and q is the number of samples; L2 corresponds to a Gaussian prior. After taking the partial derivative of J(W,b), ∂J(W,b)/∂w_ij^(l) = a_j^(l) δ_i^(l+1) becomes a_j^(l) δ_i^(l+1) + λ w_ij^(l).
When W is updated, a positive quantity is therefore effectively subtracted, which causes the weights to decay.
In the experiments, the updates of the parameters W and b are multiplied by a learning rate η so that the learning rate can be adjusted conveniently and updates that are too large or too small are avoided. A momentum factor ρ is introduced, and the weight vector W and the threshold b are updated; the specific formulas are expressed as:
Δw_ij^(l) ← ρ Δw_ij^(l) - η ∂J(W,b)/∂w_ij^(l),   w_ij^(l)' = w_ij^(l) + Δw_ij^(l)
Δb_i^(l) ← ρ Δb_i^(l) - η ∂J(W,b)/∂b_i^(l),   b_i^(l)' = b_i^(l) + Δb_i^(l)
In the formulas, η is the learning rate; w_ij^(l)' is the updated weight between the j-th neuron of layer l and the i-th neuron of layer l+1; w_ij^(l) is the weight between the j-th neuron of layer l and the i-th neuron of layer l+1 before the update.
The learning rate determines how far the parameters move at each update: if it is set too large, the optimum may be overshot or the parameters may hover around it; if it is set too small, convergence takes too long. Dynamically changing the learning rate is therefore necessary when designing the model.
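The update rule described above, combining the learning rate η, the L2 weight decay λ and the momentum factor ρ, can be sketched as follows; since the patent's own update formula is not reproduced in this text, the standard momentum form is shown as an assumption:

```python
import numpy as np

def update_parameters(weights, biases, grads_W, grads_b, velocity_W, velocity_b,
                      eta=0.01, lam=0.005, rho=0.9):
    """One step of the weight/threshold update with learning rate, L2 decay and momentum."""
    for l in range(len(weights)):
        # L2 regularization adds lambda * W to the gradient, so the weights decay
        velocity_W[l] = rho * velocity_W[l] - eta * (grads_W[l] + lam * weights[l])
        velocity_b[l] = rho * velocity_b[l] - eta * grads_b[l]
        weights[l] += velocity_W[l]
        biases[l] += velocity_b[l]
    return weights, biases, velocity_W, velocity_b
```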
Other steps and parameters are the same as one of specific implementation modes one to four.
Specific implementation mode six: This embodiment differs from one of specific implementation modes one to five in that the regularization coefficient λ = 0.005.
In our experiments, λ = 0.005 achieved the desired experimental effect.
Other steps and parameters are the same as one of specific implementation modes one to five.
Specific implementation mode seven: This embodiment differs from one of specific implementation modes one to six in that ρ is set to 0.9.
The purpose of this setting is to use a large proportion of the previous descent direction to accelerate the descent in its early stage; when a local maximum of the function is crossed, two successive update directions are roughly opposite, and ρ then reduces the update amplitude so that the valley is crossed. As the gradient keeps decreasing, the optimization process easily falls into a local minimum where the gradient is 0, and ρ then helps to jump out of the minimum of the function surface. As shown in Table 1, many experiments determined a 7-layer neural network structure, and the number of nodes in each layer was determined according to the characteristics of the Relu function. The structure of the neural network model is described in the following table.
Table 1: Neural network structure
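The per-layer node counts of Table 1 are not reproduced in this text. The sketch below therefore uses placeholder layer widths and only illustrates how a 7-layer fully connected Relu network with 2 input features (degree of parallelism and data volume) and 1 output (communication time) could be assembled:

```python
import numpy as np

def build_network(layer_sizes=(2, 16, 32, 64, 32, 16, 1), rng=np.random.default_rng(0)):
    """7-layer fully connected network; the hidden-layer widths here are placeholders,
    not the node counts of Table 1."""
    weights = [rng.normal(0.0, 0.1, size=(n_out, n_in))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    return weights, biases
```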
Other steps and parameters are the same as one of specific implementation modes one to six.
The beneficial effects of the present invention are verified by the following embodiment:
Embodiment one:
The collective communication function modeling method for parallel programs of this embodiment is specifically carried out according to the following steps:
The experimental data are divided into a training set and a test set, with 70% used as the training set and 30% as the test set. Multiple cross-validation experiments are performed; in each experiment the input data are shuffled and the training and test sets are chosen anew. The fitting effect of the above neural network structure on the communication data is shown in Fig. 6, Fig. 7, Fig. 8 and Fig. 9; the four examples of Fig. 6 to Fig. 9 verify the fitting effect of the artificial neural network on different collective communications;
The fitting result is quantified by the correlation coefficient, a statistical indicator that reflects the closeness of the relationship between variables. The Pearson correlation coefficient r describes the strength of the linear correlation between two variables; r ranges over [-1, 1], and when r > 0 the two variables are positively correlated. The larger the absolute value of r, the stronger the correlation.
Table 2: Correlation coefficients
The correlation coefficient reflects the correlation between data; the precision is quantified with the root-mean-square error (RMSE).
Table 3: RMSE
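The train/test split and the two evaluation measures (Pearson correlation coefficient and RMSE) can be sketched as follows; the 70/30 split and the shuffling follow the text above, while the function names are assumptions:

```python
import numpy as np

def split_train_test(X, y, train_fraction=0.7, rng=np.random.default_rng(0)):
    """Shuffle the data and split it into a 70% training set and a 30% test set."""
    idx = rng.permutation(len(X))
    n_train = int(train_fraction * len(X))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]

def pearson_r(actual, predicted):
    """Pearson correlation coefficient r in [-1, 1]; larger |r| means stronger linear correlation."""
    return np.corrcoef(actual, predicted)[0, 1]

def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted communication times."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))
```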
The present invention can also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and variations according to the present invention, but these corresponding changes and variations shall all fall within the protection scope of the appended claims of the present invention.

Claims (3)

1. A collective communication function modeling method for parallel programs, characterized in that the detailed process of the method is:
Step 1: measure the collective communication functions N times on an experimental platform to obtain the communication-time data of the collective communication functions under different degrees of parallelism and data volumes;
The value of N ranges from 1000 to 10000;
Step 2: fit the communication-time data of the collective communication functions under different degrees of parallelism and data volumes with an artificial neural network based on the BP back-propagation algorithm to obtain a neural network model of the corresponding communication function;
The fitting of the communication-time data of the collective communication functions under different degrees of parallelism and data volumes with the artificial neural network based on the BP back-propagation algorithm in step 2, obtaining the neural network model of the corresponding communication function, proceeds as follows:
Step 2.1: data preprocessing;
Step 2.2: selection of the activation function;
Step 2.3: updating the weights and thresholds with the BP back-propagation algorithm according to steps 2.1 and 2.2;
The data preprocessing of step 2.1 is carried out as follows:
The communication-time data of the collective communication functions under different degrees of parallelism and data volumes are either the communication times measured under a fixed degree of parallelism while varying the data volume, or the communication times measured under a fixed data volume while varying the degree of parallelism;
Before the communication-time data of the collective communication functions under different degrees of parallelism and data volumes are passed to network training, the communication-time data under different degrees of parallelism and data volumes are first shuffled;
The communication-time data under different degrees of parallelism and data volumes are normalized with the following formula:
x' = (x - min) / (max - min)
where max and min are the maximum and minimum of the given dimension; after normalization, the range of the input features of the artificial neural network is (0, 1); x' is the given dimension after normalization and x is the given dimension before normalization;
The given dimension is the degree of parallelism, the data volume, or the communication time;
The selection of the activation function in step 2.2 is carried out as follows:
The Relu activation function is selected; its form is:
f(z) = max(0, z)
In the formula, z is the input value of the Relu activation function; the gradient of the Relu activation function is 1 and only one side is saturated;
The weights and thresholds are updated with the BP back-propagation algorithm according to steps 2.1 and 2.2 in step 2.3; the specific steps are as follows:
(1) The layer is denoted by l, ranging over [1, 2, 3, ..., n_l], where n_l denotes the final output layer of the neural network model; the activation values of the layers are computed in turn, and finally the output of the neural network model is obtained, denoted h_{W,b}(x); n_l is a positive integer;
(2) When l is the n_l-th layer, for a given neuron i among all the neurons of layer n_l, the error is first obtained according to the error formula J(W,b) = (1/2) Σ_{i=1}^{q} (r_i - y_i)^2;
The partial derivative of the error with respect to the weighted input z_i^(n_l) of the i-th neuron of layer n_l is taken, giving the residual
δ_i^(n_l) = -(r_i - y_i) f'(z_i^(n_l))
In the formula, W is the weight vector, b is the threshold, and q, the number of samples in the communication-time data of the communication functions under different degrees of parallelism and data volumes, is a positive integer; z_i^(n_l) = Σ_{j=1}^{S_(n_l - 1)} w_ij a_j^(n_l - 1) is the weighted input of the i-th neuron of layer n_l, where w_ij denotes the weight between the j-th neuron of layer l-1 and the i-th neuron of layer l and a_j^(l-1) is the forward output value of the j-th neuron of layer l-1; a_i^(l) denotes the forward output value of the i-th neuron of layer l; w_ij ∈ W; S_l, the number of neurons in layer l, is a positive integer; l is a positive integer; S_(l-1), the number of neurons in layer l-1, is a positive integer; r_i is the actual value; y_i is the output value of the neural network model; the residual δ_i^(l) denotes the influence of the neurons of layer l on the final output value; f'(·) denotes the derivative of the activation function of the neuron;
(3) When l is any of the layers n_l-1, n_l-2, n_l-3, ..., 2, the residual of the i-th node of layer l is computed as
δ_i^(l) = (Σ_{j=1}^{S_(l+1)} w_ji^(l) δ_j^(l+1)) f'(z_i^(l))
In the formula, S_(l+1), the number of neurons in layer l+1, is a positive integer; W^(l) is the weight vector of layer l; the residual δ_j^(l+1) denotes the influence of the neurons of layer l+1 on the final output value;
(4) The partial derivatives needed for updating the weights and thresholds are computed, i.e.
∂J(W,b)/∂w_ij^(l) = a_j^(l) δ_i^(l+1),   ∂J(W,b)/∂b_i^(l) = δ_i^(l+1)
In the formula, a_j^(l) is the forward output value of the j-th neuron of layer l; b_i^(l) is the threshold of the i-th neuron of layer l of the neural network model;
An L2 regularization term (λ/2) Σ_w w^2 is added to the error formula, where w denotes a parameter included in the regularization term, λ is the regularization coefficient and q is the number of samples; L2 corresponds to a Gaussian prior; after taking the partial derivative of J(W,b), ∂J(W,b)/∂w_ij^(l) = a_j^(l) δ_i^(l+1) is changed to a_j^(l) δ_i^(l+1) + λ w_ij^(l);
A momentum factor ρ is introduced, and the weight vector W and the threshold b are updated; the specific formulas are expressed as:
Δw_ij^(l) ← ρ Δw_ij^(l) - η ∂J(W,b)/∂w_ij^(l),   w_ij^(l)' = w_ij^(l) + Δw_ij^(l)
Δb_i^(l) ← ρ Δb_i^(l) - η ∂J(W,b)/∂b_i^(l),   b_i^(l)' = b_i^(l) + Δb_i^(l)
In the formulas, η is the learning rate; w_ij^(l)' is the updated weight between the j-th neuron of layer l and the i-th neuron of layer l+1; w_ij^(l) is the weight between the j-th neuron of layer l and the i-th neuron of layer l+1 before the update.
2. The collective communication function modeling method for parallel programs according to claim 1, characterized in that the regularization coefficient λ = 0.005.
3. The collective communication function modeling method for parallel programs according to claim 2, characterized in that ρ is set to 0.9.
CN201610599836.7A 2016-07-27 2016-07-27 Collective communication function modeling method for parallel programs Active CN106250686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610599836.7A CN106250686B (en) 2016-07-27 2016-07-27 Collective communication function modeling method for parallel programs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610599836.7A CN106250686B (en) 2016-07-27 2016-07-27 Collective communication function modeling method for parallel programs

Publications (2)

Publication Number Publication Date
CN106250686A CN106250686A (en) 2016-12-21
CN106250686B true CN106250686B (en) 2018-11-02

Family

ID=57604230

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610599836.7A Active CN106250686B (en) 2016-07-27 2016-07-27 Collective communication function modeling method for parallel programs

Country Status (1)

Country Link
CN (1) CN106250686B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107063349A (en) * 2017-04-17 2017-08-18 云南电网有限责任公司电力科学研究院 A kind of method and device of Fault Diagnosis Method of Power Transformer
CN111078286B (en) * 2018-10-19 2023-09-01 上海寒武纪信息科技有限公司 Data communication method, computing system and storage medium
CN109710419B (en) * 2018-11-13 2022-04-08 北京航空航天大学 MPI code communication process analysis method based on text analysis

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334743B (en) * 2008-05-21 2011-06-29 清华大学 Paralleling program automatic mappings realization method based on configuration file
CN102708404B (en) * 2012-02-23 2016-08-03 北京市计算中心 A kind of parameter prediction method during MPI optimized operation under multinuclear based on machine learning
CN104636801B (en) * 2013-11-08 2018-03-16 国家电网公司 A kind of prediction transmission line of electricity audible noise method based on Optimized BP Neural Network
US10482389B2 (en) * 2014-12-04 2019-11-19 Sap Se Parallel development and deployment for machine learning models

Also Published As

Publication number Publication date
CN106250686A (en) 2016-12-21

Similar Documents

Publication Publication Date Title
CN110728401B (en) Short-term power load prediction method of neural network based on squirrel and weed hybrid algorithm
CN108334949B (en) Image classifier construction method based on optimized deep convolutional neural network structure fast evolution
Gálvez et al. New memetic self-adaptive firefly algorithm for continuous optimisation
Połap et al. Multi-threaded learning control mechanism for neural networks
CN104751842B (en) The optimization method and system of deep neural network
CN108776807A (en) It is a kind of based on can the double branch neural networks of skip floor image thickness grain-size classification method
Ju et al. Effects of synaptic connectivity on liquid state machine performance
CN108898213B (en) Adaptive activation function parameter adjusting method for deep neural network
CN106503654A (en) A kind of face emotion identification method based on the sparse autoencoder network of depth
CN106022954B (en) Multiple BP neural network load prediction method based on grey correlation degree
CN108073917A (en) A kind of face identification method based on convolutional neural networks
CN106250686B (en) A kind of collective communication function modelling method of concurrent program
CN109389171B (en) Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
US11538178B2 (en) Machine learning-based 2D structured image generation
CN111860787A (en) Short-term prediction method and device for coupling directed graph structure flow data containing missing data
CN109740734B (en) Image classification method of convolutional neural network by optimizing spatial arrangement of neurons
CN114219139B (en) DWT-LSTM power load prediction method based on attention mechanism
Zhang et al. Efficient spiking neural networks with logarithmic temporal coding
Pietron et al. Retrain or not retrain?-efficient pruning methods of deep cnn networks
Hansen et al. Artificial neural networks: foundations and application to a decision problem
Robati et al. Inflation rate modeling: adaptive neuro-fuzzy inference system approach and particle swarm optimization algorithm (ANFIS-PSO)
CN117574429A (en) Federal deep learning method for privacy enhancement in edge computing network
Tao et al. A new pre-conditioned STDP rule and its hardware implementation in neuromorphic crossbar array
CN113850438A (en) Public building energy consumption prediction method, system, equipment and medium
CN113836823A (en) Load combination prediction method based on load decomposition and optimized bidirectional long-short term memory network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant