CN111835854A - Slow task prediction method based on grey prediction algorithm - Google Patents
Slow task prediction method based on grey prediction algorithm
- Publication number
- CN111835854A; CN202010689837.7A
- Authority
- CN
- China
- Prior art keywords
- task
- slow
- tasks
- prediction
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04L67/10: Protocols in which an application is distributed across nodes in the network
- H04L67/1004: Server selection for load balancing
- H04L67/1023: Server selection for load balancing based on a hash applied to IP addresses or costs
- H04L67/1029: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
- H04L67/1031: Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a slow task prediction method based on a grey prediction algorithm, comprising the following steps. Step one: on a big data cluster, count the total number of tasks run and the number of slow tasks generated by each node per unit time. Step two: extract the slow-task information of each node, establish a grey prediction model, and predict the density of slow tasks in a future time unit from the data collected over several time units. The method addresses the problem that slow tasks on current big data platforms degrade cluster performance. Unlike conventional slow-task identification techniques, which have poor timeliness and accuracy, the method uses a grey prediction algorithm to build a differential equation model, analyses the operating pattern of the cluster nodes, predicts the slow-task condition of the nodes over a future period, and provides optimization suggestions to cluster users.
Description
Technical Field
The invention relates to the field of slow task analysis of a big data platform, in particular to a slow task prediction method based on a grey prediction algorithm.
Background
The development of information technology drives continual progress; society is now entering the era of big data processing, and the state treats big data technology as a core strategy for taking the lead in economic and technological development. Big data applications tend to run in a distributed fashion on large-scale clusters or cloud platforms. A single big data application may consist of thousands of processes, and the cluster may contain thousands of nodes. Understanding the running state of an application in depth therefore requires aggregating the performance data of a large number of nodes for correlation analysis.
Today's big data clusters often contain many heterogeneous nodes and process many kinds of tasks, so performance problems, namely slow task problems, arise easily during program execution. A cluster may execute thousands of tasks in a given period, and their execution times differ considerably; tasks whose execution time is much higher than that of most other tasks are called slow tasks. In the big data literature, a slow task is conventionally defined as a task whose execution time exceeds 1.5 times the median execution time of all tasks in the same stage, and this definition has been widely adopted by later work. Slow tasks occupy a large amount of cluster resources, and their presence can significantly degrade the performance of the cluster.
Identifying and predicting slow tasks gives cluster maintenance personnel a reference: new application processes can be assigned preferentially to nodes with fewer slow tasks, which improves the utilization of cluster resources. In addition, the density of slow tasks makes it possible to find malfunctioning nodes quickly and to guide their repair.
Prediction is the discipline that studies the theory, methods, evaluation and application of forecasting. Its basic principles are the inertia principle, the analogy principle and the correlation principle. The core problem of prediction is the technical approach, that is, the mathematical model used. Unlike most prediction methods, the grey prediction method models a generated data sequence rather than the original data sequence, and uses a differential equation to capture the underlying behaviour of the system, which gives it high precision.
Current slow task analysis algorithms have two main problems:
First, their timeliness is poor: many algorithms rely on data gathered while the cluster is running, so a result is available only after the cluster has run for a long time, by which point the slow tasks have already harmed cluster performance. Second, their accuracy cannot be guaranteed: for nodes whose slow-task count fluctuates strongly, the relation between the slow-task count and the total task count is not analysed, so the prediction is poor.
Disclosure of Invention
To solve these problems, the invention processes the data by counting, for each node, the number of slow tasks and the total number of tasks separately, and takes the ratio of the two as the quantity to be predicted. This ratio expresses the density of slow tasks accurately and also removes the effect of fluctuations in the slow-task count between time units. A grey prediction model is then established to capture the relation between the time period and the slow-task condition, an equation is obtained, and the predicted slow-task density for a future time period is calculated.
The invention provides a slow task prediction method based on a grey prediction algorithm, comprising the following steps:
step one: on a big data cluster, counting the total number of tasks run and the number of slow tasks generated by each node per unit time;
step two: extracting the slow-task information of each node, establishing a grey prediction model, and predicting the density of slow tasks in a future time unit from the data collected over several time units;
step one comprises the following steps:
step (1.1) collecting task running information on the big data platform;
in the cluster, with a fixed time interval as the time unit, collecting the start time and end time of each task and the number of the node on which it runs;
step (1.2) calculating the execution time of each task, and counting the total number of tasks of each node;
calculating the execution time of each task from its start time and end time, and counting the total number of tasks of each node;
step (1.3) identifying the number of slow tasks of each node according to a slow task judgment rule;
following the general rule in the slow-task literature, taking 1.5 times the median duration of all tasks as the slow task threshold, judging any task whose running time exceeds the threshold to be a slow task, and counting the number of slow tasks of each node accordingly;
step two comprises the following steps:
step (2.1) extracting slow task information;
the use of the cluster differs between time units, so the number of slow tasks changes greatly, and such strongly fluctuating values can hardly meet the basic requirements of the grey prediction algorithm; to remove the influence of differing usage and to meet the conditions of the grey prediction algorithm, the ratio of the number of slow tasks to the total number of tasks is used for modeling;
step (2.2) transformation of input data;
the ratio of each term of the sequence to the previous term is called the level ratio; grey prediction requires that the level ratio of the sequence not be too large, and if this condition cannot be met the original sequence must be transformed;
step (2.3) establishing a GM(1,1) prediction model;
GM(1,1) denotes a grey model given by a first-order differential equation in a single variable; the main feature of grey prediction is that the model uses a generated data sequence rather than the original data sequence; in the modeling process the original data are accumulated to obtain an approximately exponential sequence, which is then modeled.
Step (2.4) substituting the time point to be predicted into the model, and calculating the prediction result;
the modeling calculation finally yields an equation that fits the original sequence, and substituting the time point to be predicted gives a predicted value; this value is still in the accumulated, exponentially growing form, so it must be differenced term by term to obtain the actual prediction.
Further, step (1.3) is executed as follows:
substep (3-1): sorting the tasks by their calculated duration, counting the total number of tasks over all nodes, and taking the duration at the halfway position of the sorted list as the median task duration;
substep (3-2): multiplying the median task duration by 1.5 to obtain the threshold for judging slow tasks;
substep (3-3): going through the source data again item by item and, whenever a task's duration exceeds the threshold, adding 1 to the slow-task count of the corresponding node, thereby obtaining the number of slow tasks of each node.
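The three substeps can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `count_slow_tasks` and the `(node_id, start, end)` record layout are assumptions, and `statistics.median` averages the two middle values when the task count is even, whereas the text simply takes the value at the halfway position.

```python
from collections import Counter
from statistics import median


def count_slow_tasks(tasks, factor=1.5):
    """Count total and slow tasks per node for one time unit.

    tasks: iterable of (node_id, start, end) tuples (hypothetical layout).
    Returns (threshold, total_per_node, slow_per_node).
    """
    tasks = list(tasks)
    # Substep (3-1): durations of all tasks on all nodes, then their median.
    durations = [end - start for _, start, end in tasks]
    # Substep (3-2): threshold = 1.5 x median duration.
    threshold = factor * median(durations)
    total, slow = Counter(), Counter()
    # Substep (3-3): second pass over the data, incrementing per-node counts.
    for node, start, end in tasks:
        total[node] += 1
        if end - start > threshold:
            slow[node] += 1
    return threshold, total, slow
```

The second pass keeps the threshold global (computed over all nodes) while the counts stay per node, which is what the substeps describe.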
Further, regarding the extraction of slow task information in step (2.1):
predicting the number of slow tasks is the ultimate goal of the algorithm, but it is difficult to do directly, or gives poor results, because the use of the cluster differs widely between time periods, so the number of slow tasks varies widely and shows no apparent regularity.
To predict the density of slow tasks more accurately, an additional quantity must be introduced. The mean and variance of the task duration are good quantitative indicators, but they cannot be obtained at the start of a time unit, which greatly harms the timeliness of the prediction algorithm. The method therefore describes the density of slow tasks by the ratio of the number of slow tasks to the number of tasks. Specifically, the number of slow tasks and the number of tasks of each node are counted separately and their ratio is calculated for each node. This ratio is the input data of the prediction model.
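As a rough sketch of how the per-node input series might be assembled, the following assumes the per-unit counts are already available as dictionaries keyed by node. The function name and data layout are illustrative assumptions, and so is the choice to skip time units in which a node ran no tasks (the source does not say how such units are handled).

```python
def slow_task_ratios(total_per_unit, slow_per_unit):
    """Build, for each node, the series of slow-task ratios over time units.

    total_per_unit, slow_per_unit: lists with one {node: count} dict per
    time unit (hypothetical layout). Returns {node: [ratio, ...]}.
    """
    series = {}
    for total, slow in zip(total_per_unit, slow_per_unit):
        for node, n_tasks in total.items():
            if n_tasks > 0:  # assumption: units with no tasks are skipped
                ratio = slow.get(node, 0) / n_tasks
                series.setdefault(node, []).append(ratio)
    return series
```

Each resulting list plays the role of the original sequence fed to the grey model for that node.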
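Further, the conditions that the data must satisfy for grey prediction in step (2.2) are as follows:
let the reference data be $x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots,x^{(0)}(n))$ and calculate the level ratios of the sequence:
$$\lambda(k)=\frac{x^{(0)}(k-1)}{x^{(0)}(k)},\quad k=2,3,\dots,n.$$
If all level ratios $\lambda(k)$ fall within the acceptable coverage $\Theta=\left(e^{-\frac{2}{n+1}},\,e^{\frac{2}{n+1}}\right)$, the sequence $x^{(0)}$ can be used as data for grey prediction with the model GM(1,1). (The interval is symmetric under taking reciprocals, so the condition is the same whichever of the two adjacent terms is placed in the numerator.) Otherwise the sequence $x^{(0)}$ must be transformed so that its level ratios fall within the acceptable coverage, namely by choosing a suitable constant $c$ and applying a translation:
$$y^{(0)}(k)=x^{(0)}(k)+c,\quad k=1,2,\dots,n,$$
so that the level ratios of the sequence $y^{(0)}=(y^{(0)}(1),y^{(0)}(2),\dots,y^{(0)}(n))$ satisfy
$$\lambda_y(k)=\frac{y^{(0)}(k-1)}{y^{(0)}(k)}\in\Theta,\quad k=2,3,\dots,n.$$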
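The level-ratio test and the translation can be sketched as follows, assuming the usual GM(1,1) coverage interval $(e^{-2/(n+1)}, e^{2/(n+1)})$. The source does not say how the constant c is chosen; the grid search below (step size `step`, cap `max_steps`) is purely an illustrative assumption, as are the function names.

```python
import math


def level_ratios(x):
    """Level ratios x(k-1)/x(k) for k = 2..n (0-based: x[k-1]/x[k])."""
    return [x[k - 1] / x[k] for k in range(1, len(x))]


def admissible(x):
    """True if every term is positive and all level ratios lie in the coverage."""
    n = len(x)
    lo, hi = math.exp(-2 / (n + 1)), math.exp(2 / (n + 1))
    # The positivity check runs first, so level_ratios never divides by zero.
    return all(v > 0 for v in x) and all(lo < r < hi for r in level_ratios(x))


def translate(x, step=0.1, max_steps=10_000):
    """Find the smallest c on a grid such that y = x + c is admissible.

    Returns (c, y). The grid search is an assumption, not the patent's rule.
    """
    for i in range(max_steps + 1):
        c = i * step
        y = [v + c for v in x]
        if admissible(y):
            return c, y
    raise ValueError("no admissible shift found within max_steps")
```

A shift is also what makes a series containing zero ratios usable, since a zero term would otherwise make the level ratio undefined.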
further, in the step (2.3), the grey prediction model modeling calculation process is as follows:
(2.3.1) analyzing the level ratio of the input data, and if the level ratio meets the condition of a gray prediction algorithm, directly modeling; if the gray prediction algorithm condition is not met, translation transformation is carried out, and the level ratio is enabled to fall within the acceptable coverage;
(2.3.2) obtaining a number sequence of an approximate exponential law through accumulation operation;
(2.3.3) calculating a symbolic solution of a differential equation through the fitting parameters;
(2.3.4) solving a sequence of predicted values, wherein the sequence of predicted values comprises the predicted values of a known time period and a future time period, and the predicted values at the time are also a sequence of exponential law;
and (2.3.5) calculating a final result, namely a slow task prediction value in a future time period through differential operation.
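Preferably, the prediction method of the grey model GM(1,1) in step (2.3) is:
given the reference data sequence $x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots,x^{(0)}(n))$, accumulate it once to generate the sequence
$$x^{(1)}=(x^{(1)}(1),x^{(1)}(2),\dots,x^{(1)}(n))=(x^{(0)}(1),\;x^{(0)}(1)+x^{(0)}(2),\;\dots,\;x^{(0)}(1)+\dots+x^{(0)}(n)),$$
and form the mean-value sequence of $x^{(1)}$:
$$z^{(1)}=(z^{(1)}(2),z^{(1)}(3),\dots,z^{(1)}(n)),$$
where $z^{(1)}(k)=0.5\,x^{(1)}(k)+0.5\,x^{(1)}(k-1)$, $k=2,3,\dots,n$.
Establish the grey differential equation:
$$x^{(0)}(k)+a\,z^{(1)}(k)=b,\quad k=2,3,\dots,n,$$
the corresponding whitening differential equation is:
$$\frac{dx^{(1)}}{dt}+a\,x^{(1)}(t)=b.$$
For convenience, let $u=[a,b]^{T}$, $Y=[x^{(0)}(2),x^{(0)}(3),\dots,x^{(0)}(n)]^{T}$, and
$$B=\begin{bmatrix}-z^{(1)}(2)&1\\-z^{(1)}(3)&1\\\vdots&\vdots\\-z^{(1)}(n)&1\end{bmatrix}.$$
Then, by the least squares method, the estimate of $u$ that minimizes $J(u)=(Y-Bu)^{T}(Y-Bu)$ is:
$$\hat{u}=[\hat{a},\hat{b}]^{T}=(B^{T}B)^{-1}B^{T}Y.$$
Then, solving the whitening equation gives the predicted value:
$$\hat{x}^{(1)}(k+1)=\left(x^{(0)}(1)-\frac{\hat{b}}{\hat{a}}\right)e^{-\hat{a}k}+\frac{\hat{b}}{\hat{a}},\quad k=0,1,\dots,n-1,\dots$$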
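The GM(1,1) procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the names `gm11_fit` and `gm11_predict` are hypothetical, the 2x2 normal equations are solved in closed form instead of with a matrix library, and the input is assumed to already satisfy the level-ratio condition. If the series was shifted by a constant c beforehand, c has to be subtracted from the outputs. A constant input series gives a = 0 and is not handled.

```python
import math


def gm11_fit(x0):
    """Least-squares estimate of (a, b) in x0(k) + a*z1(k) = b, k = 2..n."""
    # Accumulated sequence x1 and its mean-value sequence z1.
    x1, running = [], 0.0
    for v in x0:
        running += v
        x1.append(running)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m = len(z)
    # Closed-form solution of (B^T B) u = B^T Y with rows of B = [-z_k, 1].
    szz = sum(v * v for v in z)
    sz = sum(z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    sy = sum(y)
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def gm11_predict(x0, horizon=1):
    """Predict the next `horizon` values of x0 by differencing the fitted x1."""
    a, b = gm11_fit(x0)
    n = len(x0)

    def x1_hat(k):  # k = 0 corresponds to the first observation
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Term-by-term differencing restores predictions in the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + horizon)]
```

For example, `gm11_predict(ratios, horizon=1)` would return a one-element list holding the predicted ratio for the next time unit of a node whose history is `ratios`.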
Further, the method for collecting task information on the big data platform in step (1.1) is:
task information comes mainly from log analysis; the log files generated while the big data platform runs, such as those of the Hadoop and Spark platforms, record the execution of every task on every node. Parsing the log files yields the correspondence among nodes, tasks and task durations.
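A log parser for this step might look like the sketch below. The record format is entirely hypothetical: it assumes the platform logs have already been reduced to CSV lines with the columns `task_id,node_id,start_ms,end_ms`, which is not the native Hadoop or Spark log format, and the function name is likewise an assumption.

```python
import csv
import io


def parse_task_records(text):
    """Parse hypothetical CSV task records into (node_id, start_ms, end_ms).

    Expected header: task_id,node_id,start_ms,end_ms (an assumed format).
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append((row["node_id"], int(row["start_ms"]), int(row["end_ms"])))
    return records
```

The resulting tuples carry exactly the three pieces of information the text asks for: the node, and the start and end times from which the task duration follows.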
Further, regarding the role of the ratio of the number of slow tasks to the number of tasks in step (2.1):
to improve accuracy and practicality, the ratio of the number of slow tasks to the number of tasks is introduced as the quantity used in the model calculation. Although this quantity does not predict the number of slow tasks directly, it reflects the density of slow tasks well. A larger predicted value indicates poorer performance of the corresponding node; the user should then assign fewer program processes to that node and more processes to nodes with lower predicted values, which ultimately improves cluster performance.
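The allocation advice amounts to ordering nodes by their predicted ratio. A minimal sketch, assuming the predictions are held in a `{node: ratio}` dictionary (the function name and the tie-breaking by node name are illustrative assumptions):

```python
def rank_nodes(predicted_ratio):
    """Return node ids ordered from lowest to highest predicted slow-task ratio.

    Ties are broken by node id so the ordering is deterministic.
    """
    return sorted(predicted_ratio, key=lambda node: (predicted_ratio[node], node))
```

Nodes at the front of the returned list are the preferred targets for new processes; how many processes each node receives is left open by the source.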
Advantageous effects:
on a big data platform, clusters often have to handle many kinds of processes and node performance varies; in heterogeneous clusters the differences between nodes are even more pronounced. Under the influence of many factors slow tasks keep arising, but the running state of a given node at different times is relatively stable. Based on this regularity, the invention uses a grey prediction algorithm to build a differential equation model of slow-task generation for each node, analyses the historical data, and predicts the slow-task condition of the node over a future period. The prediction result provides guidance for optimizing the cluster: more processes are allocated to nodes predicted to have sparse slow tasks and fewer to nodes predicted to have dense slow tasks, which effectively improves the working performance of the cluster.
Drawings
FIG. 1 is a block diagram of a system for implementing a slow task prediction method based on a gray prediction algorithm in accordance with the present invention;
FIG. 2 is a flow chart of a slow task prediction method based on a gray prediction algorithm according to the present invention;
FIG. 3 is a flow chart of the gray prediction model modeling calculation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The basic idea of the invention is to analyze the slow task condition by analyzing the log of the big data platform, extract the data characteristics according to the long-time slow task occurrence condition, and inject the data into the grey prediction model to predict the slow task intensity in a period of time in the future, thereby providing the process distribution suggestion for the cluster manager and finally achieving the goal of improving the cluster working performance.
FIG. 1 is a schematic diagram of a system architecture for implementing the slow task prediction method based on the grey prediction algorithm of the invention. A big data platform such as Spark or Hadoop consists of a large number of nodes and generates logs during operation that record the execution of programs. The invention obtains the log information from the big data platform for analysis. The main function of the log analyzer is to collect the start time and end time of each task on each node and pass this task information to the slow task analyzer. The slow task analyzer calculates the slow task threshold, counts the number of tasks and the number of slow tasks of each node, and calculates the ratio of the two. The grey prediction model calculation module first processes the slow-task information so that the data meet the requirements of the grey prediction model, in particular by reducing the level ratio of the data sequence. It then performs the modeling calculation, substitutes the ID of the time period to be predicted, and calculates the prediction result.
FIG. 2 is a flowchart of the slow task prediction method based on the grey prediction algorithm of the invention; the detailed flow includes the following steps:
step one: on a big data cluster, counting the total number of tasks run and the number of slow tasks generated by each node per unit time;
step two: extracting the slow-task information of each node, establishing a grey prediction model, and predicting the density of slow tasks in a future time unit from the data collected over several time units;
step one comprises the following steps:
step (1.1) collecting task running information on the big data platform;
in the cluster, with a fixed time interval as the time unit, collecting the start time and end time of each task and the number of the node on which it runs;
step (1.2) calculating the execution time of each task, and counting the total number of tasks of each node;
calculating the execution time of each task from its start time and end time, and counting the total number of tasks of each node;
step (1.3) identifying the number of slow tasks of each node according to a slow task judgment rule;
following the general rule in the slow-task literature, taking 1.5 times the median duration of all tasks as the slow task threshold, judging any task whose running time exceeds the threshold to be a slow task, and counting the number of slow tasks of each node accordingly. The specific steps are:
substep (3-1): sorting the tasks by their calculated duration, counting the total number of tasks over all nodes, and taking the duration at the halfway position of the sorted list as the median task duration.
Substep (3-2): multiplying the median task duration by 1.5 to obtain the threshold for judging slow tasks.
Substep (3-3): going through the source data again item by item and, whenever a task's duration exceeds the threshold, adding 1 to the slow-task count of the corresponding node, thereby obtaining the number of slow tasks of each node.
Step two comprises the following steps:
step (2.1) extracting slow task information;
the use of the cluster differs between time units, so the number of slow tasks changes greatly, and such strongly fluctuating values can hardly meet the basic requirements of the grey prediction algorithm. To remove the influence of differing usage and to meet the conditions of the grey prediction algorithm, the ratio of the number of slow tasks to the total number of tasks is used for modeling.
Predicting the number of slow tasks is the ultimate goal of the algorithm, but it is difficult to do directly, or gives poor results, because the use of the cluster differs widely between time periods, so the number of slow tasks varies widely and shows no apparent regularity.
To predict the density of slow tasks more accurately, an additional quantity must be introduced. The mean and variance of the task duration are good quantitative indicators, but they cannot be obtained at the start of a time unit, which greatly harms the timeliness of the prediction algorithm. The method therefore describes the density of slow tasks by the ratio of the number of slow tasks to the number of tasks. Specifically, the number of slow tasks and the number of tasks of each node are counted separately and their ratio is calculated for each node. This ratio is the input data of the prediction model.
Step (2.2) transformation of input data;
the ratio of each term of the sequence to the previous term is called the level ratio; grey prediction requires that the level ratio of the sequence not be too large, and if this condition cannot be met the original sequence must be transformed. The conditions that the data must satisfy for grey prediction are:
let the reference data be $x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots,x^{(0)}(n))$ and calculate the level ratios of the sequence:
$$\lambda(k)=\frac{x^{(0)}(k-1)}{x^{(0)}(k)},\quad k=2,3,\dots,n.$$
If all level ratios $\lambda(k)$ fall within the acceptable coverage $\Theta=\left(e^{-\frac{2}{n+1}},\,e^{\frac{2}{n+1}}\right)$, the sequence $x^{(0)}$ can be used as data for grey prediction with the model GM(1,1). (The interval is symmetric under taking reciprocals, so the condition is the same whichever of the two adjacent terms is placed in the numerator.) Otherwise the sequence $x^{(0)}$ must be transformed so that its level ratios fall within the acceptable coverage, namely by choosing a suitable constant $c$ and applying a translation:
$$y^{(0)}(k)=x^{(0)}(k)+c,\quad k=1,2,\dots,n,$$
so that the level ratios of the sequence $y^{(0)}=(y^{(0)}(1),y^{(0)}(2),\dots,y^{(0)}(n))$ satisfy
$$\lambda_y(k)=\frac{y^{(0)}(k-1)}{y^{(0)}(k)}\in\Theta,\quad k=2,3,\dots,n.$$
Step (2.3) establishing a GM(1,1) prediction model;
GM(1,1) denotes a grey model given by a first-order differential equation in a single variable; the main feature of grey prediction is that the model uses a generated data sequence rather than the original data sequence. In the modeling process the original data are accumulated to obtain an approximately exponential sequence, which is then modeled. The prediction method of the grey model GM(1,1) is as follows:
given the reference data sequence $x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots,x^{(0)}(n))$, accumulate it once to generate the sequence
$$x^{(1)}=(x^{(1)}(1),x^{(1)}(2),\dots,x^{(1)}(n))=(x^{(0)}(1),\;x^{(0)}(1)+x^{(0)}(2),\;\dots,\;x^{(0)}(1)+\dots+x^{(0)}(n)),$$
and form the mean-value sequence of $x^{(1)}$:
$$z^{(1)}=(z^{(1)}(2),z^{(1)}(3),\dots,z^{(1)}(n)),$$
where $z^{(1)}(k)=0.5\,x^{(1)}(k)+0.5\,x^{(1)}(k-1)$, $k=2,3,\dots,n$.
Establish the grey differential equation:
$$x^{(0)}(k)+a\,z^{(1)}(k)=b,\quad k=2,3,\dots,n,$$
the corresponding whitening differential equation is:
$$\frac{dx^{(1)}}{dt}+a\,x^{(1)}(t)=b.$$
For convenience, let $u=[a,b]^{T}$, $Y=[x^{(0)}(2),x^{(0)}(3),\dots,x^{(0)}(n)]^{T}$, and
$$B=\begin{bmatrix}-z^{(1)}(2)&1\\-z^{(1)}(3)&1\\\vdots&\vdots\\-z^{(1)}(n)&1\end{bmatrix}.$$
Then, by the least squares method, the estimate of $u$ that minimizes $J(u)=(Y-Bu)^{T}(Y-Bu)$ is:
$$\hat{u}=[\hat{a},\hat{b}]^{T}=(B^{T}B)^{-1}B^{T}Y.$$
Then, solving the whitening equation gives the predicted value:
$$\hat{x}^{(1)}(k+1)=\left(x^{(0)}(1)-\frac{\hat{b}}{\hat{a}}\right)e^{-\hat{a}k}+\frac{\hat{b}}{\hat{a}},\quad k=0,1,\dots,n-1,\dots$$
Step (2.4) substituting the time point to be predicted into the model, and calculating the prediction result;
the modeling calculation finally yields an equation that fits the original sequence, and substituting the time point to be predicted gives a predicted value. This value is still in the accumulated, exponentially growing form, so it must be differenced term by term to obtain the actual prediction.
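The term-by-term differencing can be written out explicitly. The short derivation below assumes the usual time-response form of the GM(1,1) solution, $\hat{x}^{(1)}(k+1)=\left(x^{(0)}(1)-b/a\right)e^{-ak}+b/a$, with $a$ and $b$ the fitted parameters. The closing remark about the constant $c$ is an inference from $y^{(0)}(k)=x^{(0)}(k)+c$ and is not stated in the source.

$$\hat{x}^{(0)}(k+1)=\hat{x}^{(1)}(k+1)-\hat{x}^{(1)}(k)=\left(x^{(0)}(1)-\frac{b}{a}\right)\left(e^{-ak}-e^{-a(k-1)}\right)=\left(1-e^{a}\right)\left(x^{(0)}(1)-\frac{b}{a}\right)e^{-ak},\quad k\ge 1.$$

The constant term $b/a$ cancels in the difference, so the restored prediction is itself a pure exponential in $k$. If the model was fitted to a translated sequence $y^{(0)}$, the same formula yields $\hat{y}^{(0)}(k+1)$, and the prediction in the original units would be $\hat{y}^{(0)}(k+1)-c$.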
FIG. 3 is a flow chart of the modeling calculation of the grey prediction model of the invention; the detailed flow includes the following steps:
(1) Analysing the level ratio of the input data: if it meets the condition of the grey prediction algorithm, modeling directly; otherwise applying a translation transformation so that the level ratio falls within the acceptable coverage.
(2) Accumulation, to obtain a sequence that approximately follows an exponential law.
(3) Fitting the parameters and calculating the symbolic solution of the differential equation.
(4) Solving for the sequence of predicted values, which covers both the known time periods and the future time period; at this point the predicted values still form an exponential-law sequence.
(5) Differencing, to calculate the final result, namely the predicted slow-task value for the future time period.
Matters not described in detail in the invention belong to techniques well known to those skilled in the art.
The above description is only a part of the embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.
Claims (8)
1. A slow task prediction method based on a gray prediction algorithm is characterized by comprising the following steps:
step one: on a big data cluster, counting the total number of tasks operated and the number of slow tasks generated by each node in unit time;
step two: extracting information of node slow tasks, establishing a grey prediction model, and predicting the density of the slow tasks in a future time unit according to collected data of a plurality of time units;
the first step comprises the following steps:
step (1.1) counting task running information on a big data platform;
in the cluster, collecting the starting time and the ending time of each task and the node number where the task runs according to a preset time interval as a time unit;
step (1.2) calculating the execution time of each task, and counting the total number of tasks of each node;
calculating the execution time of each task according to the starting time and the ending time, and counting the total task quantity of each node;
step (1.3) identifying the slow task number of the corresponding node according to a slow task judgment rule;
taking 1.5 times of the median of the time length of all the tasks as a slow task judgment threshold, judging the task with the running time length exceeding the threshold as a slow task, and counting the number of the slow tasks of each node according to the slow task judgment threshold;
step two comprises the following steps:
step (2.1): extracting slow task information;
because cluster usage differs between time units, the ratio of the number of slow tasks to the total number of tasks is used as the modeling input;
step (2.2): transforming the input data;
the ratio between adjacent terms of the sequence is called the level ratio; grey prediction requires that the level ratios of the sequence fall within the acceptable coverage, an interval determined by n, the total number of data points; if this condition cannot be met, the original sequence must first undergo the necessary transformation;
step (2.3): establishing a grey prediction model;
GM(1,1) denotes a grey model that is a first-order differential equation containing only one variable; grey prediction models a generated data sequence rather than the original data sequence: during modeling, the original data are accumulated to generate a sequence that follows an approximately exponential law, and the model is then built on that sequence;
step (2.4): substituting the time points to be predicted into the model and calculating the prediction result;
finally, the modeling calculation yields an equation that fits the original sequence; the time points to be predicted are substituted into it to calculate predicted values, which are still in exponential-growth form, and the actual prediction result is obtained by differencing them term by term.
2. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that:
step (1.3) is executed through the following specific substeps:
substep (3-1): sorting the tasks by the calculated task duration and counting the total number of tasks across all nodes; the duration at the midpoint of the sorted list, that is, at half the total number of tasks, is the median task duration;
substep (3-2): multiplying the median task duration by 1.5 to obtain the threshold for judging slow tasks;
substep (3-3): examining the collected data item by item again; if a task's duration exceeds the threshold, the slow task count of the corresponding node is incremented by 1, and so on, yielding the number of slow tasks on each node.
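The substeps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout `(node_id, start_time, end_time)`, the function name `count_slow_tasks`, and the sample data are hypothetical, and `statistics.median` averages the two middle values when the number of tasks is even, which may differ slightly from taking the element at half the total.

```python
from collections import Counter
from statistics import median


def count_slow_tasks(tasks, factor=1.5):
    """Count total and slow tasks per node.

    tasks: iterable of (node_id, start_time, end_time), times in seconds.
    Returns (threshold, total_per_node, slow_per_node).
    """
    durations = [(node, end - start) for node, start, end in tasks]
    # Threshold = 1.5 x the median duration over all tasks on all nodes.
    threshold = factor * median(d for _, d in durations)
    total = Counter(node for node, _ in durations)
    # A task is slow when its duration exceeds the threshold.
    slow = Counter(node for node, d in durations if d > threshold)
    return threshold, total, slow


# Hypothetical task records for two nodes within one time unit.
tasks = [
    ("node1", 0, 10), ("node1", 0, 12), ("node1", 5, 16),
    ("node2", 0, 9), ("node2", 3, 33), ("node2", 10, 35),
]
threshold, total, slow = count_slow_tasks(tasks)
```

Using a `Counter` means a node with no slow tasks simply reports a count of 0 instead of raising an error on lookup.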
3. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that:
extracting slow task information in step (2.1) specifically comprises:
counting the number of slow tasks and the total number of tasks on each node, calculating the ratio of the two for each node, and using that ratio as the input data of the prediction model.
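As a small illustration, the ratio series for one node could be built as below. The function name and the sample counts are hypothetical, and treating a time unit with no tasks as a ratio of 0.0 is an assumption of this sketch, not something the claim specifies.

```python
def slow_task_ratios(total_counts, slow_counts):
    """Ratio of slow tasks to all tasks for one node, one value per time unit."""
    return [
        slow / total if total else 0.0  # assumption: an empty time unit counts as 0.0
        for slow, total in zip(slow_counts, total_counts)
    ]


# Hypothetical per-time-unit counts for a single node.
series = slow_task_ratios(total_counts=[10, 20, 0], slow_counts=[1, 5, 0])
```

Zero values in the series would make the level ratios undefined, so a series like this would need the translation transformation before modeling.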
4. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that:
the conditions that must be satisfied to perform grey prediction in step (2.2) are as follows:
let the reference data be $x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots,x^{(0)}(n))$, and calculate the level ratios $\lambda(k)$, $k=2,3,\dots,n$, of the sequence,
where $x^{(0)}(k)$ denotes the k-th data point and n denotes the total number of data points;
if all the level ratios $\lambda(k)$ fall within the acceptable coverage, the sequence $x^{(0)}$ can be used as data for the model GM(1,1) to perform grey prediction; otherwise, the sequence $x^{(0)}$ must be transformed so that its level ratios fall within the acceptable coverage, namely by taking a suitable constant c and applying the translation transformation:
$$y^{(0)}(k)=x^{(0)}(k)+c,\quad k=1,2,\dots,n,$$
so that the level ratios of the sequence $y^{(0)}=(y^{(0)}(1),y^{(0)}(2),\dots,y^{(0)}(n))$ fall within the acceptable coverage.
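The check and the translation transformation can be sketched as follows. The sketch rests on two assumptions that are not stated in the text above: the level ratio is taken as $\lambda(k)=x^{(0)}(k-1)/x^{(0)}(k)$, and the acceptable coverage is taken as the commonly used interval $(e^{-2/(n+1)}, e^{2/(n+1)})$; some references use a slightly different upper bound. The function names, the step-wise search for c, and the sample data are hypothetical.

```python
import math


def level_ratios(x):
    """lambda(k) = x(k-1) / x(k) for k = 2..n (assumed convention)."""
    return [x[k - 1] / x[k] for k in range(1, len(x))]


def admissible(x):
    """True if every level ratio lies in (e^(-2/(n+1)), e^(2/(n+1))) (assumed interval)."""
    if any(v <= 0 for v in x):
        return False  # level ratios are undefined or meaningless for non-positive terms
    n = len(x)
    low, high = math.exp(-2 / (n + 1)), math.exp(2 / (n + 1))
    return all(low < r < high for r in level_ratios(x))


def translate_until_admissible(x, step=0.1, max_steps=10_000):
    """Try c = 0, step, 2*step, ... and return (c, y) with y(k) = x(k) + c."""
    for i in range(max_steps + 1):
        c = i * step
        y = [v + c for v in x]
        if admissible(y):
            return c, y
    raise ValueError("no admissible shift found")


# Hypothetical slow task ratios whose level ratios vary too much.
x0 = [0.05, 0.20, 0.10, 0.15, 0.12]
c, y0 = translate_until_admissible(x0)
```

If the model is then fitted on the shifted sequence, the same constant c has to be subtracted from the final predicted values to return to the original scale.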
5. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that:
the prediction method of the grey model GM(1,1) in step (2.3) is as follows:
given the reference data sequence $x^{(0)}=(x^{(0)}(1),x^{(0)}(2),\dots,x^{(0)}(n))$, accumulating once generates the sequence:
$$x^{(1)}=(x^{(1)}(1),x^{(1)}(2),\dots,x^{(1)}(n))=(x^{(0)}(1),\;x^{(0)}(1)+x^{(0)}(2),\;\dots,\;x^{(0)}(1)+\dots+x^{(0)}(n)),$$
$$z^{(1)}=(z^{(1)}(2),z^{(1)}(3),\dots,z^{(1)}(n)),$$
where $z^{(1)}(k)=0.5x^{(1)}(k)+0.5x^{(1)}(k-1)$, $k=2,3,\dots,n$;
establishing the grey differential equation below, where a and b are undetermined coefficients calculated by substituting the data:
$$x^{(0)}(k)+az^{(1)}(k)=b,\quad k=2,3,\dots,n,$$
the corresponding whitening differential equation is:
$$\frac{dx^{(1)}}{dt}+ax^{(1)}(t)=b;$$
for convenience, let $u=[a,b]^T$, $Y=[x^{(0)}(2),x^{(0)}(3),\dots,x^{(0)}(n)]^T$, and
$$B=\begin{bmatrix}-z^{(1)}(2)&1\\-z^{(1)}(3)&1\\\vdots&\vdots\\-z^{(1)}(n)&1\end{bmatrix};$$
then, by the least squares method, the estimate of u that minimizes $J(u)=(Y-Bu)^T(Y-Bu)$ is:
$$\hat u=[\hat a,\hat b]^T=(B^TB)^{-1}B^TY;$$
then, solving the whitening equation gives the predicted value:
$$\hat x^{(1)}(k+1)=\left(x^{(0)}(1)-\frac{\hat b}{\hat a}\right)e^{-\hat a k}+\frac{\hat b}{\hat a},\quad k=0,1,\dots,n-1,\dots$$
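The fitting and prediction described in this claim can be sketched in plain Python as below. This is a minimal illustration rather than the patented implementation: the function names `gm11_fit` and `gm11_predict` and the sample data are hypothetical, the 2x2 normal equations are solved in closed form instead of with a matrix library, and the initial condition $\hat x^{(1)}(1)=x^{(0)}(1)$ is the usual GM(1,1) assumption.

```python
import math
from itertools import accumulate


def gm11_fit(x0):
    """Least squares estimate of (a, b) in x0(k) + a*z1(k) = b, k = 2..n."""
    x1 = list(accumulate(x0))                                    # accumulated sequence
    z1 = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # background values
    y = x0[1:]
    m = len(z1)
    # Closed-form solution of (B^T B) u = B^T Y with rows of B equal to [-z1(k), 1].
    szz = sum(z * z for z in z1)
    sz = sum(z1)
    szy = sum(z * v for z, v in zip(z1, y))
    sy = sum(y)
    det = m * szz - sz * sz
    a = (sz * sy - m * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b


def gm11_predict(x0, horizon):
    """Predict the next `horizon` values of x0 (assumes the fitted a is non-zero)."""
    a, b = gm11_fit(x0)
    n = len(x0)
    # Solution of the whitening equation with x1_hat(1) = x0(1).
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n + horizon)]
    # Term-by-term differences turn accumulated predictions into per-unit values.
    x0_hat = [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + horizon)]
    return x0_hat[n:]


# Hypothetical slow task ratios growing by 10% per time unit.
history = [0.10, 0.11, 0.121, 0.1331, 0.14641]
a_hat, b_hat = gm11_fit(history)
forecast = gm11_predict(history, horizon=1)
```

A negative fitted a corresponds to a growing sequence and a positive a to a decaying one; the sketch does not guard against a being exactly zero, where the closed-form solution is undefined.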
6. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that: collecting the task running information on the big data platform in step (1.1) specifically comprises:
the task information is obtained by log analysis; the log files generated by the big data platform during operation, including the log files of the Hadoop and Spark platforms, record the execution of every task on every node; the correspondence among nodes, tasks and task durations can be obtained by analyzing the log files.
7. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that:
the ratio of the number of slow tasks to the total number of tasks in step (2.1) reflects the slow task density; the larger the predicted value, the worse the performance of the corresponding node, so the user can allocate fewer program processes to that node and more processes to nodes with lower predicted values, thereby improving cluster performance.
8. The slow task prediction method based on a grey prediction algorithm according to claim 1, characterized in that: in step (2.3), the modeling calculation process of the grey prediction model is as follows:
(2.3.1) analyzing the level ratios of the input data; if the level ratios satisfy the condition of the grey prediction algorithm, modeling directly; if the condition is not satisfied, applying a translation transformation so that the level ratios fall within the acceptable coverage;
(2.3.2) obtaining a sequence that follows an approximately exponential law through the accumulation operation;
(2.3.3) fitting the parameters and calculating the symbolic solution of the differential equation;
(2.3.4) solving for the sequence of predicted values, which covers both the known time periods and the future time period; at this stage the predicted values still form a sequence that follows an exponential law;
(2.3.5) calculating the final result, namely the predicted slow task value for the future time period, through the difference operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010689837.7A CN111835854B (en) | 2020-07-17 | 2020-07-17 | Slow task prediction method based on grey prediction algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111835854A (en) | 2020-10-27 |
CN111835854B (en) | 2021-08-10 |
Family
ID=72924268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010689837.7A Expired - Fee Related CN111835854B (en) | 2020-07-17 | 2020-07-17 | Slow task prediction method based on grey prediction algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111835854B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101576443A (en) * | 2009-06-16 | 2009-11-11 | 北京航空航天大学 | Life prediction method of accelerated life test based on grey RBF neural network |
CN102609303A (en) * | 2012-01-18 | 2012-07-25 | 华为技术有限公司 | Slow-task dispatching method and slow-task dispatching device of Map Reduce system |
CN107194826A (en) * | 2017-06-16 | 2017-09-22 | 北京航空航天大学 | A kind of manufacture system Gernral Check-up and Forecasting Methodology based on quality state Task Network |
CN108153587A (en) * | 2017-12-26 | 2018-06-12 | 北京航空航天大学 | A kind of slow task reason detection method for big data platform |
CN108508863A (en) * | 2017-10-12 | 2018-09-07 | 上海智容睿盛智能科技有限公司 | A kind of electromechanical equipment method for diagnosing faults based on gray model |
CN109358306A (en) * | 2018-10-18 | 2019-02-19 | 国网天津市电力公司电力科学研究院 | One kind being based on the intelligent electric energy meter health degree trend forecasting method of GM (1,1) |
US10607155B2 (en) * | 2017-03-30 | 2020-03-31 | Intel Corporation | Diagnosing slow tasks in distributed computing |
Non-Patent Citations (3)
Title |
---|
NJ YADWADKAR et al.: "Multi-task learning for straggler avoiding predictive job scheduling", 《JOURNAL OF MACHINE LEARNING RESEARCH》 * |
崔云飞 et al.: "Slow task scheduling algorithm based on node identification", 《通信学报》 * |
张克毅: "Research on recommendation algorithms based on grey *** theory", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN111835854B (en) | 2021-08-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210810 |