CN104765648B - Problem node detection method and device based on a real-time computing system - Google Patents

Problem node detection method and device based on a real-time computing system

Info

Publication number
CN104765648B
Authority
CN
China
Prior art keywords
node
task
computing task
problem
second node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510218215.5A
Other languages
Chinese (zh)
Other versions
CN104765648A (en)
Inventor
叶炜晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201510218215.5A
Publication of CN104765648A
Application granted
Publication of CN104765648B
Legal status: Active
Anticipated expiration


Abstract

The embodiments of the invention disclose a problem node detection method and device based on a real-time computing system. The method comprises the following steps: when a preset node detection condition is met, each node in the working state is detected separately. For the first node currently to be detected, a target computing task for node detection is determined among the computing tasks the first node is currently executing, and a copy task of the target computing task is generated; a second node that currently has idle resources is found, and the copy task is sent to the second node; whether the first node is a problem node is then determined from the duration T1 the first node takes to complete the target computing task and the duration T2 the second node takes to complete the copy task. The technical solution provided by the embodiments of the invention improves the computing rate of the whole system and, correspondingly, its computing efficiency.

Description

Problem node detection method and device based on a real-time computing system
Technical field
The present invention relates to the field of computer technology, and in particular to a problem node detection method and device based on a real-time computing system.
Background art
Real-time computing, also called stream computing, processes data streams quickly and in real time. A real-time computing system is a kind of distributed computing system and is widely used in fields such as data mining and data analysis at Internet companies. The current mainstream open-source real-time computing systems, including Storm and Spark Streaming, are all master-slave computing frameworks. As shown in Fig. 1, such a real-time computing system is a cluster made up of multiple machine nodes, comprising a master node and slave nodes. Each slave node can provide one or more computing resources, and each computing resource can process one computing task. In practice, some slave nodes have all of their computing resources in the working state, some have part of their computing resources working and the rest idle, and some have all of their computing resources idle.
In the prior art, problem node detection only targets nodes that cannot execute computing tasks at all. A node that cannot execute computing tasks at all is one that, because of a serious fault such as going offline or its computing process closing unexpectedly, can no longer continue executing computing tasks. In this case, the whole real-time computing system cannot work normally.
Detecting such problem nodes is important in a real-time computing system. In practice, however, the hardware performance and load of the nodes in a real-time computing system may differ, and a node with poorer performance or heavier load computes more slowly. Such a problem node can be called a slow node. If the system contains such nodes, the "shortest stave" (cannikin) effect is triggered and they drag down the computing rate of the entire cluster.
For example, suppose a real-time computing system is a cluster built two years ago whose nodes are of model A, and the cluster is now being expanded with newly added nodes of model B, whose performance parameters are all better than those of the model-A nodes. If model-A nodes are used to execute computing tasks, the shortest-stave effect means that the computing rate of this mixed cluster is only comparable to that of the model-A nodes.
As another example, a working node in a real-time computing system may become unstable, for instance through CPU overload, a network interface card problem, or a disk I/O failure. The computing rate of that node drops, and the computing rate of the whole system drops along with it because of this single machine.
The existing detection method only detects nodes that cannot execute computing tasks at all and cannot identify nodes with a relatively low computing rate, i.e. slow nodes. As a result, when slow nodes appear in the cluster of a real-time computing system, the computing rate of the whole system drops and computing efficiency suffers.
Summary of the invention
To solve the above problems, the embodiments of the invention disclose a problem node detection method and device based on a real-time computing system. The technical solution is as follows:
A problem node detection method based on a real-time computing system, comprising:
when a preset node detection condition is met, detecting each node in the working state separately:
for the first node currently to be detected, determining, among the computing tasks the first node is currently executing, a target computing task for node detection, and generating a copy task of the target computing task;
finding a second node that currently has idle resources, and sending the copy task to the second node;
determining, according to the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task, whether the first node is a problem node.
In one embodiment of the invention, the method further comprises:
when the first node is determined to be a problem node, adding identification information of the first node to a problem node list.
In one embodiment of the invention, finding the second node that currently has idle resources comprises:
according to the problem node list, finding nodes that are not recorded in the problem node list;
selecting, among the nodes found, a node with idle resources as the second node.
In one embodiment of the invention, the method further comprises:
according to the problem node list, finding nodes that are not recorded in the problem node list;
selecting, among the nodes found, a node with idle resources as a third node;
scheduling the target computing task to the third node.
In one embodiment of the invention, determining, according to the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task, whether the first node is a problem node comprises:
calculating the absolute value of the difference between the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task;
if the absolute value is greater than a preset threshold and T1 > T2, determining that the first node is a problem node.
In one embodiment of the invention, the method further comprises:
if the absolute value is greater than the preset threshold and T1 < T2, determining that the second node is a problem node.
In one embodiment of the invention, determining, among the computing tasks the first node is currently executing, the target computing task for node detection comprises:
randomly determining the target computing task for node detection among the computing tasks the first node is currently executing;
or
according to computing task priority information obtained in advance, determining the highest-priority computing task among those the first node is currently executing as the target computing task for node detection.
In one embodiment of the invention, the method further comprises:
when the first node is determined to be a problem node, outputting problem node alarm information, the alarm information carrying the identification information of the first node.
A problem node detection device based on a real-time computing system, comprising:
a detection condition judging module, configured to judge whether the preset node detection condition is currently met and, if so, to trigger the target task determining module;
the target task determining module, configured to determine, for the first node currently to be detected, a target computing task for node detection among the computing tasks the first node is currently executing;
a copy task generating module, configured to generate a copy task of the target computing task;
a second node searching module, configured to find a second node that currently has idle resources;
a copy task sending module, configured to send the copy task to the second node;
a problem node determining module, configured to determine, according to the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task, whether the first node is a problem node.
In one embodiment of the invention, the device further comprises:
a problem node information adding module, configured to add the identification information of the first node to the problem node list when the first node is determined to be a problem node.
In one embodiment of the invention, the second node searching module is specifically configured to:
find, according to the problem node list, nodes that are not recorded in the problem node list;
select, among the nodes found, a node with idle resources as the second node.
In one embodiment of the invention, the device further comprises:
a third node selecting module, configured to find, according to the problem node list, nodes that are not recorded in the problem node list, and to select among the nodes found a node with idle resources as the third node;
a computing task scheduling module, configured to schedule the target computing task to the third node.
In one embodiment of the invention, the problem node determining module comprises:
an absolute value calculating submodule, configured to calculate the absolute value of the difference between the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task;
a problem node determining submodule, configured to determine that the first node is a problem node when the absolute value is greater than the preset threshold and T1 > T2.
In one embodiment of the invention,
the problem node determining submodule is further configured to determine that the second node is a problem node when the absolute value is greater than the preset threshold and T1 < T2.
In one embodiment of the invention, the target task determining module is specifically configured to:
randomly determine the target computing task for node detection among the computing tasks the first node is currently executing;
or
according to computing task priority information obtained in advance, determine the highest-priority computing task among those the first node is currently executing as the target computing task for node detection.
In one embodiment of the invention, the device further comprises:
an alarm information output module, configured to output problem node alarm information when the first node is determined to be a problem node, the alarm information carrying the identification information of the first node.
With the technical solution provided by the embodiments of the invention, a copy task of the target computing task of the first node is generated, and whether the first node is a problem node is determined from the duration for the first node to complete the target computing task and the duration for the second node to complete the copy task. The target computing task and its copy task are equivalent to the same computing task, and under normal circumstances the durations for different nodes to complete the same computing task differ little. A large difference suggests that one of the nodes has a problem, which allows the problem node to be identified. The real-time computing system can then handle the problem node in time, improving the computing rate of the whole system and, correspondingly, its computing efficiency.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a real-time computing system in an embodiment of the invention;
Fig. 2 is an implementation flowchart of a problem node detection method based on a real-time computing system in an embodiment of the invention;
Fig. 3 is a schematic diagram of data stream processing in an embodiment of the invention;
Fig. 4 is a schematic diagram of the node detection process in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of a problem node detection device based on a real-time computing system in an embodiment of the invention.
Detailed description of the embodiments
First, the problem node detection method based on a real-time computing system provided by the embodiments of the invention is described. The method may comprise the following steps:
when a preset node detection condition is met, detecting each node in the working state separately:
for the first node currently to be detected, determining, among the computing tasks the first node is currently executing, a target computing task for node detection, and generating a copy task of the target computing task;
finding a second node that currently has idle resources, and sending the copy task to the second node;
determining, according to the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task, whether the first node is a problem node.
These steps are executed by the control node of the real-time computing system. In a master-slave real-time computing system, the control node is the master node, and the other machine nodes in the system are slave nodes. The master node can distribute computing tasks to the slave nodes, monitor their running state, and reschedule the computing tasks on slave nodes that have problems.
When the preset node detection condition is met, the control node detects each node in the working state separately; each such node is a node to be detected. For the first node currently to be detected, a target computing task for node detection is determined among the computing tasks the first node is currently executing, and a copy task of the target computing task is generated; the target computing task and the copy task are equivalent to the same computing task. A second node that currently has idle resources is found, and the copy task is sent to it. The first node and the second node thus process the same computing task, so the duration for the first node to complete the target computing task and the duration for the second node to complete the copy task can both be obtained, and whether the first node is a problem node is determined from how these durations compare. After the control node has performed problem node detection on every working node, it knows which problem nodes exist in the system and can handle them promptly.
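The per-node procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Node` class, the `run_and_time` callable (standing in for however the control node obtains T1 and T2), the placeholder choice of the first running task as the target, the `"#copy"` suffix for copy tasks, and excluding nodes already flagged in the same round are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    node_id: str
    total_slots: int                                    # computing resources the node provides
    running: List[str] = field(default_factory=list)    # tasks currently executing

    @property
    def idle_slots(self) -> int:
        return self.total_slots - len(self.running)


def detect_problem_nodes(
    nodes: List[Node],
    run_and_time: Callable[[Node, str], float],  # hypothetical: returns seconds taken
    threshold: float,
) -> List[str]:
    """One detection round: compare each working node against an idle peer."""
    problems: List[str] = []
    for first in nodes:
        if not first.running:
            continue                      # only nodes in the working state are detected
        target = first.running[0]         # placeholder target task choice
        copy_task = target + "#copy"      # stands for the same computation
        candidates = [n for n in nodes
                      if n is not first and n.idle_slots > 0
                      and n.node_id not in problems]
        if not candidates:
            continue                      # no idle resources anywhere: skip for now
        second = candidates[0]
        t1 = run_and_time(first, target)
        t2 = run_and_time(second, copy_task)
        if abs(t1 - t2) > threshold:
            problems.append(first.node_id if t1 > t2 else second.node_id)
    return problems
```

For instance, with a slow node A (both slots busy), a working node B with one idle slot, and an idle node C, the round compares A with B and B with C, and flags only A.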
To help those skilled in the art better understand the technical solutions in the embodiments of the invention, these solutions are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the protection scope of the invention.
Referring to Fig. 2, which is an implementation flowchart of a problem node detection method based on a real-time computing system in an embodiment of the invention, the method may comprise the following steps:
S110: When the preset node detection condition is met, detect each node in the working state separately.
In practice, when the preset node detection condition is met, the control node of the real-time computing system can detect each working node in the system separately. The preset node detection condition can be a preset period (for example daily or hourly), or it can be that the total duration for all working nodes in the system to finish processing one data stream reaches a preset threshold. After the real-time computing system has been running for some time, the control node can derive, from the logs of each working node (i.e. each node in the working state), the normal range of the total duration for all working nodes in the system to finish one data stream. If a working node becomes abnormal, it takes too long to complete its assigned computing tasks, and this abnormality shows up in the total duration. A threshold is preset according to the normal range of the total duration; when the total duration reaches that threshold, the node detection condition is considered met, and the control node starts detecting each working node, i.e. performs the following steps for each node to be detected.
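A minimal sketch of the two trigger conditions follows. The patent does not say how the threshold is derived from the normal range, so this assumes it is the mean plus `k` population standard deviations of historical total durations; `duration_threshold`, `should_detect` and `k` are illustrative names.

```python
import statistics
from typing import Sequence


def duration_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Threshold on the total time to process one data stream.

    Assumption: the normal range learned from working-node logs is
    summarized as mean + k * population standard deviation.
    """
    return statistics.mean(history) + k * statistics.pstdev(history)


def should_detect(now: float, last_run: float, period: float,
                  latest_total: float, threshold: float) -> bool:
    """True if either preset node detection condition is met."""
    periodic_due = now - last_run >= period        # e.g. period = 3600 s for hourly
    total_too_long = latest_total >= threshold     # total duration "reaches" the threshold
    return periodic_due or total_too_long
```

Either condition alone starts a detection round; a deployment could of course enable only one of them.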
S120: For the first node currently to be detected, determine, among the computing tasks the first node is currently executing, a target computing task for node detection, and generate a copy task of the target computing task.
In step S110, when the preset node detection condition is met, the control node detects each working node separately, and each working node can serve as a node to be detected. In practice, each working node can provide one or more computing resources and can execute at most as many computing tasks as the computing resources it provides.
Fig. 3 shows how a real-time computing system processes a data stream in real time. After a data stream enters the system, nodes 1, 2, 3 and 4 execute the computing tasks of the first step (Step1), the second step (Step2), the third step (Step3) and the fourth step (Step4) respectively, and a result stream is then output. In practice, node 1 and node 3 may be the same node, i.e. different computing resources of one node execute the computing tasks of the first and third steps.
It follows that the first node currently to be detected may be executing one or more computing tasks. For the first node currently to be detected, a target computing task for node detection is determined among the computing tasks the first node is currently executing. In one embodiment of the invention, the target computing task can be chosen at random among the computing tasks the first node is currently executing, or, according to computing task priority information obtained in advance, the highest-priority computing task among those the first node is currently executing can be chosen as the target computing task. The different computing tasks of the same data stream may differ in importance and in the computing resources they consume, so they have different priorities. If the higher-priority computing task is chosen as the target computing task and the current node is then found to be a problem node, that computing task can be rescheduled promptly.
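Both target-selection strategies can be sketched as follows, assuming priorities are integers where a larger number means higher priority (the patent only says the priority information is obtained in advance). Tasks missing from the priority table are ranked lowest, which is also an assumption.

```python
import random
from typing import Dict, List, Optional


def pick_target_task(running: List[str],
                     priority: Optional[Dict[str, int]] = None,
                     rng: Optional[random.Random] = None) -> str:
    """Choose the target computing task on the node being detected."""
    if not running:
        raise ValueError("node has no running computing tasks")
    if priority is not None:
        # highest priority wins; unknown tasks rank lowest (assumption);
        # max() keeps the first task among equal priorities
        return max(running, key=lambda t: priority.get(t, float("-inf")))
    # otherwise pick uniformly at random
    return (rng or random).choice(running)
```

Passing a seeded `random.Random` makes the random strategy reproducible in tests.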
After the target computing task for node detection is determined, a copy task of the target computing task is generated. Here, the target computing task and its copy task are equivalent to the same computing task. Generating a copy task from a target computing task is prior art and is not described further in the embodiments of the invention.
S130: Find a second node that currently has idle resources, and send the copy task to the second node.
In practice, the nodes of a real-time computing system can be divided into the following three classes according to the state of their computing resources:
First-class nodes: all the computing resources they can provide are in the working state;
Second-class nodes: some of the computing resources they can provide are in the working state and the rest are idle;
Third-class nodes: all the computing resources they can provide are idle.
The control node knows the state of every node's computing resources in real time. It can choose a second node among the second-class and third-class nodes and send the copy task generated in step S120 to it; the second node then processes the copy task with its idle computing resources. In practice, depending on the actual situation, third-class nodes can be preferred as the second node.
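The three-way classification and the preference for third-class nodes can be sketched as follows. The `(total, busy)` slot counts per node and the tie-break by node id are assumptions for illustration; skipping nodes in the problem node list follows the embodiment in which the second node is chosen only from nodes not recorded in that list.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class NodeClass(Enum):
    FIRST = 1    # all computing resources working
    SECOND = 2   # some working, some idle
    THIRD = 3    # all computing resources idle


def classify(total: int, busy: int) -> NodeClass:
    if busy >= total:
        return NodeClass.FIRST       # no idle resource (also covers total == 0)
    if busy == 0:
        return NodeClass.THIRD
    return NodeClass.SECOND


def select_second_node(first_id: str,
                       usage: Dict[str, Tuple[int, int]],
                       problem_nodes: Set[str]) -> Optional[str]:
    """Pick a node with idle resources, preferring third-class nodes."""
    candidates = [
        n for n, (total, busy) in usage.items()
        if n != first_id and n not in problem_nodes
        and classify(total, busy) is not NodeClass.FIRST
    ]
    if not candidates:
        return None
    # False sorts before True, so third-class nodes come first; ties by node id
    return min(candidates,
               key=lambda n: (classify(*usage[n]) is not NodeClass.THIRD, n))
```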
Take the data stream processing shown in Fig. 3 as an example. The four nodes are all first-class or second-class nodes. Assuming they are four different nodes, each of them must be detected separately. For node 1, a copy task of its computing task for node detection is generated, node 5 is found among the second-class or third-class nodes, and the copy task of node 1 is sent to node 5. Likewise, for nodes 2, 3 and 4, copy tasks of their computing tasks for node detection are generated and sent to nodes 6, 7 and 8 respectively, each found among the second-class or third-class nodes, as shown in Fig. 4. Node 1 and node 5, node 2 and node 6, node 3 and node 7, and node 4 and node 8 are then each processing the same computing task.
S140: Determine, according to the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task, whether the first node is a problem node.
As mentioned above, the target computing task and its copy task are equivalent to the same computing task, so whether the first node is a problem node can be determined by comparing the duration for the first node to complete the target computing task with the duration for the second node to complete the copy task.
Once this problem node detection has been performed for all working nodes in the system, it can be determined which nodes in the system are problem nodes, so that they can be handled accordingly.
In one embodiment of the invention, whether the first node is a problem node can be determined by the following steps:
Step 1: calculate the absolute value of the difference between the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task;
Step 2: if the absolute value is greater than the preset threshold and T1 > T2, determine that the first node is a problem node.
The control node of the real-time computing system can obtain, by monitoring, the duration T1 for the first node to complete the target computing task and the duration T2 for the second node to complete the copy task; alternatively, the first node and the second node can actively report their task completion durations to the control node.
For the same computing task, the processing durations on different nodes should differ little; a large difference may indicate that one of the nodes has a problem. If the absolute value of the difference between T1 and T2 is greater than the preset threshold and T1 > T2, the first node can be determined to be a problem node. The preset threshold can be set and adjusted according to the actual situation.
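The decision rule of steps 1 and 2, together with the reverse case T1 < T2 from the summary (which identifies the second node as the problem node), fits in a small function. The `"first"`/`"second"` return values are just an illustrative encoding, and rejecting a negative threshold is an added safeguard.

```python
from typing import Optional


def judge(t1: float, t2: float, threshold: float) -> Optional[str]:
    """Return which node looks like a problem node, or None if durations agree.

    t1: duration for the first node to finish the target computing task
    t2: duration for the second node to finish the copy task
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if abs(t1 - t2) <= threshold:
        return None                       # difference within tolerance: no verdict
    return "first" if t1 > t2 else "second"
```

Because the comparison is strict, a difference exactly equal to the threshold is treated as normal.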
In one embodiment of the invention, the method may further comprise the following step:
when the first node is determined to be a problem node, adding the identification information of the first node to a problem node list.
In practice, the control node of the real-time computing system can maintain a problem node list that records the identification information of problem nodes. When the first node is determined to be a problem node, its identification information can be added to the problem node list. Operations staff can periodically handle the problem nodes according to the list maintained by the control node; after handling, if the fault of a problem node has been cleared, they can also notify the control node to delete that node's identification information from the problem node list.
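A problem node list supporting the operations described here (adding a node on detection, deleting it once operations staff report the fault cleared, and membership checks when choosing second and third nodes) might look like the sketch below; the class and method names are hypothetical.

```python
from typing import Dict, List


class ProblemNodeList:
    """Identification information of nodes judged to be problem nodes."""

    def __init__(self) -> None:
        self._ids: Dict[str, None] = {}   # dict keeps insertion order; values unused

    def add(self, node_id: str) -> None:
        self._ids.setdefault(node_id, None)   # adding twice is harmless

    def clear_fault(self, node_id: str) -> None:
        """Called when operations staff report that the node's fault is cleared."""
        self._ids.pop(node_id, None)

    def __contains__(self, node_id: object) -> bool:
        return node_id in self._ids

    def ids(self) -> List[str]:
        return list(self._ids)
```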
In practice, it is unavoidable that the selected second node itself sometimes has a problem. Therefore, if the absolute value is greater than the preset threshold and T1 < T2, the second node can be determined to be a problem node.
Note that if the second node found during one detection round itself has a problem, it is harder to determine whether the first node is a problem node. During the next node detection, a node different from the second node used in the previous round can be found as the comparison node for the first node. In this way, if the first node really is a problem node but was not detected in one round, it may still be detected in the next. In one embodiment of the invention, to find the second node, the control node can use the problem node list it maintains to find nodes that are not recorded in the list, and select among them a node with idle resources as the second node. This effectively prevents a problem with the second node itself from affecting the accuracy of detecting whether the first node is a problem node.
In another embodiment of the invention, the method may further comprise the following steps:
Step 1: according to the problem node list, find nodes that are not recorded in the problem node list;
Step 2: select, among the nodes found, a node with idle resources as the third node;
Step 3: schedule the target computing task to the third node.
In practice, after the first node is determined to be a problem node, a third node with idle resources can be selected directly from the system and the target computing task scheduled to it. Since the control node maintains the problem node list, when selecting the third node it can first find, according to the list, nodes that are not recorded in it, then select among them a node with idle resources as the third node, and schedule the target computing task to that node. This avoids the situation in which the third node itself has a problem and drags down the computing rate of the whole system.
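Selecting a third node and moving the target computing task to it can be sketched as below, assuming the control node keeps a per-node list of assigned tasks and a per-node slot capacity (both hypothetical data structures). Nodes are scanned in id order purely for determinism; the patent does not prescribe an order.

```python
from typing import Dict, List, Optional, Set


def reschedule(task: str, first_id: str,
               assignments: Dict[str, List[str]],
               capacity: Dict[str, int],
               problem_nodes: Set[str]) -> Optional[str]:
    """Move `task` off the problem node to a third node with idle resources."""
    for node_id in sorted(assignments):
        if node_id == first_id or node_id in problem_nodes:
            continue                          # skip nodes in the problem node list
        if len(assignments[node_id]) < capacity[node_id]:   # has an idle resource
            assignments[first_id].remove(task)
            assignments[node_id].append(task)
            return node_id
    return None                               # no eligible third node right now
```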
Once a problem node is identified, the real-time computing system can promptly schedule the computing tasks on that problem node to other working nodes, so that computing tasks avoid problem nodes as far as possible. This increases the computation rate of the whole system and correspondingly improves its computing efficiency.
In another embodiment of the invention, when the first node is determined to be a problem node, alarm information about the problem node is output, and the alarm information carries the identification information of the first node. Of course, this step also applies to any other node determined to be a problem node: whenever a node is determined to be a problem node, alarm information about it is output, so that operations staff can promptly check and handle the problem on that node.
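The alarm only needs to carry the problem node's identification information. A minimal sketch that builds such an alarm as a JSON string; the field names, the extra timing fields recording the measured T1 and T2 that triggered the alarm, and the JSON format itself are assumptions, not something the source specifies.

```python
import json
import time
from typing import Optional


def build_problem_node_alarm(
    node_id: str,
    t1: float,
    t2: float,
    threshold: float,
    now: Optional[float] = None,
) -> str:
    """Serialize a problem-node alarm that carries the node's identifier."""
    alarm = {
        "type": "PROBLEM_NODE",          # hypothetical alarm type label
        "node_id": node_id,              # identification information of the node
        "t1_seconds": t1,                # measured T1 that triggered the alarm
        "t2_seconds": t2,                # measured T2 that triggered the alarm
        "threshold_seconds": threshold,  # preset threshold in effect
        "timestamp": time.time() if now is None else now,
    }
    return json.dumps(alarm, sort_keys=True)
```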
With the technical solution provided by the embodiments of the invention, a copy task of the target computing task corresponding to the first node is generated, and whether the first node is a problem node is determined according to the duration in which the first node completes the target computing task and the duration in which the second node completes the copy task. Because the copy task is equivalent to the target computing task, i.e., they are the same computing task, different nodes normally take roughly similar durations to complete it. A large difference suggests that one of the nodes has a problem, which allows the problem node to be identified so that the real-time computing system can handle it promptly. This increases the computation rate of the whole system and correspondingly improves its computing efficiency.
Corresponding to the above method embodiments, an embodiment of the invention further provides a problem node detection apparatus based on a real-time computing system. As shown in Figure 5, the apparatus may include the following modules:
Detection condition judgment module 210, configured to judge whether a preset node detection condition is currently met and, if so, to trigger the target task determination module 220;
Target task determination module 220, configured to determine, for the first node currently to be detected and from among the computing tasks currently executed by the first node, a target computing task for node detection;
Copy task generation module 230, configured to generate a copy task of the target computing task;
Second node lookup module 240, configured to look up a second node that currently has idle resources;
Copy task sending module 250, configured to send the copy task to the second node;
Problem node determination module 260, configured to determine whether the first node is a problem node according to the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task.
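The cooperation of modules 210 to 260 can be sketched as a single detection pass. This is a hedged simplification: `condition_met`, `find_second_node`, and `run_and_time` are hypothetical callables standing in for the preset detection condition, the second-node lookup, and the dispatch-and-measure step, and the target task is chosen by highest priority (one of the two options described for module 220).

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    task_id: str
    priority: int = 0
    payload: dict = field(default_factory=dict)


def detect_problem_node(
    first_node: str,
    running_tasks: List[Task],
    condition_met: Callable[[], bool],
    find_second_node: Callable[[str], Optional[str]],
    run_and_time: Callable[[str, Task], float],
    threshold: float,
) -> Optional[str]:
    """Run one detection pass for first_node; return a problem node id or None."""
    # Module 210: proceed only when the preset detection condition holds.
    if not condition_met() or not running_tasks:
        return None
    # Module 220: choose the target task (here, the highest-priority one).
    target = max(running_tasks, key=lambda t: t.priority)
    # Module 230: the copy task is an independent duplicate of the target.
    replica = copy.deepcopy(target)
    replica.task_id = target.task_id + "-copy"  # hypothetical naming scheme
    # Module 240: find a second node that currently has idle resources.
    second_node = find_second_node(first_node)
    if second_node is None:
        return None
    # Module 250 and timing: a real system would dispatch the copy to
    # second_node and collect both completion durations; run_and_time stands in.
    t1 = run_and_time(first_node, target)
    t2 = run_and_time(second_node, replica)
    # Module 260: a gap above the threshold points at the slower node.
    if abs(t1 - t2) > threshold:
        return first_node if t1 > t2 else second_node
    return None
```

In a real deployment, T1 would come from the task the first node is already running while the copy executes concurrently on the second node; the sequential calls above only keep the sketch testable.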
In one embodiment of the invention, the apparatus may further include the following module:
Problem node information adding module, configured to add the identification information of the first node to the problem node list when the first node is determined to be a problem node.
In one embodiment of the invention, the second node lookup module 240 is specifically configured to:
look up, according to the problem node list, the nodes that are not recorded in the problem node list;
select, from among the nodes found, a node that has idle resources as the second node.
In one embodiment of the invention, the apparatus may further include:
Third node selection module, configured to look up, according to the problem node list, the nodes that are not recorded in the problem node list, and to select, from among the nodes found, a node that has idle resources as the third node;
Computing task scheduling module, configured to schedule the target computing task to the third node.
In one embodiment of the invention, the problem node determination module 260 may include:
Absolute value calculation sub-module, configured to calculate the absolute value of the difference between the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task;
Problem node determination sub-module, configured to determine that the first node is a problem node when the absolute value is greater than a preset threshold and T1 > T2.
In one embodiment of the invention, the problem node determination sub-module is further configured to determine that the second node is a problem node when the absolute value is greater than the preset threshold and T1 < T2.
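The decision rule of the two sub-modules can be written as a small pure function. This sketch follows the rule as stated (the difference must be strictly greater than the threshold, and the slower node is flagged); the function name and the string return values are illustrative assumptions, and the source does not say how the threshold is chosen.

```python
from typing import Optional


def judge_problem_node(t1: float, t2: float, threshold: float) -> Optional[str]:
    """Apply the |T1 - T2| > threshold rule.

    Returns "first" if the first node is judged a problem node, "second" if
    the second node is, or None if the gap does not exceed the threshold.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if abs(t1 - t2) <= threshold:
        return None  # durations are close enough; no verdict this round
    # Here t1 != t2, because the gap exceeds a non-negative threshold.
    return "first" if t1 > t2 else "second"
```

Because the comparison is strict, a difference exactly equal to the threshold flags neither node.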
In one embodiment of the invention, the target task determination module 220 is specifically configured to:
randomly determine, from among the computing tasks currently executed by the first node, the target computing task for node detection;
or, according to computing task priority information obtained in advance, determine the computing task with the highest priority among the computing tasks currently executed by the first node as the target computing task for node detection.
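Both target-task selection strategies can be sketched as one function. Assumptions: tasks are identified by strings, a larger number means a higher priority, tasks missing from the priority map are treated as lowest priority, and an optional `random.Random` instance can be passed for reproducible choices.

```python
import random
from typing import Dict, Optional, Sequence


def choose_target_task(
    tasks: Sequence[str],
    priorities: Optional[Dict[str, int]] = None,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick the target task from the tasks currently running on the first node."""
    if not tasks:
        return None
    if priorities is not None:
        # Priority option: highest priority wins; unknown tasks rank lowest.
        return max(tasks, key=lambda t: priorities.get(t, float("-inf")))
    # Random option.
    return (rng or random.Random()).choice(list(tasks))
```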
In one embodiment of the invention, the apparatus may further include:
Alarm information output module, configured to output alarm information about a problem node when the first node is determined to be a problem node, the alarm information carrying the identification information of the first node.
With the apparatus provided by the embodiments of the invention, a copy task of the target computing task corresponding to the first node is generated, and whether the first node is a problem node is determined according to the duration in which the first node completes the target computing task and the duration in which the second node completes the copy task. Because the copy task is equivalent to the target computing task, i.e., they are the same computing task, different nodes normally take roughly similar durations to complete it. A large difference suggests that one of the nodes has a problem, which allows the problem node to be identified so that the real-time computing system can handle it promptly. This increases the computation rate of the whole system and correspondingly improves its computing efficiency.
It should be noted that herein, such as first and second or the like relational terms are used merely to a reality Body or operation make a distinction with another entity or operation, and not necessarily require or imply and deposited between these entities or operation In any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to Nonexcludability includes, so that process, method, article or equipment including a series of elements not only will including those Element, but also the other element including being not expressly set out, or it is this process, method, article or equipment also to include Intrinsic key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that Other identical element also be present in process, method, article or equipment including the key element.
Each embodiment in this specification is described by the way of related, identical similar portion between each embodiment Divide mutually referring to what each embodiment stressed is the difference with other embodiment.It is real especially for device For applying example, because it is substantially similar to embodiment of the method, so description is fairly simple, related part is referring to embodiment of the method Part explanation.
Can one of ordinary skill in the art will appreciate that realizing that all or part of step in above method embodiment is To instruct the hardware of correlation to complete by program, described program can be stored in computer read/write memory medium, The storage medium designated herein obtained, such as:ROM/RAM, magnetic disc, CD etc..
The foregoing is merely illustrative of the preferred embodiments of the present invention, is not intended to limit the scope of the present invention.It is all Any modification, equivalent substitution and improvements made within the spirit and principles in the present invention etc., are all contained in protection scope of the present invention It is interior.

Claims (16)

1. A problem node detection method based on a real-time computing system, characterized by comprising:
when a preset node detection condition is met, detecting each node that is in a working state as follows:
for a first node currently to be detected, determining, from among the computing tasks currently executed by the first node, a target computing task for node detection, and generating a copy task of the target computing task;
looking up a second node that currently has idle resources, and sending the copy task to the second node;
determining whether the first node is a problem node according to the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task.
2. The method according to claim 1, characterized in that the method further comprises:
when the first node is determined to be a problem node, adding identification information of the first node to a problem node list.
3. The method according to claim 2, characterized in that looking up the second node that currently has idle resources comprises:
looking up, according to the problem node list, the nodes that are not recorded in the problem node list;
selecting, from among the nodes found, a node that has idle resources as the second node.
4. The method according to claim 2 or 3, characterized by further comprising:
looking up, according to the problem node list, the nodes that are not recorded in the problem node list;
selecting, from among the nodes found, a node that has idle resources as a third node;
scheduling the target computing task to the third node.
5. The method according to any one of claims 1 to 3, characterized in that determining whether the first node is a problem node according to the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task comprises:
calculating the absolute value of the difference between the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task;
if the absolute value is greater than a preset threshold and T1 > T2, determining that the first node is a problem node.
6. The method according to claim 5, characterized by further comprising:
if the absolute value is greater than the preset threshold and T1 < T2, determining that the second node is a problem node.
7. The method according to any one of claims 1 to 3, characterized in that determining, from among the computing tasks currently executed by the first node, the target computing task for node detection comprises:
randomly determining, from among the computing tasks currently executed by the first node, the target computing task for node detection;
or
according to computing task priority information obtained in advance, determining the computing task with the highest priority among the computing tasks currently executed by the first node as the target computing task for node detection.
8. The method according to claim 1, characterized by further comprising:
when the first node is determined to be a problem node, outputting alarm information about the problem node, the alarm information carrying the identification information of the first node.
9. A problem node detection apparatus based on a real-time computing system, characterized by comprising:
a detection condition judgment module, configured to judge whether a preset node detection condition is currently met and, if so, to trigger a target task determination module;
the target task determination module, configured to determine, for a first node currently to be detected and from among the computing tasks currently executed by the first node, a target computing task for node detection;
a copy task generation module, configured to generate a copy task of the target computing task;
a second node lookup module, configured to look up a second node that currently has idle resources;
a copy task sending module, configured to send the copy task to the second node;
a problem node determination module, configured to determine whether the first node is a problem node according to the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task.
10. The apparatus according to claim 9, characterized by further comprising:
a problem node information adding module, configured to add identification information of the first node to a problem node list when the first node is determined to be a problem node.
11. The apparatus according to claim 10, characterized in that the second node lookup module is specifically configured to:
look up, according to the problem node list, the nodes that are not recorded in the problem node list;
select, from among the nodes found, a node that has idle resources as the second node.
12. The apparatus according to claim 10 or 11, characterized by further comprising:
a third node selection module, configured to look up, according to the problem node list, the nodes that are not recorded in the problem node list, and to select, from among the nodes found, a node that has idle resources as a third node;
a computing task scheduling module, configured to schedule the target computing task to the third node.
13. The apparatus according to any one of claims 9 to 11, characterized in that the problem node determination module comprises:
an absolute value calculation sub-module, configured to calculate the absolute value of the difference between the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task;
a problem node determination sub-module, configured to determine that the first node is a problem node when the absolute value is greater than a preset threshold and T1 > T2.
14. The apparatus according to claim 13, characterized in that
the problem node determination sub-module is further configured to determine that the second node is a problem node when the absolute value is greater than the preset threshold and T1 < T2.
15. The apparatus according to any one of claims 9 to 11, characterized in that the target task determination module is specifically configured to:
randomly determine, from among the computing tasks currently executed by the first node, the target computing task for node detection;
or
according to computing task priority information obtained in advance, determine the computing task with the highest priority among the computing tasks currently executed by the first node as the target computing task for node detection.
16. The apparatus according to claim 9, characterized by further comprising:
an alarm information output module, configured to output alarm information about a problem node when the first node is determined to be a problem node, the alarm information carrying the identification information of the first node.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510218215.5A CN104765648B (en) 2015-04-30 2015-04-30 Method and device for detecting problem nodes based on a real-time computing system

Publications (2)

Publication Number Publication Date
CN104765648A CN104765648A (en) 2015-07-08
CN104765648B true CN104765648B (en) 2017-12-08

Family

ID=53647494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510218215.5A Active CN104765648B (en) 2015-04-30 2015-04-30 Method and device for detecting problem nodes based on a real-time computing system

Country Status (1)

Country Link
CN (1) CN104765648B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106330531B (en) * 2016-08-15 2019-05-03 东软集团股份有限公司 The method and device of node failure record and processing
CN107959703B (en) * 2016-10-18 2021-04-16 网宿科技股份有限公司 Data processing method, client and distributed computing system
CN109218126B (en) * 2017-06-30 2023-10-17 中兴通讯股份有限公司 Method, device and system for monitoring node survival state
CN110705893B (en) * 2019-10-11 2021-06-15 腾讯科技(深圳)有限公司 Service node management method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8126696B1 (en) * 2008-10-15 2012-02-28 Hewlett-Packard Development Company, L.P. Modifying length of synchronization quanta of simulation time in which execution of nodes is simulated
CN102609303A (en) * 2012-01-18 2012-07-25 华为技术有限公司 Slow-task dispatching method and slow-task dispatching device of Map Reduce system
CN104199739A (en) * 2014-08-26 2014-12-10 浪潮(北京)电子信息产业有限公司 Speculation type Hadoop scheduling method based on load balancing
CN104331520A (en) * 2014-11-28 2015-02-04 北京奇艺世纪科技有限公司 Performance optimization method and device of Hadoop cluster and node state recognition method and device

Also Published As

Publication number Publication date
CN104765648A (en) 2015-07-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant