CN104765648B - Method and device for detecting problem nodes in a real-time computing system - Google Patents
- Publication number: CN104765648B
- Application number: CN201510218215.5A
- Authority: CN (China)
- Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Abstract
An embodiment of the invention discloses a method and device for detecting problem nodes in a real-time computing system. The method comprises the following steps. When a preset node detection condition is met, each node in the working state is detected in turn: for the current first node to be detected, a target computing task for node detection is determined among the computing tasks the first node is currently executing, and a copy task of the target computing task is generated; a second node that currently has idle resources is found, and the copy task is sent to it; whether the first node is a problem node is then determined from the duration T1 the first node takes to complete the target computing task and the duration T2 the second node takes to complete the copy task. The technical solution provided by embodiments of the present invention improves the computing speed of the whole system and, correspondingly, its computing efficiency.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for detecting problem nodes in a real-time computing system.
Background
Real-time computing, also called stream computing, processes data streams quickly and in real time. A real-time computing system is a distributed computing system widely used by Internet companies for data mining, data analysis and similar work. The mainstream open-source real-time computing systems, including Storm and Spark Streaming, are all master-slave computing frameworks. As shown in Fig. 1, such a system is a cluster of machine nodes consisting of a master node and slave nodes; each slave node can provide one or more computing resources, and each computing resource can process one computing task. In practice, all computing resources of some slave nodes are busy, some slave nodes have part of their computing resources busy and the rest idle, and all computing resources of other slave nodes are idle.
In the prior art, problem-node detection only targets nodes that can no longer execute computing tasks at all, that is, nodes that have stopped executing computing tasks because of a serious failure such as going offline or the computing process closing unexpectedly. In that situation the whole real-time computing system cannot work normally.
Detecting such problem nodes is important in a real-time computing system. In practice, however, the hardware performance and load of the nodes in a real-time computing system may differ, and nodes with poorer performance or heavier load compute more slowly. Such problem nodes can be called slow nodes. If the system contains slow nodes, they trigger a "shortest stave" (weakest link) effect and drag down the computing speed of the entire cluster.
For example, suppose a real-time computing system is a cluster built two years ago whose nodes are all of model A, and the cluster is now being expanded with nodes of model B, whose performance parameters are all better than those of model A. Because of the shortest-stave effect, if model-A nodes execute computing tasks, the computing speed of this mixed cluster will only be comparable to that of the model-A nodes.
As another example, a working node in a real-time computing system may become unstable, for instance through CPU overload, network card saturation or a disk I/O fault. The computing speed of that node drops, and the computing speed of the whole system drops along with it.
The existing detection method, however, only detects nodes that cannot execute computing tasks at all, and cannot identify nodes whose computing speed is relatively low, i.e., slow nodes. When slow nodes appear in the cluster of a real-time computing system, the computing speed of the whole system falls and computing efficiency suffers.
Summary of the invention
To solve the above problems, embodiments of the present invention disclose a method and device for detecting problem nodes in a real-time computing system. The technical solution is as follows:
A method for detecting problem nodes in a real-time computing system, comprising:
when a preset node detection condition is met, detecting each node in the working state in turn, as follows:
for the current first node to be detected, determining, among the computing tasks currently executed by the first node, a target computing task for node detection, and generating a copy task of the target computing task;
finding a second node that currently has idle resources, and sending the copy task to the second node;
determining whether the first node is a problem node according to the duration T1 taken by the first node to complete the target computing task and the duration T2 taken by the second node to complete the copy task.
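The detection pass summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: `Node`, `run_task` and `detect_problem_nodes` are hypothetical names, completion time is simulated by a fixed per-node `seconds_per_task`, the first running task is taken as the target, and the first node with idle resources is taken as the second node.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Node:
    node_id: str
    seconds_per_task: float          # hypothetical: simulated time to finish one task
    idle_slots: int = 0              # idle computing resources on this node
    tasks: List[str] = field(default_factory=list)  # computing tasks currently running

def run_task(node: Node, task: str) -> float:
    """Hypothetical stand-in for executing `task` on `node` and timing it."""
    return node.seconds_per_task

def detect_problem_nodes(nodes: List[Node], threshold: float) -> Set[str]:
    """One detection pass: compare every working node against an idle helper."""
    problems: Set[str] = set()
    for first in (n for n in nodes if n.tasks):          # each working node in turn
        target = first.tasks[0]                          # target task (assumed: first one)
        copy_task = target + "#copy"                     # copy task of the target
        second = next((n for n in nodes
                       if n is not first and n.idle_slots > 0), None)
        if second is None:
            continue                                     # no node with idle resources
        t1 = run_task(first, target)                     # T1: first node, target task
        t2 = run_task(second, copy_task)                 # T2: second node, copy task
        if abs(t1 - t2) > threshold and t1 > t2:
            problems.add(first.node_id)
    return problems
```

For example, if node `n1` needs 9 s for its target task while an idle node needs 3 s for the copy, a 2 s threshold flags `n1`; a 10 s threshold flags nothing.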
In one embodiment of the present invention, the method further comprises:
when the first node is determined to be a problem node, adding the identification information of the first node to a problem node list.
In one embodiment of the present invention, finding the second node that currently has idle resources comprises:
according to the problem node list, finding the nodes not recorded in the problem node list;
selecting, from the nodes found, a node with idle resources as the second node.
In one embodiment of the present invention, the method further comprises:
according to the problem node list, finding the nodes not recorded in the problem node list;
selecting, from the nodes found, a node with idle resources as a third node;
scheduling the target computing task to the third node.
In one embodiment of the present invention, determining whether the first node is a problem node according to the duration T1 taken by the first node to complete the target computing task and the duration T2 taken by the second node to complete the copy task comprises:
calculating the absolute value of the difference between T1 and T2;
if the absolute value is greater than a preset threshold and T1 > T2, determining that the first node is a problem node.
In one embodiment of the present invention, the method further comprises:
if the absolute value is greater than the preset threshold and T1 < T2, determining that the second node is a problem node.
In one embodiment of the present invention, determining, among the computing tasks currently executed by the first node, the target computing task for node detection comprises:
randomly determining the target computing task among the computing tasks currently executed by the first node;
or
according to computing task priority information obtained in advance, determining the highest-priority computing task currently executed by the first node as the target computing task.
In one embodiment of the present invention, the method further comprises:
when the first node is determined to be a problem node, outputting warning information for the problem node, the warning information carrying the identification information of the first node.
A device for detecting problem nodes in a real-time computing system, comprising:
a detection condition judging module, configured to judge whether the preset node detection condition is currently met and, if so, to trigger the target task determining module;
the target task determining module, configured to determine, for the current first node to be detected, a target computing task for node detection among the computing tasks currently executed by the first node;
a copy task generating module, configured to generate a copy task of the target computing task;
a second node searching module, configured to find a second node that currently has idle resources;
a copy task sending module, configured to send the copy task to the second node;
a problem node determining module, configured to determine whether the first node is a problem node according to the duration T1 taken by the first node to complete the target computing task and the duration T2 taken by the second node to complete the copy task.
In one embodiment of the present invention, the device further comprises:
a problem node information adding module, configured to add the identification information of the first node to a problem node list when the first node is determined to be a problem node.
In one embodiment of the present invention, the second node searching module is specifically configured to:
according to the problem node list, find the nodes not recorded in the problem node list;
select, from the nodes found, a node with idle resources as the second node.
In one embodiment of the present invention, the device further comprises:
a third node selecting module, configured to find, according to the problem node list, the nodes not recorded in the problem node list, and to select from the nodes found a node with idle resources as a third node;
a computing task scheduling module, configured to schedule the target computing task to the third node.
In one embodiment of the present invention, the problem node determining module comprises:
an absolute value calculating submodule, configured to calculate the absolute value of the difference between the duration T1 taken by the first node to complete the target computing task and the duration T2 taken by the second node to complete the copy task;
a problem node determining submodule, configured to determine that the first node is a problem node when the absolute value is greater than a preset threshold and T1 > T2.
In one embodiment of the present invention,
the problem node determining submodule is further configured to determine that the second node is a problem node when the absolute value is greater than the preset threshold and T1 < T2.
In one embodiment of the present invention, the target task determining module is specifically configured to:
randomly determine the target computing task for node detection among the computing tasks currently executed by the first node;
or
according to computing task priority information obtained in advance, determine the highest-priority computing task currently executed by the first node as the target computing task for node detection.
In one embodiment of the present invention, the device further comprises:
a warning information output module, configured to output warning information for the problem node when the first node is determined to be a problem node, the warning information carrying the identification information of the first node.
With the technical solution provided by embodiments of the present invention, a copy task of the first node's target computing task is generated, and whether the first node is a problem node is determined from the duration the first node takes to complete the target computing task and the duration the second node takes to complete the copy task. Because the target computing task and its copy task are equivalent to the same computing task, different nodes should normally take about the same time to complete it; a large difference suggests that one of the nodes has a problem. The problem node is identified on this basis, so the real-time computing system can handle it promptly, improving the computing speed of the whole system and, correspondingly, its computing efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of the real-time computing system in an embodiment of the present invention;
Fig. 2 is a flowchart of a method for detecting problem nodes in a real-time computing system according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of data stream processing in an embodiment of the present invention;
Fig. 4 is a schematic diagram of the node detection process in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a device for detecting problem nodes in a real-time computing system according to an embodiment of the present invention.
Detailed description of the embodiments
First, a method for detecting problem nodes in a real-time computing system provided by an embodiment of the present invention is described. The method may comprise the following steps:
when a preset node detection condition is met, detecting each node in the working state in turn, as follows:
for the current first node to be detected, determining, among the computing tasks currently executed by the first node, a target computing task for node detection, and generating a copy task of the target computing task;
finding a second node that currently has idle resources, and sending the copy task to the second node;
determining whether the first node is a problem node according to the duration T1 taken by the first node to complete the target computing task and the duration T2 taken by the second node to complete the copy task.
The above steps are executed by the control node of the real-time computing system. In a master-slave real-time computing system, the control node is the master node and the other machine nodes in the system are slave nodes. The master node distributes computing tasks to the slave nodes, monitors their running state, and reschedules the computing tasks of slave nodes that go wrong.
When the preset node detection condition is met, the control node detects each node in the working state in turn; every working node is a node to be detected. For the current first node to be detected, the control node determines a target computing task for node detection among the computing tasks the first node is currently executing and generates a copy task of it; the target computing task and the copy task are equivalent to the same computing task. It then finds a second node that currently has idle resources and sends the copy task to it. The first node and the second node thus process the same computing task, so the control node obtains the duration the first node takes to complete the target computing task and the duration the second node takes to complete the copy task, and determines from their relative magnitude whether the first node is a problem node. Once the control node has checked every working node, it knows which problem nodes exist in the system and can handle them promptly.
With the technical solution provided by embodiments of the present invention, a copy task of the first node's target computing task is generated, and whether the first node is a problem node is determined from the duration the first node takes to complete the target computing task and the duration the second node takes to complete the copy task. Because the target computing task and its copy task are equivalent to the same computing task, different nodes should normally take about the same time to complete it; a large difference suggests that one of the nodes has a problem. The problem node is identified on this basis, so the real-time computing system can handle it promptly, improving the computing speed of the whole system and, correspondingly, its computing efficiency.
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, these solutions are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 2 is a flowchart of a method for detecting problem nodes in a real-time computing system according to an embodiment of the present invention. The method may comprise the following steps:
S110: When a preset node detection condition is met, detect each node in the working state in turn.
In practice, when the preset node detection condition is met, the control node of the real-time computing system may detect each working node in the system in turn. The preset node detection condition may be a preset period (e.g., daily or hourly), or it may be that the total duration all working nodes in the system take to complete one data stream reaches a preset threshold. Once the real-time computing system has been running, the control node can derive, from the logs of the working nodes (i.e., nodes in the working state), the normal range of the total duration all working nodes take to complete one data stream. If a working node becomes abnormal, it takes too long to complete its assigned computing tasks, and this shows up in the total duration. A threshold is therefore preset according to the normal range of the total duration; when the total duration reaches the threshold, the node detection condition is considered met, and the control node starts detecting each working node, performing the following steps for each node to be detected.
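The two detection conditions in S110 can be checked as below. This is a hedged sketch: the function names are hypothetical, and deriving the threshold as the slowest normal total duration times a `margin` of 1.5 is an assumed rule; the source only says the threshold is preset from the normal range found in the working nodes' logs.

```python
def threshold_from_history(normal_totals, margin=1.5):
    """Hypothetical rule: allow `margin` times the slowest normal total duration."""
    if not normal_totals:
        raise ValueError("need at least one historical total duration")
    return max(normal_totals) * margin

def detection_due(now, last_check, period_s, stream_total_s, total_threshold_s):
    """Return True if a node-detection pass should start.

    Condition 1 (periodic): at least `period_s` seconds since the last pass.
    Condition 2 (duration-based): the total time the working nodes needed to
    finish one data stream has reached `total_threshold_s`.
    """
    periodic = now - last_check >= period_s
    too_slow = stream_total_s >= total_threshold_s
    return periodic or too_slow
```

With historical totals of 10, 12 and 11 s, the threshold becomes 18 s, so a 20 s stream triggers detection even before the hourly period has elapsed.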
S120: For the current first node to be detected, determine, among the computing tasks currently executed by the first node, a target computing task for node detection, and generate a copy task of the target computing task.
In step S110, when the preset node detection condition is met, the control node detects each working node in turn, and each working node can serve as a node to be detected. In practice, each working node can provide one or more computing resources and can execute at most as many computing tasks as it has computing resources.
Fig. 3 shows how the real-time computing system processes a data stream in real time: after a data stream is input into the system, node 1, node 2, node 3 and node 4 execute the computing tasks of the first step (Step1), the second step (Step2), the third step (Step3) and the fourth step (Step4) respectively, and a result stream is then output. In practice, node 1 and node 3 may be the same node, i.e., different computing resources of one node execute the first-step and third-step computing tasks.
It follows that the current first node to be detected may be executing one or more computing tasks. For the current first node to be detected, a target computing task for node detection is determined among the computing tasks it is currently executing. In one embodiment of the present invention, the target computing task may be chosen at random among the computing tasks currently executed by the first node; alternatively, according to computing task priority information obtained in advance, the highest-priority computing task currently executed by the first node may be chosen as the target computing task. The different computing tasks of the same data stream may differ in importance and in the computing resources they consume, so they have different priorities. Choosing a higher-priority computing task as the target computing task means that, if the current node is determined to be a problem node, that task can be rescheduled promptly.
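Both ways of choosing the target computing task can be expressed in a few lines. In this sketch `pick_target_task` is a hypothetical name, and it assumes the priority information is a mapping from task name to a number where a larger value means higher priority; tasks missing from the mapping are treated as lowest priority.

```python
import random

def pick_target_task(tasks, priority=None, rng=random):
    """Choose the target computing task among the first node's running tasks.

    If `priority` (task -> number, larger = more important) is given, return the
    highest-priority task; otherwise pick one uniformly at random.
    """
    if not tasks:
        raise ValueError("node has no running computing tasks")
    if priority:
        return max(tasks, key=lambda t: priority.get(t, float("-inf")))
    return rng.choice(tasks)
```

Passing a seeded `random.Random` as `rng` makes the random choice reproducible in tests.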
After the target computing task for node detection is determined, a copy task of it is generated. Here, the target computing task and its copy task are equivalent to the same computing task. Note that generating a copy task from a target computing task belongs to the prior art and is not described further in the embodiments of the present invention.
S130: Find a second node that currently has idle resources, and send the copy task to the second node.
In practice, the nodes in the real-time computing system can be divided into the following three classes according to the state of each node's computing resources:
First-class nodes: all the computing resources they can provide are busy;
Second-class nodes: some of the computing resources they can provide are busy and the rest are idle;
Third-class nodes: all the computing resources they can provide are idle.
Within the system, the control node knows the state of every node's computing resources in real time. It can choose a second node among the second-class and third-class nodes and send it the copy task generated in step S120; the second node then processes the copy task with its idle computing resources. In practice, depending on the actual situation, third-class nodes may be preferred as the second node.
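The three node classes and the preference for third-class nodes can be sketched as follows. The names `classify` and `pick_second_node` are hypothetical, each node's state is assumed to be a `(total_slots, busy_slots)` pair, and breaking ties by node id is an arbitrary choice made only so the result is deterministic.

```python
def classify(total, busy):
    """Class 1: all resources busy; class 2: partly busy; class 3: all idle."""
    if busy >= total:
        return 1
    return 3 if busy == 0 else 2

def pick_second_node(resources, exclude):
    """resources: node_id -> (total_slots, busy_slots).

    Return a node with idle resources other than `exclude`, preferring fully
    idle (class-3) nodes over partly idle (class-2) ones; None if none exists.
    """
    candidates = [(classify(t, b), nid) for nid, (t, b) in resources.items()
                  if nid != exclude and classify(t, b) != 1]
    if not candidates:
        return None
    # Higher class first (3 before 2); ties broken by node id.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[0][1]
```

For instance, with `n5` half busy and `n6` fully idle, the copy task of `n1` goes to `n6`.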
Take the data stream processing of Fig. 3 as an example. The four nodes there are first-class or second-class nodes; assuming they are four different nodes, each of them must be detected. For node 1, a copy task of its computing task for node detection is generated, node 5 is found among the second-class or third-class nodes, and the copy task corresponding to node 1 is sent to node 5. For node 2, a copy task of its computing task for node detection is generated, node 6 is found among the second-class or third-class nodes, and the copy task corresponding to node 2 is sent to node 6. For node 3, a copy task of its computing task for node detection is generated, node 7 is found among the second-class or third-class nodes, and the copy task corresponding to node 3 is sent to node 7. For node 4, a copy task of its computing task for node detection is generated, node 8 is found among the second-class or third-class nodes, and the copy task corresponding to node 4 is sent to node 8. This is shown in Fig. 4. As a result, node 1 and node 5 process the same computing task, as do node 2 and node 6, node 3 and node 7, and node 4 and node 8.
S140: Determine whether the first node is a problem node according to the duration T1 taken by the first node to complete the target computing task and the duration T2 taken by the second node to complete the copy task.
As stated above, the target computing task and its copy task are equivalent to the same computing task, so whether the first node is a problem node can be determined by comparing the duration the first node takes to complete the target computing task with the duration the second node takes to complete the copy task.
Once this problem-node detection has been performed for all working nodes in the system, it is known which nodes in the system are problem nodes, and they can be handled accordingly.
In one embodiment of the present invention, whether the first node is a problem node may be determined through the following steps:
Step 1: calculate the absolute value of the difference between the duration T1 the first node takes to complete the target computing task and the duration T2 the second node takes to complete the copy task;
Step 2: if the absolute value is greater than a preset threshold and T1 > T2, determine that the first node is a problem node.
The control node of the real-time computing system can obtain T1 and T2 by monitoring; alternatively, the first node and the second node can actively report their task completion durations to the control node.
For the same computing task, the processing durations on different nodes should not differ much; a large difference may indicate that one of the nodes has a problem. If the absolute value of the difference between T1 and T2 is greater than the preset threshold and T1 > T2, the first node can be determined to be a problem node. The preset threshold can be set and adjusted according to actual conditions.
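The decision rule compares |T1 - T2| with the preset threshold and then looks at which duration is larger. The sketch below is illustrative: `judge` and its string return values are hypothetical. Besides the T1 > T2 case described here, it also covers the T1 < T2 case, in which the source attributes the problem to the second node.

```python
def judge(t1, t2, threshold):
    """Compare T1 (first node, target task) with T2 (second node, copy task).

    Returns "first" if the first node is judged a problem node, "second" if
    the second node is, or None if the gap does not exceed the threshold.
    """
    if abs(t1 - t2) <= threshold:
        return None                 # durations close enough: no verdict
    return "first" if t1 > t2 else "second"
```

Because the gap must exceed the threshold before any verdict is given, T1 and T2 can never be equal on the final line.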
In one embodiment of the present invention, the method may further comprise the following step:
when the first node is determined to be a problem node, adding the identification information of the first node to a problem node list.
In practice, the control node of the real-time computing system can maintain a problem node list that records the identification information of problem nodes. When the first node is determined to be a problem node, its identification information can be added to the list. Operations staff can periodically handle the problem nodes according to the list maintained by the control node; once a problem node's fault has been cleared, they can also notify the control node to delete that node's identification information from the list.
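An in-memory version of the problem node list is shown below as a minimal sketch. `ProblemNodeList` and its methods are hypothetical names, node identifiers are assumed to be strings, and a real control node would presumably persist the list rather than hold it in a set.

```python
class ProblemNodeList:
    """Problem node list kept by the control node (in-memory sketch)."""

    def __init__(self):
        self._ids = set()

    def add(self, node_id):
        """Record a node that detection has flagged as a problem node."""
        self._ids.add(node_id)

    def clear_fault(self, node_id):
        """Remove a node after operations staff report its fault is fixed."""
        self._ids.discard(node_id)

    def __contains__(self, node_id):
        return node_id in self._ids

    def ids(self):
        """Sorted identifiers, e.g. for a periodic maintenance report."""
        return sorted(self._ids)
```

Using `discard` rather than `remove` makes a repeated "fault cleared" notification harmless.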
In practice, the selected second node may itself have problems. Therefore, if the absolute value is greater than the preset threshold and T1 < T2, the second node can be determined to be a problem node.
Note that if, in one round of detection, the second node found has problems itself, it becomes harder to tell whether the first node is a problem node. In the next round, a node different from the second node used in the previous round can be chosen as the comparison node for the first node; thus, if the first node really is a problem node but was missed in one round, it may still be caught in the next. In one embodiment of the present invention, to find the second node, the control node can use the problem node list it maintains to find the nodes not recorded in the list, and select among them a node with idle resources as the second node. This effectively prevents problems of the second node itself from reducing the accuracy of detecting whether the first node is a problem node.
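Second-node lookup that skips listed problem nodes and rotates away from the previous round's comparison node might look like this. The name `find_second_node` and its parameters are hypothetical; sorting by node id is for determinism only, and falling back to the previous helper when it is the only healthy idle node is an assumption, not something the source specifies.

```python
def find_second_node(first_id, idle_slots, problem_ids, last_helper=None):
    """idle_slots: node_id -> number of idle computing resources.

    Return a healthy node with idle resources, avoiding the first node, any
    node in `problem_ids`, and (if possible) last round's helper `last_helper`.
    """
    candidates = sorted(nid for nid, free in idle_slots.items()
                        if free > 0 and nid != first_id and nid not in problem_ids)
    fresh = [nid for nid in candidates if nid != last_helper]
    pool = fresh or candidates      # assumed fallback: reuse last helper if it is the only one
    return pool[0] if pool else None
```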
In another embodiment of the present invention, the method may further comprise the following steps:
First step: according to the problem node list, find the nodes not recorded in the problem node list;
Second step: select, from the nodes found, a node with idle resources as the third node;
Third step: schedule the target computing task to the third node.
In practice, once the first node is determined to be a problem node, a third node with idle resources can be selected directly in the system and the target computing task scheduled to it. Since the control node maintains the problem node list, it can first use the list to find the nodes not recorded in it, then select among them a node with idle resources as the third node, and schedule the target computing task to that node. This prevents a third node that itself has problems from slowing down the whole system.
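The three rescheduling steps can be sketched against plain dictionaries. Everything here is hypothetical: `reschedule`, the `assignments` and `idle_slots` maps, and the bookkeeping that frees one resource on the problem node and consumes one on the third node are illustrative assumptions about how a control node might track resources.

```python
def reschedule(task, problem_id, assignments, idle_slots, problem_ids):
    """Move `task` off a problem node to a healthy node with idle resources.

    assignments: node_id -> list of running tasks (mutated in place).
    idle_slots:  node_id -> number of idle computing resources (mutated).
    Returns the third node's id, or None if no healthy idle node exists.
    """
    # Steps 1 and 2: nodes not in the problem list that have idle resources.
    healthy = sorted(nid for nid, free in idle_slots.items()
                     if free > 0 and nid not in problem_ids and nid != problem_id)
    if not healthy:
        return None
    third = healthy[0]
    # Step 3: schedule the target computing task to the third node.
    assignments[problem_id].remove(task)
    assignments.setdefault(third, []).append(task)
    idle_slots[third] -= 1
    idle_slots[problem_id] += 1
    return third
```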
After a problem node is identified, the real-time computing system can promptly schedule its computing tasks to other working nodes so that computing tasks avoid problem nodes as far as possible, improving the computing speed of the whole system and, correspondingly, its computing efficiency.
In another embodiment of the present invention, when the first node is determined to be a problem node, warning information for the problem node is output, carrying the identification information of the first node. This step applies equally to any other problem node: whenever a node is determined to be a problem node, warning information is output so that operations staff can promptly inspect it and fix its problems.
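Warning output can be sketched with Python's standard `logging` module. The logger name, the `build_alert` and `emit_alert` helpers, the dictionary layout and the inclusion of T1 and T2 in the message are all assumptions; the source only requires that the warning carry the problem node's identification information.

```python
import logging

logger = logging.getLogger("problem_node_detection")   # hypothetical logger name

def build_alert(node_id, t1, t2):
    """Build warning information carrying the problem node's identifier."""
    return {
        "level": "WARNING",
        "node_id": node_id,
        "message": f"node {node_id} flagged as problem node (T1={t1:.2f}s, T2={t2:.2f}s)",
    }

def emit_alert(node_id, t1, t2):
    """Log the warning for operations staff and return it for other sinks."""
    alert = build_alert(node_id, t1, t2)
    logger.warning(alert["message"])
    return alert
```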
With the technical solution provided by embodiments of the present invention, a copy task of the first node's target computing task is generated, and whether the first node is a problem node is determined from the duration the first node takes to complete the target computing task and the duration the second node takes to complete the copy task. Because the target computing task and its copy task are equivalent to the same computing task, different nodes should normally take about the same time to complete it; a large difference suggests that one of the nodes has a problem. The problem node is identified on this basis, so the real-time computing system can handle it promptly, improving the computing speed of the whole system and, correspondingly, its computing efficiency.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a problem node detection device based on a real-time computing system. As shown in Figure 5, the device may include the following modules:
Detection condition judging module 210, configured to judge whether a preset node detection condition is currently met and, if so, to trigger the target task determining module 220;
The target task determining module 220, configured to determine, for the first node currently to be detected, a target computing task for node detection from among the computing tasks the first node is currently executing;
Copy task generation module 230, configured to generate a copy task of the target computing task;
Second node lookup module 240, configured to look up a second node that currently has idle resources;
Copy task sending module 250, configured to send the copy task to the second node;
Problem node determining module 260, configured to determine whether the first node is a problem node according to the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task.
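Below is a minimal Python sketch of how modules 210 to 260 might be wired together for one detection round. Everything in it is hypothetical:

- the `Task` and `ProblemNodeDetector` names;
- the `idle_slots` resource model;
- choosing the highest-priority task as the target;
- the `completion_time` hook, which stands in for however the real-time computing system reports how long a node took to finish a task.

In a real deployment the target task and its copy run concurrently. The sketch simply asks the hook for both durations.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Task:
    """A computing task; the fields are illustrative, not a real system schema."""
    task_id: str
    payload: dict
    priority: int = 0


@dataclass
class ProblemNodeDetector:
    """Hypothetical wiring of modules 210 to 260 for one detection round."""

    condition_met: Callable[[], bool]              # 210: preset node detection condition
    completion_time: Callable[[str, Task], float]  # how long a node took to finish a task
    idle_slots: Dict[str, int]                     # node ID -> free task slots (assumed model)
    threshold: float                               # preset threshold on |T1 - T2|
    problem_nodes: Set[str] = field(default_factory=set)

    def run(self, running: Dict[str, List[Task]]) -> Set[str]:
        """Check every working node once; return the problem nodes found this round."""
        if not self.condition_met():
            return set()
        found: Set[str] = set()
        for first_node, tasks in running.items():
            bad = self._check(first_node, tasks)
            if bad is not None:
                self.problem_nodes.add(bad)  # record on the problem node list
                found.add(bad)
        return found

    def _check(self, first_node: str, tasks: List[Task]) -> Optional[str]:
        if not tasks:
            return None
        # 220: pick the target computing task (highest priority in this sketch).
        target = max(tasks, key=lambda t: t.priority)
        # 230: the copy task is an independent, equivalent copy of the target.
        replica = Task(target.task_id + "-copy", copy.deepcopy(target.payload), target.priority)
        # 240: second node = has idle slots, is not the first node, not a known problem node.
        second = next(
            (n for n in sorted(self.idle_slots)
             if self.idle_slots[n] > 0 and n != first_node and n not in self.problem_nodes),
            None,
        )
        if second is None:
            return None  # no suitable idle node; skip this node in this round
        # 250: a real system sends the copy to the second node and both tasks run
        # concurrently; this sketch only queries the hook for each completion time.
        t1 = self.completion_time(first_node, target)
        t2 = self.completion_time(second, replica)
        # 260: a gap above the threshold marks the slower node as the problem node.
        if abs(t1 - t2) > self.threshold:
            return first_node if t1 > t2 else second
        return None
```

Consider this configuration:

- `completion_time=lambda node, task: {"n1": 30.0, "n2": 10.0, "n3": 11.0}[node]`
- `idle_slots={"n1": 0, "n2": 2, "n3": 1}`
- `threshold=5.0`

Calling `run({"n1": [Task("t1", {}, 1)], "n3": [Task("t2", {}, 0)]})` then returns `{"n1"}`. The copy of `t1` finishes 20 s sooner on `n2`, while `n3` and `n2` differ by only 1 s.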
In one embodiment of the present invention, the device may further include the following module:
Problem node information adding module, configured to add the identification information of the first node to a problem node list when the first node is determined to be a problem node.
In one embodiment of the present invention, the second node lookup module 240 is specifically configured to:
look up, according to the problem node list, the nodes not recorded in the problem node list;
select, from among the nodes found, a node that has idle resources as the second node.
In one embodiment of the present invention, the device may further include:
Third node selecting module, configured to look up, according to the problem node list, the nodes not recorded in the problem node list, and to select, from among the nodes found, a node that has idle resources as the third node;
Computing task scheduling module, configured to schedule the target computing task to the third node.
In one embodiment of the present invention, the problem node determining module 260 may include:
Absolute value calculating submodule, configured to calculate the absolute value of the difference between the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task;
Problem node determining submodule, configured to determine that the first node is a problem node when the absolute value is greater than a preset threshold and T1 > T2.
In one embodiment of the present invention, the problem node determining submodule is further configured to determine that the second node is a problem node when the absolute value is greater than the preset threshold and T1 < T2.
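The decision rule of the absolute value calculating submodule and the problem node determining submodule reduces to a few lines. The Python sketch below uses hypothetical names and assumes that the threshold is non-negative and in the same time unit as T1 and T2.

```python
from typing import Optional


def classify_problem_node(
    t1: float, t2: float, threshold: float, first_node: str, second_node: str
) -> Optional[str]:
    """Return the ID of the node judged to be a problem node, or None.

    t1: time the first node took to complete the target computing task.
    t2: time the second node took to complete the copy task.
    threshold: preset, non-negative threshold on |t1 - t2|.
    """
    if abs(t1 - t2) <= threshold:
        return None  # durations are close enough: neither node is flagged
    # Here |t1 - t2| > threshold >= 0, so t1 != t2 and exactly one branch applies.
    return first_node if t1 > t2 else second_node
```

Take a threshold of 5 s. With T1 = 12 s and T2 = 3 s, |T1 - T2| = 9 s > 5 s and T1 > T2, so the first node is reported. With T1 = 8 s and T2 = 3 s, the difference is exactly 5 s. That is not greater than the threshold, so no node is reported.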
In one embodiment of the present invention, the target task determining module 220 is specifically configured to:
randomly determine the target computing task for node detection from among the computing tasks the first node is currently executing;
or, according to computing task priority information obtained in advance, determine the highest-priority computing task among those the first node is currently executing as the target computing task for node detection.
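Both target-task selection strategies can be sketched in Python as follows. The source only says that the priority information is obtained in advance, so two details here are assumptions:

- the priority table format: a mapping from task ID to an integer, where a larger value means higher priority;
- treating tasks missing from the table as lowest priority.

```python
import random
from typing import Mapping, Optional, Sequence


def pick_target_task(
    running_tasks: Sequence[str],
    priorities: Optional[Mapping[str, int]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Choose the target computing task on the first node for node detection.

    With priority information, return the highest-priority running task;
    without it, return a running task chosen uniformly at random.
    """
    if not running_tasks:
        raise ValueError("the first node has no running computing task")
    if priorities:
        # Assumption: tasks absent from the priority table rank lowest.
        return max(running_tasks, key=lambda task: priorities.get(task, float("-inf")))
    chooser = rng if rng is not None else random
    return chooser.choice(list(running_tasks))
```

Passing a seeded `random.Random` as `rng` makes the random strategy reproducible in tests. The priority strategy keeps detection focused on the tasks that matter most when a slow node would delay them.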
In one embodiment of the present invention, the device may further include:
Alarm information output module, configured to output alarm information for the problem node when the first node is determined to be a problem node, the alarm information carrying the identification information of the first node.
The device provided by the embodiments of the present invention generates a copy task of the target computing task corresponding to the first node. It then determines whether the first node is a problem node from two durations: the time the first node takes to complete the target computing task, and the time the second node takes to complete the copy task. The target computing task and its copy task are equivalent to the same computing task. Under normal circumstances, different nodes take similar amounts of time to complete the same computing task. If the difference is large, one of the nodes probably has a problem, and that node can be identified on this basis. The real-time computing system can then handle the problem node promptly, which improves the computation rate of the whole system and, accordingly, its computational efficiency.
It should be noted that herein, such as first and second or the like relational terms are used merely to a reality
Body or operation make a distinction with another entity or operation, and not necessarily require or imply and deposited between these entities or operation
In any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to
Nonexcludability includes, so that process, method, article or equipment including a series of elements not only will including those
Element, but also the other element including being not expressly set out, or it is this process, method, article or equipment also to include
Intrinsic key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that
Other identical element also be present in process, method, article or equipment including the key element.
The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is substantially similar to the method embodiment, so it is described relatively briefly; for related details, refer to the description of the method embodiment.
A person of ordinary skill in the art will understand that all or part of the steps in the above method embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (16)
1. A problem node detection method based on a real-time computing system, characterized by comprising:
when a preset node detection condition is met, detecting each node in a working state respectively:
for a first node currently to be detected, determining a target computing task for node detection from among the computing tasks currently executed by the first node, and generating a copy task of the target computing task;
looking up a second node that currently has idle resources, and sending the copy task to the second node;
determining whether the first node is a problem node according to a duration T1 in which the first node completes the target computing task and a duration T2 in which the second node completes the copy task.
2. The method according to claim 1, characterized in that the method further comprises:
when the first node is determined to be a problem node, adding identification information of the first node to a problem node list.
3. The method according to claim 2, characterized in that the looking up a second node that currently has idle resources comprises:
looking up, according to the problem node list, the nodes not recorded in the problem node list;
selecting, from among the nodes found, a node that has idle resources as the second node.
4. The method according to claim 2 or 3, characterized by further comprising:
looking up, according to the problem node list, the nodes not recorded in the problem node list;
selecting, from among the nodes found, a node that has idle resources as a third node;
scheduling the target computing task to the third node.
5. The method according to any one of claims 1 to 3, characterized in that the determining whether the first node is a problem node according to the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task comprises:
calculating the absolute value of the difference between the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task;
if the absolute value is greater than a preset threshold and T1 > T2, determining that the first node is a problem node.
6. The method according to claim 5, characterized by further comprising:
if the absolute value is greater than the preset threshold and T1 < T2, determining that the second node is a problem node.
7. The method according to any one of claims 1 to 3, characterized in that the determining a target computing task for node detection from among the computing tasks currently executed by the first node comprises:
randomly determining the target computing task for node detection from among the computing tasks currently executed by the first node;
or
according to computing task priority information obtained in advance, determining the highest-priority computing task among those currently executed by the first node as the target computing task for node detection.
8. The method according to claim 1, characterized by further comprising:
when the first node is determined to be a problem node, outputting alarm information for the problem node, the alarm information carrying identification information of the first node.
9. A problem node detection device based on a real-time computing system, characterized by comprising:
a detection condition judging module, configured to judge whether a preset node detection condition is currently met and, if so, to trigger a target task determining module;
the target task determining module, configured to determine, for a first node currently to be detected, a target computing task for node detection from among the computing tasks currently executed by the first node;
a copy task generation module, configured to generate a copy task of the target computing task;
a second node lookup module, configured to look up a second node that currently has idle resources;
a copy task sending module, configured to send the copy task to the second node;
a problem node determining module, configured to determine whether the first node is a problem node according to a duration T1 in which the first node completes the target computing task and a duration T2 in which the second node completes the copy task.
10. The device according to claim 9, characterized by further comprising:
a problem node information adding module, configured to add identification information of the first node to a problem node list when the first node is determined to be a problem node.
11. The device according to claim 10, characterized in that the second node lookup module is specifically configured to:
look up, according to the problem node list, the nodes not recorded in the problem node list;
select, from among the nodes found, a node that has idle resources as the second node.
12. The device according to claim 10 or 11, characterized by further comprising:
a third node selecting module, configured to look up, according to the problem node list, the nodes not recorded in the problem node list, and to select, from among the nodes found, a node that has idle resources as a third node;
a computing task scheduling module, configured to schedule the target computing task to the third node.
13. The device according to any one of claims 9 to 11, characterized in that the problem node determining module comprises:
an absolute value calculating submodule, configured to calculate the absolute value of the difference between the duration T1 in which the first node completes the target computing task and the duration T2 in which the second node completes the copy task;
a problem node determining submodule, configured to determine that the first node is a problem node when the absolute value is greater than a preset threshold and T1 > T2.
14. The device according to claim 13, characterized in that
the problem node determining submodule is further configured to determine that the second node is a problem node when the absolute value is greater than the preset threshold and T1 < T2.
15. The device according to any one of claims 9 to 11, characterized in that the target task determining module is specifically configured to:
randomly determine the target computing task for node detection from among the computing tasks currently executed by the first node;
or
according to computing task priority information obtained in advance, determine the highest-priority computing task among those currently executed by the first node as the target computing task for node detection.
16. The device according to claim 9, characterized by further comprising:
an alarm information output module, configured to output alarm information for the problem node when the first node is determined to be a problem node, the alarm information carrying identification information of the first node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510218215.5A CN104765648B (en) | 2015-04-30 | 2015-04-30 | Problem node detection method and device based on real-time computing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510218215.5A CN104765648B (en) | 2015-04-30 | 2015-04-30 | Problem node detection method and device based on real-time computing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104765648A CN104765648A (en) | 2015-07-08 |
CN104765648B true CN104765648B (en) | 2017-12-08 |
Family ID: 53647494
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510218215.5A Active CN104765648B (en) | Problem node detection method and device based on real-time computing system | 2015-04-30 | 2015-04-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104765648B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106330531B (en) * | 2016-08-15 | 2019-05-03 | 东软集团股份有限公司 | The method and device of node failure record and processing |
CN107959703B (en) * | 2016-10-18 | 2021-04-16 | 网宿科技股份有限公司 | Data processing method, client and distributed computing system |
CN109218126B (en) * | 2017-06-30 | 2023-10-17 | 中兴通讯股份有限公司 | Method, device and system for monitoring node survival state |
CN110705893B (en) * | 2019-10-11 | 2021-06-15 | 腾讯科技(深圳)有限公司 | Service node management method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8126696B1 (en) * | 2008-10-15 | 2012-02-28 | Hewlett-Packard Development Company, L.P. | Modifying length of synchronization quanta of simulation time in which execution of nodes is simulated |
CN102609303A (en) * | 2012-01-18 | 2012-07-25 | 华为技术有限公司 | Slow-task dispatching method and slow-task dispatching device of Map Reduce system |
CN104199739A (en) * | 2014-08-26 | 2014-12-10 | 浪潮(北京)电子信息产业有限公司 | Speculation type Hadoop scheduling method based on load balancing |
CN104331520A (en) * | 2014-11-28 | 2015-02-04 | 北京奇艺世纪科技有限公司 | Performance optimization method and device of Hadoop cluster and node state recognition method and device |
2015-04-30: CN201510218215.5A (CN), patent CN104765648B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN104765648A (en) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104765648B (en) | Problem node detection method and device based on real-time computing system | |
US10692007B2 (en) | Behavioral rules discovery for intelligent computing environment administration | |
CN107864071B (en) | Active safety-oriented dynamic data acquisition method, device and system | |
US8930757B2 (en) | Operations management apparatus, operations management method and program | |
CN106886485B (en) | System capacity analysis and prediction method and device | |
CN109491850A (en) | A kind of disk failure prediction technique and device | |
US20090178059A1 (en) | Method and system for providing consistency in processing data streams | |
Chalermarrewong et al. | Failure prediction of data centers using time series and fault tree analysis | |
KR102185190B1 (en) | Method and system for anomaly behavior detection using machine learning | |
CN106844161A (en) | Abnormal monitoring and Forecasting Methodology and system in a kind of carrier state stream calculation system | |
CN104699601A (en) | Injecting Faults at Select Execution Points of Distributed Applications | |
CN104778111A (en) | Alarm method and alarm device | |
Samir et al. | Anomaly detection and analysis for clustered cloud computing reliability | |
TW201913522A (en) | Risk feature screening, description message generation method, device and electronic device | |
US9600795B2 (en) | Measuring process model performance and enforcing process performance policy | |
CN110032480A (en) | A kind of server exception detection method, device and equipment | |
Ghasemieh et al. | Survivability evaluation of fluid critical infrastructures using hybrid Petri nets | |
CN109194534B (en) | Scheduling and management method for Internet of things equipment group | |
Huch et al. | Machine learning-based run-time anomaly detection in software systems: An industrial evaluation | |
CN106487612A (en) | A kind of server node monitoring method, monitoring server and system | |
CN108255620A (en) | A kind of business logic processing method, apparatus, service server and system | |
US20190164067A1 (en) | Method and device for monitoring a process of generating metric data for predicting anomalies | |
CN106452941A (en) | Network anomaly detection method and device | |
CN110826075A (en) | PLC dynamic measurement method, device, system, storage medium and electronic equipment | |
US20150020076A1 (en) | Method to apply perturbation for resource bottleneck detection and capacity planning |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
EXSB | Decision made by SIPO to initiate substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |