CN103731312A - Method and apparatus for performing failure checking on service of remote method invocation - Google Patents


Info

Publication number: CN103731312A
Authority: CN (China)
Application number: CN201410037532.2A
Other languages: Chinese (zh)
Inventors: 吴光超, 贺晓亮, 谢刚
Current assignee: Feihu Information Technology Tianjin Co Ltd
Original assignee: Feihu Information Technology Tianjin Co Ltd
Legal status: Pending
Prior art keywords: service, client, request, zookeeper, interim node

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The embodiments of the invention disclose a method and apparatus for performing fault diagnosis on a remote method invocation (RMI) service. The method is as follows: a client connects to a Zookeeper service cluster, on which each service provided by a server has created a temporary node of its own; the client sends the Zookeeper service cluster a request to watch the temporary node of the service, and this request causes the Zookeeper service cluster, when it deletes the service's temporary node, to notify the client of the delete event; if the client receives the delete event, it determines that the service has currently failed, and if it does not, it determines that the service has not currently failed. With the embodiments of the invention, the client can perceive whether a service it requests has failed. Further, when the client perceives that a requested service has failed, it can take a series of disaster-recovery measures against the request failure, which improves the stability and reliability of the service.

Description

Method and apparatus for performing fault diagnosis on RMI services
Technical field
The present invention relates to the field of distributed systems, and in particular to a method and apparatus for performing fault diagnosis on remote method invocation (RMI) services.
Background
Java RMI (Java Remote Method Invocation) is an application programming interface in the Java programming language for implementing remote method invocation. It allows a program running on a client to invoke objects (also called "services") on a remote server, and thus enables Java programmers to perform distributed operations in a network environment.
When a client requests a server's service through Java RMI, various unpredictable factors, such as a network interruption, a hardware fault on the server machine, an unexpected crash of the server process, or excessive load on the server, can leave the server unable to provide the service to the client. From the client's point of view, whenever such factors prevent it from obtaining a service from the server, that service has in effect "failed".
At present, Enterprise Service Bus (ESB) technology is commonly used between clients and servers. ESB combines traditional middleware technology with technologies such as XML and Web services; it provides the most basic connection hub in a network, isolates the technical differences between client and server, and enables seamless, transparent invocation between them. However, when a service requested by the client cannot be invoked because of a fault, the server merely returns an exception response to the client.
The client therefore cannot perceive that a service it requested has failed, and consequently takes no disaster-recovery measures when a service fails.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a method and apparatus for performing fault diagnosis on RMI services, so that a client can perceive whether a service it requests has failed and, when it perceives such a failure, take disaster-recovery measures, thereby improving the stability and reliability of the service.
The embodiments of the invention disclose the following technical solutions:
A method for performing fault diagnosis on an RMI service, comprising:
A client connects to a Zookeeper service cluster, wherein a service provided by a server has created its own temporary node on the Zookeeper service cluster;
The client sends the Zookeeper service cluster a request to watch the temporary node of the service, the request causing the Zookeeper service cluster, when it deletes the temporary node of the service, to notify the client of the event of deleting that temporary node;
When the client receives the event of deleting the temporary node of the service, it determines that the service has currently failed; when the client does not receive that event, it determines that the service has not currently failed.
Preferably, the request further causes the Zookeeper service cluster, when it re-creates the deleted temporary node of the service, to notify the client of the event of re-creating that temporary node, and the method further comprises:
After the client has received the deletion event notified by the Zookeeper service cluster, if the client then receives the event of re-creating the service's temporary node notified by the Zookeeper service cluster, it determines that the service has not currently failed.
Preferably, the method further comprises:
When it is determined that the service has not currently failed, if the client requests the service from the server and the request fails, the client requests the service from the server again, until the request succeeds.
Further preferably, the method further comprises:
Before each repeated request for the service, the client counts the number of consecutive failed requests for the service and judges whether that number is greater than or equal to a first preset threshold;
If the number of consecutive failures is greater than or equal to the first preset threshold, the client stops requesting the service from the server, instead selects one service from among the other services that have the same function as the service and have not currently failed, and requests the selected service from the server;
The client requesting the service from the server again specifically comprises:
If the number of consecutive failures is less than the first preset threshold, the client requests the service from the server again.
Preferably, the method further comprises:
When it is determined that the service has currently failed, the client does not request the service from the server; instead, it selects one service from among the other services that have the same function as the service and have not currently failed, and requests the selected service from the server.
Further preferably, selecting one service from among the other services that have the same function as the service and have not currently failed comprises:
According to the preset weights of the services, selecting, from among the other services that have the same function as the service and have not currently failed, the service with the largest preset weight.
Further preferably, the method further comprises:
When the number of consecutive failed requests for the service is greater than or equal to the first preset threshold, recording this as one failed round; the client counts the number of consecutive failed rounds and judges whether it is greater than or equal to a second preset threshold, or the client counts the total number of failed rounds and judges whether it is greater than or equal to a third preset threshold;
If the number of consecutive failed rounds is greater than or equal to the second preset threshold, or the total number of failed rounds is greater than or equal to the third preset threshold, the client stops requesting the service from the server for a preset time period.
An apparatus for performing fault diagnosis on an RMI service, located at a client, comprising:
A connection unit, configured to connect to a Zookeeper service cluster, wherein a service provided by a server has created its own temporary node on the Zookeeper service cluster;
A watch request unit, configured to send the Zookeeper service cluster a request to watch the temporary node of the service, the request causing the Zookeeper service cluster, when it deletes the temporary node of the service, to notify the client of the event of deleting that temporary node;
A fault diagnosis unit, configured to determine that the service has currently failed when it receives the event of deleting the temporary node of the service, and to determine that the service has not currently failed when it does not receive that event.
Preferably, the request further causes the Zookeeper service cluster, when it re-creates the deleted temporary node of the service, to notify the client of the event of re-creating that temporary node;
The fault diagnosis unit is further configured to determine, after the client has received the deletion event notified by the Zookeeper service cluster, that the service has not currently failed once it receives the event of re-creating the service's temporary node notified by the Zookeeper service cluster.
Preferably, the apparatus further comprises:
A retry unit, configured to, when it is determined that the service has not currently failed, if the service is requested from the server and the request fails, request the service from the server again until the request succeeds.
Preferably, the apparatus further comprises:
A first analysis unit, configured to count, before each repeated request for the service, the number of consecutive failed requests for the service, and to judge whether that number is greater than or equal to the first preset threshold;
A jump unit, configured to, if the number of consecutive failures is greater than or equal to the first preset threshold, stop requesting the service from the server, instead select one service from among the other services that have the same function as the service and have not currently failed, and request the selected service from the server;
The retry unit is specifically configured to request the service from the server again if the number of consecutive failures is less than the first preset threshold.
Preferably, the apparatus further comprises:
A second analysis unit, configured to record one failed round when the number of consecutive failed requests for a service is greater than or equal to the first preset threshold, and either to count the number of consecutive failed rounds and judge whether it is greater than or equal to the second preset threshold, or to count the total number of failed rounds and judge whether it is greater than or equal to the third preset threshold;
A shielding unit, configured to make the client stop requesting the service from the server for a preset time period if the number of consecutive failed rounds is greater than or equal to the second preset threshold, or the total number of failed rounds is greater than or equal to the third preset threshold.
As can be seen from the above embodiments, compared with the prior art, the present invention has the following advantages:
Using this working mechanism of the Zookeeper service cluster, the Zookeeper service cluster watches temporary nodes on behalf of the client; when a temporary node is deleted, the Zookeeper service cluster notifies the client of the deletion event. The client can then determine whether a service has failed according to whether it has received the event of deleting that service's temporary node.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for performing fault diagnosis on an RMI service according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for performing disaster-recovery processing on an RMI service according to an embodiment of the present invention;
Fig. 3 is a flowchart of another method for performing disaster-recovery processing on an RMI service according to an embodiment of the present invention;
Fig. 4 is a structural diagram of an apparatus for performing fault diagnosis on a remote invocation service according to an embodiment of the present invention;
Fig. 5 is a structural diagram of an apparatus for performing disaster-recovery processing on a remote invocation service according to an embodiment of the present invention;
Fig. 6 is a structural diagram of another apparatus for performing disaster-recovery processing on a remote invocation service according to an embodiment of the present invention;
Fig. 7 is a structural diagram of yet another apparatus for performing disaster-recovery processing on a remote invocation service according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention provide a method and apparatus for performing fault diagnosis on RMI services. The Zookeeper service cluster is a system that provides reliable coordination for large-scale distributed systems; it encapsulates key services and offers users an efficient, functionally stable system. The Zookeeper service cluster has the following working mechanism: one kind of node on the cluster is the temporary (ephemeral) node, which exists only while the service that created it is running; once the Zookeeper service cluster finds that the service that created a temporary node is no longer running, it deletes that temporary node. In other words, whether a service's temporary node exists shows whether the service is running, that is, whether the service has failed. The core of the technical solution of the present invention is to use this working mechanism so that the Zookeeper service cluster watches temporary nodes on behalf of the client: when a temporary node is deleted, the Zookeeper service cluster notifies the client of the deletion event, and when a deleted temporary node is re-created, the Zookeeper service cluster also notifies the client of the re-creation event. From the events it receives, the client can determine whether the service has currently failed.
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Referring to Fig. 1, which is a flowchart of a method for performing fault diagnosis on an RMI service according to an embodiment of the present invention, the method comprises the following steps:
Step 101: the client connects to the Zookeeper service cluster, wherein the service provided by the server has created its own temporary node on the Zookeeper service cluster;
In its initialization phase, each service that the server provides to clients creates a temporary node of its own on the Zookeeper service cluster and names the node after the service's identifier, called its identification name. The identification name of each service is the IP address of the server providing the service plus the service's port number.
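A minimal Python sketch of this naming rule follows. The `ip:port` separator and the `/services/<function>/` path layout are assumptions for illustration; the description only states that the name combines the server IP address and the service port. Port 1099 is used merely because it is the default Java RMI registry port.

```python
def identification_name(server_ip: str, port: int) -> str:
    """Identification name of a service: server IP address plus service port.

    The ':' separator is an assumption made for this sketch.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"{server_ip}:{port}"


def temporary_node_path(function: str, server_ip: str, port: int) -> str:
    """Hypothetical node path layout: /services/<function>/<identification name>."""
    return f"/services/{function}/{identification_name(server_ip, port)}"


print(temporary_node_path("A", "192.168.1.10", 1099))  # /services/A/192.168.1.10:1099
```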
Step 102: the client sends the Zookeeper service cluster a request to watch the temporary node of the service, the request causing the Zookeeper service cluster, when it deletes the temporary node of the service, to notify the client of the event of deleting that temporary node;
A user configures service lists in a service management center, and all services with the same function are maintained in the same service list; that is, a service list records the identification names of all services that share one function. For example, one service list records the identification names of all services with function A. The service management center actively pushes the configured service lists to the Zookeeper service cluster, which creates persistent nodes and uses them to store the data of each service list (that is, the identification names of all services with the same function). For example, one persistent node can store the data of one service list.
The advantage of storing the data in persistent nodes is that, even if the connection between the client and the Zookeeper service cluster is interrupted, the data remain on the Zookeeper service cluster and do not disappear.
When the client requests from the Zookeeper service cluster the identification names of all services with a certain function, the Zookeeper service cluster responds by sending the client the identification names of all services with that function stored in the corresponding persistent node. The client therefore also caches the same service list as the service management center, and it can use this list for disaster-recovery processing, which is described later. For example, if the servers together provide 100 services with function A, one service list on the client records the identification names of these 100 services. Naturally, if all 100 services are running, the Zookeeper service cluster also holds temporary nodes for all 100 services.
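The split between persistent nodes (holding service lists) and the client's cached copy can be sketched in Python as below. `PersistentNodeStore`, the `/service-lists/<function>` path, and the JSON encoding are all hypothetical; a real deployment would use a ZooKeeper client library rather than an in-memory dictionary.

```python
import json
from typing import Dict, List


class PersistentNodeStore:
    """Hypothetical in-memory stand-in for Zookeeper persistent nodes.

    Unlike temporary nodes, these entries are not tied to any client session,
    so they survive client disconnects.
    """

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set_data(self, path: str, data: bytes) -> None:
        self._data[path] = data

    def get_data(self, path: str) -> bytes:
        return self._data[path]


def push_service_list(store: PersistentNodeStore, function: str, names: List[str]) -> None:
    """Service management center side: one service list per persistent node."""
    store.set_data(f"/service-lists/{function}", json.dumps(names).encode("utf-8"))


def fetch_service_list(store: PersistentNodeStore, function: str) -> List[str]:
    """Client side: read the list so it can be cached locally."""
    return json.loads(store.get_data(f"/service-lists/{function}").decode("utf-8"))


store = PersistentNodeStore()
push_service_list(store, "A", ["10.0.0.1:1099", "10.0.0.2:1099"])
cached = fetch_service_list(store, "A")
print(cached)  # ['10.0.0.1:1099', '10.0.0.2:1099']
```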
In addition, the service management center can push updated content to the client through the Zookeeper service cluster.
When the client asks the Zookeeper service cluster to watch the temporary nodes corresponding to all services with a certain function, it sends the identification names of all those services to the Zookeeper service cluster. From these identifiers, the Zookeeper service cluster determines which temporary nodes need to be watched and then watches them.
Under the working mechanism of the Zookeeper service cluster, a temporary node exists only while the service that created it is running; once the cluster finds that this service is no longer running, it deletes the temporary node. Specifically, the Zookeeper service cluster periodically checks whether it has received a heartbeat from the service that created the temporary node. If it has not, it considers the service no longer running and automatically deletes the service's temporary node. If the Zookeeper service cluster later receives a heartbeat from the service again, it considers the service to be running and automatically re-creates the service's temporary node.
In the present invention, the Zookeeper service cluster notifies the client not only when it deletes the temporary node of a service, but also when it re-creates the temporary node of a service (that is, a temporary node that was previously deleted).
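The heartbeat-driven lifecycle described above can be simulated with a small in-memory model. `SimulatedCluster` is a hypothetical name and not the ZooKeeper client API. The sketch follows the description, in which the cluster itself re-creates the node when heartbeats resume; note that in a real ZooKeeper deployment an ephemeral node is removed when the creating session expires, and it is the service that creates the node again after establishing a new session. Standard ZooKeeper watches are also one-time triggers that the client must re-register, whereas this sketch keeps watchers registered for simplicity.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Watcher = Callable[[str, str], None]  # (node name, event), event is "created" or "deleted"


class SimulatedCluster:
    """Hypothetical in-memory model of temporary nodes kept alive by heartbeats."""

    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self._last_heartbeat: Dict[str, float] = {}  # existing temporary nodes
        self._watchers: DefaultDict[str, List[Watcher]] = defaultdict(list)

    def watch(self, name: str, watcher: Watcher) -> None:
        """Client asks the cluster to watch the temporary node `name`."""
        self._watchers[name].append(watcher)

    def exists(self, name: str) -> bool:
        return name in self._last_heartbeat

    def heartbeat(self, name: str, now: float) -> None:
        """A service reports it is alive; (re)create its node if it is missing."""
        created = name not in self._last_heartbeat
        self._last_heartbeat[name] = now
        if created:
            self._notify(name, "created")

    def check_heartbeats(self, now: float) -> None:
        """Periodic check: delete nodes whose service missed the heartbeat timeout."""
        for name, last in list(self._last_heartbeat.items()):
            if now - last > self.heartbeat_timeout:
                del self._last_heartbeat[name]
                self._notify(name, "deleted")

    def _notify(self, name: str, event: str) -> None:
        for watcher in self._watchers[name]:
            watcher(name, event)


events: List[tuple] = []
cluster = SimulatedCluster(heartbeat_timeout=3.0)
cluster.watch("10.0.0.1:1099", lambda name, event: events.append((name, event)))
cluster.heartbeat("10.0.0.1:1099", now=0.0)   # node created
cluster.check_heartbeats(now=2.0)             # still within the timeout
cluster.check_heartbeats(now=5.0)             # 5.0 - 0.0 > 3.0: node deleted
cluster.heartbeat("10.0.0.1:1099", now=6.0)   # heartbeat resumes: node re-created
print(events)
```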
Step 103: when the client receives the event of deleting the temporary node of the service, it determines that the service has currently failed; when the client does not receive that event, it determines that the service has not currently failed.
When the Zookeeper service cluster notifies the client of the deletion of a temporary node, it sends the client the name of that temporary node (that is, the identification name of the service that created it), and the client deletes the corresponding identification name from its service list. The service list thus records the identification names of all surviving services, that is, of all services that have not failed, and from its contents the client can determine which services have failed and which have not. Likewise, when the Zookeeper service cluster re-creates the temporary node of a deleted service and notifies the client of the re-creation event, the client adds the service's identification name back to the service list.
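A sketch of this client-side bookkeeping follows, assuming notifications arrive as `(identification name, event)` pairs with event `"deleted"` or `"created"` (an assumed format). `CachedServiceList` is a hypothetical helper, not part of any library. It keeps the configured order so that a re-added name returns to its original position.

```python
from typing import Iterable, List


class CachedServiceList:
    """Client-side copy of one service list, updated from Zookeeper notifications."""

    def __init__(self, configured: Iterable[str]) -> None:
        self._configured: List[str] = list(configured)  # order from the management center
        self._alive = set(self._configured)

    def on_event(self, name: str, event: str) -> None:
        if event == "deleted":
            self._alive.discard(name)      # temporary node deleted: service has failed
        elif event == "created" and name in self._configured:
            self._alive.add(name)          # temporary node re-created: service recovered

    def has_failed(self, name: str) -> bool:
        return name not in self._alive

    def surviving(self) -> List[str]:
        return [n for n in self._configured if n in self._alive]


cache = CachedServiceList(["10.0.0.1:1099", "10.0.0.2:1099"])
cache.on_event("10.0.0.1:1099", "deleted")
print(cache.has_failed("10.0.0.1:1099"), cache.surviving())  # True ['10.0.0.2:1099']
cache.on_event("10.0.0.1:1099", "created")
print(cache.surviving())  # ['10.0.0.1:1099', '10.0.0.2:1099']
```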
For a service that has currently failed, the client does not request that service from the server; instead, it selects one service from among the other services that have the same function and have not currently failed, and requests the selected service from the server.
As can be seen from the above embodiment, compared with the prior art, the present invention has the following advantages:
Using this working mechanism of the Zookeeper service cluster, the Zookeeper service cluster watches temporary nodes on behalf of the client; when a temporary node is deleted, the Zookeeper service cluster notifies the client of the deletion event. The client can then determine whether a service has failed according to whether it has received the event of deleting that service's temporary node.
Embodiment 2
Embodiment 2 differs from Embodiment 1 as follows: after the client determines, according to whether it has received the event of deleting a service's temporary node, whether that service has currently failed, then for a service that has not currently failed, if the client requests the service from the server and the request fails, the client requests the service from the server again, until the request succeeds, thereby achieving disaster recovery for the service through repeated retries. Referring to Fig. 2, which is a flowchart of a method for performing disaster-recovery processing on an RMI service according to an embodiment of the present invention, the method comprises the following steps:
Step 201: the client connects to the Zookeeper service cluster, wherein the service provided by the server has created its own temporary node on the Zookeeper service cluster;
Step 202: the client sends the Zookeeper service cluster a request to watch the temporary node of the service, the request causing the Zookeeper service cluster, when it deletes the temporary node of the service, to notify the client of the event of deleting that temporary node;
Step 203: when the client receives the event of deleting the temporary node of the service, it determines that the service has currently failed; when the client does not receive that event, it determines that the service has not currently failed;
For the specific implementation of steps 201-203, refer to steps 101-103 in Embodiment 1; details are not repeated here.
When it is determined that the service has not currently failed, if the service is requested from the server and the request fails, the client continues with the following steps:
Step 204: the client counts the number of consecutive failed requests for the service;
Step 205: the client judges whether the number of consecutive failures is greater than or equal to the first preset threshold; if not, go to step 206; if so, go to step 207;
Step 206: the client requests the service from the server again; if the request fails, return to step 204; if it succeeds, the process ends;
Step 207: the client stops requesting the service from the server, selects one service from among the other services that have the same function as the service and have not currently failed, requests the selected service from the server, and the process ends.
The client maintains up-to-date service lists; each list records the identification names of all services with the same function that have not currently failed. From the records in the latest service list, the client knows which of the other services with the same function as the failed service have not currently failed, and can select any one of them to request from the server.
In a preferred mode, a weight is preset for each service, and the preset weight of each service is recorded in every service list at the service management center. Accordingly, each persistent node of the Zookeeper service cluster also records the preset weights of the services, and so does every service list maintained by the client. Based on these preset weights, the client can select, from among the other services that have the same function as the failed service and have not currently failed, the one with the largest weight.
It should be noted that the technical solution of the present invention does not specifically limit how the weights are set. However, with load balancing across servers in mind, a service's weight can be configured according to factors such as the number of services running on the server or the server's processing capability: the more services a server runs, the smaller the weight of its services, and vice versa; the greater the server's processing capability, the larger the weight of its services, and vice versa.
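The weight-based choice, together with one possible weighting rule consistent with the load-balancing guidance above, might look like this in Python. `example_weight` (server capacity divided by the number of services on that server) is only an assumed formula; the description explicitly leaves the weighting method open.

```python
from typing import Dict, Iterable, Optional


def example_weight(services_on_server: int, capacity: float) -> float:
    """Hypothetical rule: more capacity -> larger weight, more co-located services -> smaller."""
    return capacity / max(services_on_server, 1)


def select_by_weight(failed: str, weights: Dict[str, float], surviving: Iterable[str]) -> Optional[str]:
    """Pick the surviving same-function service (other than `failed`) with the largest weight."""
    alive = set(surviving)
    candidates = [name for name in weights if name != failed and name in alive]
    if not candidates:
        return None
    return max(candidates, key=lambda name: weights[name])


weights = {
    "10.0.0.1:1099": example_weight(services_on_server=4, capacity=8.0),  # 2.0
    "10.0.0.2:1099": example_weight(services_on_server=1, capacity=6.0),  # 6.0
    "10.0.0.3:1099": example_weight(services_on_server=2, capacity=6.0),  # 3.0
}
print(select_by_weight("10.0.0.1:1099", weights, list(weights)))  # 10.0.0.2:1099
```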
If the request to the selected (second) service also fails, the client can request the second service again; when the total number of requests to the second service reaches the preset threshold and the request has still not succeeded, the client selects a third service and requests it, and so on, until the request to some service succeeds.
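Steps 204-207, together with the chained fallback just described, can be sketched as the Python function below. `send_request` stands in for a hypothetical wrapper around the actual RMI call, and the candidate ordering (the requested service first, then surviving same-function services, for example by descending weight) is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def request_with_failover(
    candidates: List[str],
    send_request: Callable[[str], bool],
    first_threshold: int,
) -> Tuple[Optional[str], List[str]]:
    """Retry each candidate until `first_threshold` consecutive failures, then move on.

    Returns the service that finally succeeded (or None) and the log of attempts.
    """
    attempts: List[str] = []
    for name in candidates:
        consecutive_failures = 0
        while consecutive_failures < first_threshold:  # step 205: below threshold -> retry
            attempts.append(name)
            if send_request(name):
                return name, attempts                  # success: process ends
            consecutive_failures += 1                  # step 204: count consecutive failures
        # threshold reached (step 207): fall through to the next candidate
    return None, attempts


# Usage with a fake transport: the first service always fails, the second succeeds on retry.
outcomes = {"10.0.0.1:1099": [False, False, False], "10.0.0.2:1099": [False, True]}


def fake_send(name: str) -> bool:
    return outcomes[name].pop(0)


winner, log = request_with_failover(["10.0.0.1:1099", "10.0.0.2:1099"], fake_send, first_threshold=3)
print(winner, log)
```

Because each candidate gets at most `first_threshold` attempts, the loop always terminates, even when every listed service is down.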
As can be seen from the above embodiment, compared with the prior art, the present invention has the following advantages:
The client can perceive which services have failed and which have not. For a service that has not failed, once a request to it fails, the client can perform disaster-recovery processing by requesting the service from the server again. If the number of requests reaches a certain threshold, the client can also request another service with the same function from the server. This improves the stability and reliability of the service.
Embodiment 3
On the basis of the technical solution of Embodiment 2, in addition to achieving disaster recovery through retries and jumps to other services, Embodiment 3 can further achieve disaster recovery by shielding services. Referring to Fig. 3, which is a flowchart of another method for performing disaster-recovery processing on an RMI service according to an embodiment of the present invention, the method comprises the following steps:
Step 301: the client connects to the Zookeeper service cluster, wherein the service provided by the server has created its own temporary node on the Zookeeper service cluster;
Step 302: the client sends the Zookeeper service cluster a request to watch the temporary node of the service, the request causing the Zookeeper service cluster, when it deletes the temporary node of the service, to notify the client of the event of deleting that temporary node;
Step 303: when the client receives the event of deleting the temporary node of the service, it determines that the service has currently failed; when the client does not receive that event, it determines that the service has not currently failed;
For the specific implementation of steps 301-303, refer to steps 101-103 in Embodiment 1; details are not repeated here.
When it is determined that the service has not currently failed, if the service is requested from the server and the request fails, the client continues with the following steps:
Step 304: the client counts the number of consecutive failed requests for the service;
Step 305: the client judges whether the number of consecutive failures is greater than or equal to the first preset threshold; if not, go to step 306; if so, go to step 307;
Step 306: the client requests the service from the server again; if the request fails, return to step 304; if it succeeds, the process ends;
Step 307: record one failed round; at the same time, the client stops requesting the service from the server, instead selects one service from among the other services that have the same function as the service and have not currently failed, and requests the selected service from the server;
Step 308: the client counts the number of consecutive failed rounds;
In this step, as an alternative to counting consecutive failed rounds, the client can count the total number of failed rounds.
When the first predetermined threshold value is m, the second predetermined threshold value can be n.
Step 309: whether the client judgement continuously failed wheel number of request is more than or equal to the second predetermined threshold value, if so, enters step 310, otherwise, step 304 returned to;
Or client judges whether total wheel number of asking is unsuccessfully more than or equal to the 3rd predetermined threshold value, if so, enters step 310, otherwise, step 304 returned to.
Step 310: client stops to service (being equivalent to described service to shield) described in client-requested in Preset Time section, and by the continuous failed number of times zero clearing of statistics.
For example, the client may first remove the service's identifier from the corresponding service list and, once the preset time period has elapsed, add the identifier back to the list.
In addition, during the preset time period, the client may select one service from the other services that have the same function as this service and have not currently failed, and send its requests to the selected service.
After the preset time period, the client may resume sending requests to this service and perform steps 304-310 again.
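Steps 304-310 amount to a small per-service state machine. The Python sketch below is one possible reading of it, not the patent's implementation: `RetryBlockPolicy`, its method names, and the injectable `clock` are hypothetical, and two details the text leaves open are assumptions marked in the comments (the failure counter restarts after each failed round, and the round counter is cleared when the service is blocked). Blocking is modelled with an expiry timestamp rather than by removing the service's identifier from the service list; both have the effect of skipping the service for the preset period. Setting `count_total_rounds=True` switches step 309 to the alternative check on the total number of failed rounds, with `n` then playing the role of the third preset threshold.

```python
import time
from typing import Callable, Dict


class RetryBlockPolicy:
    """Illustrative bookkeeping for steps 304-310.

    m: first preset threshold (consecutive failures that make one failed round)
    n: second preset threshold (consecutive failed rounds before blocking), or
       the third preset threshold on total failed rounds if count_total_rounds
    block_seconds: the preset time period during which the service is blocked
    """

    def __init__(self, m: int, n: int, block_seconds: float,
                 count_total_rounds: bool = False,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.m, self.n, self.block_seconds = m, n, block_seconds
        self.count_total_rounds = count_total_rounds
        self.clock = clock
        self.failures: Dict[str, int] = {}  # step 304 counter
        self.rounds: Dict[str, int] = {}    # step 308 counter
        self.blocked_until: Dict[str, float] = {}

    def is_blocked(self, service: str) -> bool:
        # Step 310: skip the service until the preset period has elapsed.
        return self.clock() < self.blocked_until.get(service, float("-inf"))

    def on_success(self, service: str) -> None:
        self.failures[service] = 0
        if not self.count_total_rounds:
            self.rounds[service] = 0  # a success breaks the run of failed rounds

    def on_failure(self, service: str) -> str:
        """Record one failed request and return 'retry', 'switch' or 'block'."""
        self.failures[service] = self.failures.get(service, 0) + 1  # step 304
        if self.failures[service] < self.m:                          # step 305
            return "retry"                                           # step 306
        # Step 307: one failed round; the next round starts from zero (assumption).
        self.failures[service] = 0
        self.rounds[service] = self.rounds.get(service, 0) + 1       # step 308
        if self.rounds[service] < self.n:                            # step 309
            return "switch"  # use a same-function alternative for this request
        # Step 310: block for the preset period; clearing rounds is an assumption.
        self.blocked_until[service] = self.clock() + self.block_seconds
        self.rounds[service] = 0
        return "block"


now = [0.0]  # fake clock so the example is deterministic
policy = RetryBlockPolicy(m=3, n=2, block_seconds=60.0, clock=lambda: now[0])
print([policy.on_failure("orderService") for _ in range(6)])
# ['retry', 'retry', 'switch', 'retry', 'retry', 'block']
print(policy.is_blocked("orderService"))  # True
now[0] = 61.0
print(policy.is_blocked("orderService"))  # False
```

With `m=3` and `n=2`, the second run of three consecutive failures blocks the service for the preset period. Which alternative service to call on `'switch'`, or while the service is blocked, is left to the caller.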
An extreme case arises when all other services with the same function as this service are blocked, that is, every service in the service list is blocked. In that case, the client can repeatedly request the first service in the service list from the server; once maintenance staff restore the first service, requests to it will succeed again.
In a preferred embodiment, the services in the service list can be divided into standby services and non-standby services. Non-standby services are requested first; standby services are requested only when the non-standby services have currently failed and/or are all blocked.
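The standby/non-standby preference and the extreme case above can be sketched as a single selection function. `ServiceEntry`, its `standby` flag, and `choose_service` are hypothetical names; `failed` is assumed to come from the Zookeeper deletion events and `blocked` from the blocking rule of step 310. As a simplification, the fallback to the first listed service is applied whenever no usable service remains, not only when every service is blocked.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ServiceEntry:
    name: str
    standby: bool = False  # hypothetical flag marking a standby service


def choose_service(service_list: List[ServiceEntry], failed: Set[str],
                   blocked: Set[str]) -> Optional[ServiceEntry]:
    """Prefer non-standby services, then standby ones, then the first entry."""
    usable = [s for s in service_list
              if s.name not in failed and s.name not in blocked]
    # Non-standby services are requested first.
    for entry in usable:
        if not entry.standby:
            return entry
    if usable:
        return usable[0]  # every remaining usable entry is a standby service
    # Extreme case: nothing usable, so keep retrying the first listed service
    # until maintenance staff restore it.
    return service_list[0] if service_list else None


services = [ServiceEntry("A"), ServiceEntry("B"), ServiceEntry("C", standby=True)]
print(choose_service(services, failed={"A"}, blocked=set()).name)            # B
print(choose_service(services, failed={"A"}, blocked={"B"}).name)            # C
print(choose_service(services, failed=set(), blocked={"A", "B", "C"}).name)  # A
```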
As can be seen from the above embodiment, the invention has the following advantages over the prior art:
The client can perceive which services have failed and which have not. For a service that has not failed, if a request to it fails, the client performs disaster-recovery processing by requesting the service from the server again. If the number of failed requests reaches a certain threshold, the client can also block the service for a period of time and request other services with the same function during that period. This improves the stability and reliability of the service.
Embodiment 4
Corresponding to the above method for performing failure diagnosis on a remotely invoked service, an embodiment of the present invention also provides an apparatus for performing failure diagnosis on a remotely invoked service. Refer to Fig. 4, which is a structural diagram of such an apparatus according to an embodiment of the present invention. The failure diagnosis apparatus is located on the client and comprises a connection unit 401, a watch request unit 402, and a failure diagnosis unit 403. The internal structure of the apparatus and the connections between its units are described below together with its working principle.
Connection unit 401, configured to connect to the Zookeeper service cluster, on which the service provided by the server has already created its own ephemeral node;
Watch request unit 402, configured to send the Zookeeper service cluster a request to watch the ephemeral node of the service, the request causing the Zookeeper service cluster, when it deletes the service's ephemeral node, to notify the client of the deletion event;
Failure diagnosis unit 403, configured to determine that the service has currently failed when the event deleting the service's ephemeral node is received, and that the service has not currently failed when no such event is received.
Preferably, the request also causes the Zookeeper service cluster, when it re-creates the deleted ephemeral node of the service, to notify the client of the re-creation event. The failure diagnosis unit 403 is further configured so that, after the client has received the deletion event notified by the Zookeeper service cluster, if it then receives the event re-creating the service's ephemeral node notified by the cluster, it determines that the service has not currently failed.
Preferably, as shown in Fig. 5, which is a structural diagram of an apparatus for performing disaster-recovery processing on a remotely invoked service according to an embodiment of the present invention, the apparatus further comprises, on the basis of the architecture shown in Fig. 4:
Retry unit 404, configured to, when the service is determined not to have currently failed but a request to the server for the service fails, request the service from the server again until a request for the service succeeds.
Further preferably, as shown in Fig. 6, the disaster-recovery processing apparatus further comprises:
First analysis unit 405a, configured to, before each new request for the service, count the number of consecutive failed requests for the service and determine whether that number is greater than or equal to the first preset threshold;
Switching unit 406a, configured to, if the number of consecutive failures is greater than or equal to the first preset threshold, stop requesting the service from the server, select one service from the other services that have the same function as this service and have not currently failed, and request the selected service from the server;
The retry unit 404a is specifically configured to request the service from the server again if the number of consecutive failures is less than the first preset threshold.
Alternatively, further preferably, as shown in Fig. 7, the disaster-recovery processing apparatus further comprises:
Second analysis unit 405b, configured to record one failed round whenever the number of consecutive failed requests for a service is greater than or equal to the first preset threshold, and then either to count the number of consecutive failed rounds and determine whether it is greater than or equal to the second preset threshold, or to count the total number of failed rounds and determine whether it is greater than or equal to the third preset threshold;
Blocking unit 406b, configured to stop the client from requesting the service from the server for a preset time period if the number of consecutive failed rounds is greater than or equal to the second preset threshold, or the total number of failed rounds is greater than or equal to the third preset threshold.
Preferably, when the service is determined to have currently failed, the client does not request the service from the server; instead, it selects one service from the other services that have the same function as this service and have not currently failed, and requests the selected service from the server.
Further preferably, selecting one service from the other services that have the same function as this service and have not currently failed comprises:
selecting, according to the preset weights of the services, the service with the largest preset weight among the other services that have the same function as this service and have not currently failed.
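The weight-based choice of an alternative service reduces to a filtered maximum. In this sketch the function name, the service names, and the weight values are illustrative assumptions; a service without a configured weight is treated as weight 0, and ties go to the candidate listed first (the documented behavior of Python's `max`), a tie-breaking policy the text does not specify.

```python
from typing import Dict, Iterable, Optional, Set


def select_by_weight(current: str, same_function: Iterable[str],
                     weights: Dict[str, int], failed: Set[str]) -> Optional[str]:
    """Pick the not-failed alternative with the largest preset weight."""
    candidates = [s for s in same_function if s != current and s not in failed]
    if not candidates:
        return None  # no healthy alternative with the same function
    # max() returns the first candidate encountered among equal weights.
    return max(candidates, key=lambda s: weights.get(s, 0))


group = ["order-a", "order-b", "order-c"]  # hypothetical same-function services
weights = {"order-a": 5, "order-b": 10, "order-c": 7}
print(select_by_weight("order-a", group, weights, failed={"order-b"}))  # order-c
```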
As can be seen from the above embodiments:
By using the above working mechanism of the Zookeeper service cluster, the cluster watches the ephemeral node on the client's behalf: when the ephemeral node is deleted, the Zookeeper service cluster notifies the client of the deletion event. The client can therefore determine whether a service has failed according to whether it receives the event deleting that service's ephemeral node.
The client can perceive which services have failed and which have not. For a service that has not failed, if a request to it fails, the client performs disaster-recovery processing by requesting the service from the server again. If the number of failed requests reaches a certain threshold, the client can also request other services with the same function from the server. Alternatively, once the number of failed requests reaches a certain threshold, the client can block the service for a period of time and request other services with the same function during that period. This improves the stability and reliability of the service.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through certain interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
It should be noted that those of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be completed by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The method and apparatus for performing failure diagnosis on a remote method invocation (RMI) service provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the description of the above embodiments is intended only to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the contents of this description should not be construed as limiting the invention.

Claims (12)

1. A method for performing failure diagnosis on a remote method invocation (RMI) service, characterized by comprising:
A client connects to a Zookeeper service cluster, wherein a service provided by a server has created its own ephemeral node on the Zookeeper service cluster;
The client sends the Zookeeper service cluster a request to watch the ephemeral node of the service, the request being used to cause the Zookeeper service cluster, when deleting the ephemeral node of the service, to notify the client of the event of deleting the ephemeral node of the service;
When the client receives the event of deleting the ephemeral node of the service, it determines that the service has currently failed; when the client does not receive the event of deleting the ephemeral node of the service, it determines that the service has not currently failed.
2. The method according to claim 1, characterized in that the request is further used to cause the Zookeeper service cluster, when re-creating the deleted ephemeral node of the service, to notify the client of the event of re-creating the ephemeral node of the service, and the method further comprises:
After the client receives the event, notified by the Zookeeper service cluster, of deleting the ephemeral node of the service, and the client then receives the event, notified by the Zookeeper service cluster, of re-creating the ephemeral node of the service, determining that the service has not currently failed.
3. The method according to claim 1, characterized by further comprising:
When the service is determined not to have currently failed, if the client requests the service from the server and the request for the service fails, the client requests the service from the server again until a request for the service succeeds.
4. The method according to claim 3, characterized by further comprising:
Before each new request for the service, the client counts the number of consecutive failed requests for the service and determines whether the number of consecutive failures is greater than or equal to a first preset threshold;
If the number of consecutive failures is greater than or equal to the first preset threshold, the client stops requesting the service from the server, selects one service from other services that have the same function as the service and have not currently failed, and requests the selected service from the server;
The client requesting the service from the server again specifically comprises:
If the number of consecutive failures is less than the first preset threshold, the client requests the service from the server again.
5. The method according to claim 1, characterized by further comprising:
When the service is determined to have currently failed, the client does not request the service from the server, but selects one service from other services that have the same function as the service and have not currently failed, and requests the selected service from the server.
6. The method according to claim 4 or 5, characterized in that selecting one service from other services that have the same function as the service and have not currently failed comprises:
Selecting, according to preset weights of the services, the service with the largest preset weight from the other services that have the same function as the service and have not currently failed.
7. The method according to claim 4, characterized by further comprising:
When the number of consecutive failed requests for the service is greater than or equal to the first preset threshold, one failed round is recorded; the client counts the number of consecutive failed rounds and determines whether it is greater than or equal to a second preset threshold, or the client counts the total number of failed rounds and determines whether it is greater than or equal to a third preset threshold;
If the number of consecutive failed rounds is greater than or equal to the second preset threshold, or the total number of failed rounds is greater than or equal to the third preset threshold, the client stops requesting the service from the server within a preset time period.
8. An apparatus for performing failure diagnosis on a remote method invocation (RMI) service, characterized in that the apparatus is located on a client and comprises:
A connection unit, configured to connect to a Zookeeper service cluster, wherein a service provided by a server has created its own ephemeral node on the Zookeeper service cluster;
A watch request unit, configured to send the Zookeeper service cluster a request to watch the ephemeral node of the service, the request being used to cause the Zookeeper service cluster, when deleting the ephemeral node of the service, to notify the client of the event of deleting the ephemeral node of the service;
A failure diagnosis unit, configured to determine that the service has currently failed when the event of deleting the ephemeral node of the service is received, and to determine that the service has not currently failed when the event of deleting the ephemeral node of the service is not received.
9. The apparatus according to claim 8, characterized in that the request is further used to cause the Zookeeper service cluster, when re-creating the deleted ephemeral node of the service, to notify the client of the event of re-creating the ephemeral node of the service;
The failure diagnosis unit is further configured to, after the client receives the event, notified by the Zookeeper service cluster, of deleting the ephemeral node of the service, determine that the service has not currently failed upon receiving the event, notified by the Zookeeper service cluster, of re-creating the ephemeral node of the service.
10. The apparatus according to claim 8, characterized by further comprising:
A retry unit, configured to, when the service is determined not to have currently failed, if the service is requested from the server and the request for the service fails, request the service from the server again until a request for the service succeeds.
11. The apparatus according to claim 10, characterized by further comprising:
A first analysis unit, configured to, before each new request for the service, count the number of consecutive failed requests for the service and determine whether the number of consecutive failures is greater than or equal to a first preset threshold;
A switching unit, configured to, if the number of consecutive failures is greater than or equal to the first preset threshold, stop requesting the service from the server, select one service from other services that have the same function as the service and have not currently failed, and request the selected service from the server;
The retry unit is specifically configured to request the service from the server again if the number of consecutive failures is less than the first preset threshold.
12. The apparatus according to claim 11, characterized by further comprising:
A second analysis unit, configured to record one failed round when the number of consecutive failed requests for a service is greater than or equal to the first preset threshold, and either to count the number of consecutive failed rounds and determine whether it is greater than or equal to a second preset threshold, or to count the total number of failed rounds and determine whether it is greater than or equal to a third preset threshold;
A blocking unit, configured to stop the client from requesting the service from the server within a preset time period if the number of consecutive failed rounds is greater than or equal to the second preset threshold, or the total number of failed rounds is greater than or equal to the third preset threshold.
CN201410037532.2A 2014-01-26 2014-01-26 Method and apparatus for performing failure checking on service of remote method invocation Pending CN103731312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410037532.2A CN103731312A (en) 2014-01-26 2014-01-26 Method and apparatus for performing failure checking on service of remote method invocation

Publications (1)

Publication Number Publication Date
CN103731312A true CN103731312A (en) 2014-04-16

Family

ID=50455247

Country Status (1)

Country Link
CN (1) CN103731312A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140416
