CN110417586B - Service monitoring method, service node, server and computer readable storage medium - Google Patents

Info

Publication number
CN110417586B
Authority
CN
China
Prior art keywords
service
heartbeat information
abnormal
services
heartbeat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910649658.8A
Other languages
Chinese (zh)
Other versions
CN110417586A (en
Inventor
郝向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN201910649658.8A
Publication of CN110417586A
Application granted
Publication of CN110417586B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user

Abstract

The application provides a service monitoring method, a service node, a server and a computer-readable storage medium, and relates to the technical field of computers. A plurality of services, including a first service, together form a topology structure, and a correspondence is established between each service in the topology structure and at least one other service. Each service in the topology structure can therefore determine whether a service corresponding to it in the correspondence may be abnormal by judging whether its reception of heartbeat information is abnormal, so that the working states of the plurality of services are monitored. Compared with the prior art, because the services form a topology structure together and each service monitors the other services corresponding to it in that topology structure, the situation in which the monitoring function for the other services is completely lost due to a single point of failure of an independent monitoring service is avoided.

Description

Service monitoring method, service node, server and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a service monitoring method, a service node, a server, and a computer-readable storage medium.
Background
When a service cluster system composed of a plurality of services provides services for users, in order to avoid data loss or system function failure caused by abnormal operation of some services, it is generally necessary to monitor the operating state of each service in the service cluster system to ensure normal and stable operation of the service cluster system.
Currently, an independent monitoring service is generally adopted to monitor all services in a service cluster system in a unified manner; however, the problem of single point of failure exists in the independent monitoring service, and if the monitoring service itself is abnormal, the monitoring function of the whole service cluster system may be lost, and the whole service cluster system cannot be continuously monitored.
Disclosure of Invention
The application aims to provide a service monitoring method, a service node, a server and a computer readable storage medium, which can continuously provide monitoring service for a topological structure.
In order to achieve the above purpose, the embodiments of the present application employ the following technical solutions:
In a first aspect, an embodiment of the present application provides a service monitoring method, which is applied to a service node running a first service, where the first service and a plurality of other services together form a topology structure; each service contained in the topology structure is in a corresponding relation with at least one other service, and each service is used for receiving heartbeat information sent by the other service corresponding to the service and carrying out anomaly detection on the other service corresponding to the service; the method comprises the following steps:
the first service judges whether the receiving condition of the heartbeat information is abnormal or not;
if the receiving condition of the heartbeat information is determined to be abnormal, the first service judges whether abnormal services exist in other services corresponding to the first service.
In a second aspect, an embodiment of the present application provides a service node, where the service node runs a first service, and the first service and a plurality of other services together form a topology; each service contained in the topological structure is in a corresponding relation with at least one other service, and each service is used for receiving heartbeat information sent by the other service corresponding to the service and carrying out anomaly detection on the other service corresponding to the service; the service node comprises:
the judging module is used for judging whether the condition that the first service receives the heartbeat information is abnormal or not;
and the processing module is used for judging whether abnormal services occur in other services corresponding to the first service if the condition that the first service receives the heartbeat information is determined to be abnormal.
In a third aspect, an embodiment of the present application provides a server, including a memory for storing one or more programs; a processor; the one or more programs, when executed by the processor, implement the service monitoring method described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the service monitoring method described above.
In the service monitoring method, the service node, the server and the computer-readable storage medium provided by the embodiment of the application, a plurality of services including a first service together form a topology structure, and each service included in the topology structure is associated with at least one other service. Each service in the topology structure can therefore determine whether a service corresponding to it in the association is abnormal by judging whether the receiving condition of heartbeat information is abnormal, thereby realizing monitoring of the working states of the plurality of services. Compared with the prior art, the monitoring function of the topology structure formed by the plurality of services does not need to be provided by an independent monitoring service; instead, each service in the topology structure monitors the other services corresponding to it in the topology structure. This avoids the complete loss of the monitoring function of the topology structure caused by a single point of failure or the like of an independent monitoring service, and ensures that the monitoring service can be continuously provided for the topology structure.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and it will be apparent to those skilled in the art that other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic application scenario diagram of monitoring service states in a service cluster system;
FIG. 2 is a schematic block diagram of a server according to an embodiment of the present disclosure;
FIG. 3 is a schematic application scenario diagram of a service monitoring method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a service monitoring method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of the substeps of S201 in FIG. 4;
FIG. 6 is another schematic flow chart of the substeps of S201 in FIG. 4;
FIG. 7 is a schematic flow chart of the substeps of S203 in FIG. 4;
FIG. 8 is a schematic flow chart of the substeps of S203-2 in FIG. 7;
FIG. 9 is another schematic flow chart of a service monitoring method provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a service node according to an embodiment of the present application.
In the figure: 100-a server; 101-a memory; 102-a processor; 103-a communication interface; 300-a service node; 301-a judgment module; 302-processing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Referring to fig. 1, fig. 1 is a schematic application scenario diagram of monitoring service states in a service cluster system. A service cluster system formed by a plurality of services generally uses the monitoring scheme shown in fig. 1: an independent monitoring service is deployed to monitor service states, and it monitors all services in the service cluster system in a unified manner. Each service in the service cluster system reports heartbeat information to the monitoring service; if the heartbeat information reported by a certain service is abnormal, the monitoring service determines that this service is abnormal and generates alarm information to remind maintenance personnel to handle it.
However, for the monitoring scheme shown in fig. 1, each service in the service cluster system performs heartbeat monitoring by using the monitoring service, and if the monitoring service fails, the monitoring function of the service cluster system may be lost, and the monitoring service cannot be provided for all the services, that is, a defect of a single point failure exists.
Therefore, based on the above drawbacks, a possible implementation manner provided by the embodiment of the present application is as follows: among all services contained in a topology structure formed by a plurality of services, a corresponding relationship is established in advance between each service and at least one other service; each service receives heartbeat information sent by the other services corresponding to it, and performs anomaly detection on those services by judging whether the receiving condition of the heartbeat information is abnormal. Even if a certain service becomes abnormal, only the monitoring of the service or services corresponding to that abnormal service is lost; the monitoring function of the topology structure as a whole is not lost. This solves the problem in the prior art that other services cannot be monitored once the centralized monitoring service suffers a single point of failure.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 2, fig. 2 is a schematic block diagram of a server 100 according to an embodiment of the present disclosure. The server 100 includes a memory 101, a processor 102, and a communication interface 103, wherein the memory 101, the processor 102, and the communication interface 103 are electrically connected to each other directly or indirectly to achieve data transmission or interaction, and for example, the components may be electrically connected to each other through one or more communication buses or signal lines.
The memory 101 may be configured to store software programs and modules, such as program instructions/modules corresponding to the service node 300 provided in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 101, so as to implement the service monitoring method provided in the embodiment of the present application. The communication interface 103 may be used for the server 100 to communicate signaling or data with other node devices.
The Memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Read-Only Memory (EPROM), an electrically Erasable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The Processor 102 may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
It will be appreciated that the configuration shown in fig. 2 is merely illustrative and that the server 100 may include more or fewer components than shown in fig. 2 or have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 3, fig. 3 is a schematic application scenario diagram of a service monitoring method provided in the embodiment of the present application, where the service monitoring method provided in the embodiment of the present application is applied to a service node running a first service, for example, a service C in fig. 3 may be used as the first service, the first service and a plurality of other services together form a topology structure, each service included in the topology structure has a corresponding relationship with at least one other service, and each service is used to receive heartbeat information sent by the other service corresponding to the service, and perform anomaly detection on the other service corresponding to the service.
It should be noted that fig. 3 is only a schematic diagram, and a plurality of services belonging to the same service cluster system form a topology structure together; in some other possible implementation manners of the embodiment of the present application, a plurality of services that together form a topology structure may not belong to the same service cluster system, for example, in an application scenario as shown in fig. 3, a service a, a service B, a service C, a service D, and a service E together form a topology structure, the service a, the service B, and the service C belong to the same service cluster system, and the service D and the service E belong to another service cluster system.
Moreover, as a possible implementation manner, the corresponding relationship between each service in the topology and at least one other service may be unidirectional. For example, in the application scenario shown in fig. 3, a corresponding relationship is established between service A and service E, between service B and service A, between service C and service B, between service D and service C, and between service E and service D. Each of these corresponding relationships is unidirectional: service A performs anomaly detection on service E, service E on service D, service D on service C, service C on service B, and service B on service A.
However, in some other possible application scenarios, the corresponding relationship established between each service and another service may also be bidirectional, for example, based on the application scenario shown in fig. 3, in the corresponding relationship, the corresponding relationship established between the service C and the service E may be bidirectional, and the service C is configured to receive heartbeat information sent by the service E to perform anomaly detection on the service E; service E is also configured to receive heartbeat information sent by service C to perform anomaly detection on service C.
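As an illustration of the unidirectional and bidirectional corresponding relationships described above, the correspondence can be modelled as a list of directed pairs. The following is a minimal sketch only; the names `EDGES`, `build_maps`, `watches` and `heartbeat_targets` are hypothetical and do not come from the application, and the application does not prescribe any particular data structure.

```python
from collections import defaultdict

# Directed pairs (watcher, watched): the watcher receives heartbeat
# information from the watched service and performs anomaly detection on it.
# The first five pairs are the unidirectional example based on fig. 3; the
# last two model the bidirectional relationship between service C and service E.
EDGES = [
    ("A", "E"), ("B", "A"), ("C", "B"), ("D", "C"), ("E", "D"),
    ("C", "E"), ("E", "C"),
]


def build_maps(edges):
    """Return (watches, heartbeat_targets) derived from the directed pairs."""
    watches = defaultdict(set)            # watcher -> services it checks
    heartbeat_targets = defaultdict(set)  # sender -> services it sends heartbeats to
    for watcher, watched in edges:
        watches[watcher].add(watched)
        heartbeat_targets[watched].add(watcher)
    return dict(watches), dict(heartbeat_targets)


watches, heartbeat_targets = build_maps(EDGES)
print(sorted(watches["C"]))            # services that C performs anomaly detection on
print(sorted(heartbeat_targets["C"]))  # services that C sends heartbeat information to
```

A bidirectional relationship is simply two directed pairs in opposite directions, so the same structure covers both cases.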
In addition, it should be noted that all services included in the topology shown in fig. 3 may be located in the same service node or located in different service nodes; for example, in the application scenario shown in fig. 3, taking the service C as the first service as an example, the service C may run on the same service node as other services (such as service a, service B, service D, or service E), or may run on different service nodes; for example, each service in fig. 3 may be operated in a different service node, and all service nodes in which at least one service included in the topology is operated form a service system to provide services for users.
The embodiment of the application does not make any limitation on whether each service in the topology structure is located in the same service node, which depends on a specific application scenario, as long as the first service and a plurality of other services can form the topology structure.
Moreover, the corresponding relationship between each service and at least one other service in all the services contained in the topological structure can be realized through a set system program; for example, when the service cluster system shown in fig. 3 is built, the service cluster system is started, or the service cluster system is updated, the system program establishes a corresponding relationship between each service and at least one other service, thereby forming a topology structure.
Based on the application scenario shown in fig. 3, please refer to fig. 4, where fig. 4 is a schematic flowchart of a service monitoring method provided in an embodiment of the present application, and the method includes the following steps:
S201, the first service judges whether the receiving condition of the heartbeat information is abnormal; if normal, S202 is executed; if abnormal, S203 is executed;
S202, the first service determines that no abnormal service has occurred among the other services corresponding to the first service;
S203, the first service determines whether an abnormal service has occurred among the other services corresponding to the first service.
In the embodiment of the application, the first service receives heartbeat information sent by at least one other service corresponding to the first service in the corresponding relationship according to the corresponding relationship, and the first service judges whether the receiving condition of the heartbeat information is abnormal or not, so as to judge whether the other service corresponding to the first service is abnormal or not.
If the first service determines that the receiving condition of the heartbeat information is abnormal, the first service needs to further determine whether abnormal services occur in other services corresponding to the first service; if the first service determines that the receiving condition of the heartbeat information is normal, the first service determines that no abnormal service occurs in other services corresponding to the first service, and at this time, the first service can also record the working states of the other services corresponding to the first service to represent that the first service monitors and determines that the other services corresponding to the first service are in normal working states at the moment.
For example, taking the service C in fig. 3 as the first service as an example, assume that the service corresponding to the service C in the correspondence relationship established by the system program is the service B; if the service C judges that the receiving condition of the heartbeat information is normal, the working state of the representation service B is normal; and if the service C judges that the receiving condition of the heartbeat information is abnormal, the working state of the representation service B is possibly abnormal.
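The branching of steps S201 to S203 can be sketched as follows. This is an illustrative outline only: the enum `Verdict` and the functions `s201_reception_abnormal` and `monitor_once` are hypothetical names, and how "abnormal reception" is actually detected is left as a boolean input here.

```python
from enum import Enum


class Verdict(Enum):
    NO_ABNORMAL_SERVICE = "S202"  # reception normal: corresponding services are fine
    CHECK_FOR_ABNORMAL = "S203"   # reception abnormal: examine corresponding services


def s201_reception_abnormal(received_expected_heartbeat: bool) -> bool:
    """S201: the receiving condition is abnormal when the expected heartbeat is missing."""
    return not received_expected_heartbeat


def monitor_once(received_expected_heartbeat: bool) -> Verdict:
    """Run one pass of S201 and return which step follows."""
    if s201_reception_abnormal(received_expected_heartbeat):
        return Verdict.CHECK_FOR_ABNORMAL
    return Verdict.NO_ABNORMAL_SERVICE


# Service C received the heartbeat from service B: B's working state is normal.
print(monitor_once(True).value)
# Service C did not receive it: B's working state is possibly abnormal.
print(monitor_once(False).value)
```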
It is worth noting that, for convenience of description, the above example directly assumes that a corresponding relationship for anomaly detection is established between the service C and the service B. In some other possible application scenarios in the embodiment of the present application, when the service C receives heartbeat information, it need not pay attention to the specific object that sends the heartbeat information. When the service C determines that the receiving condition of the heartbeat information is normal, it only needs to record indication information representing that the working state of the other service corresponding to the first service is normal. For example, in the above example, the service C may directly record first indication information representing that the working state of the service corresponding to the service C in the corresponding relationship is normal, without specifically recording that the service B is normal. Likewise, when the service C determines that the receiving condition of the heartbeat information is abnormal, it only needs to record second indication information representing that the working state of the service corresponding to the service C in the corresponding relationship is abnormal.
In other possible application scenarios, when the service C determines whether the receiving condition of the heartbeat information is abnormal, the service C may determine that the service C corresponds to the service B in the corresponding relationship by querying the corresponding relationship, so as to directly determine whether the service B is abnormal.
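The two recording approaches described above (recording only an indication without identifying the sender, or querying the corresponding relationship to name the service) can be contrasted in a short sketch. The constants `FIRST_INDICATION` and `SECOND_INDICATION`, the table `CORRESPONDENCE` and both functions are hypothetical; the application does not specify the form of the indication information.

```python
# Hypothetical encodings of the first and second indication information.
FIRST_INDICATION = "corresponding service normal"
SECOND_INDICATION = "corresponding service abnormal"

# Hypothetical correspondence table: service C corresponds to service B.
CORRESPONDENCE = {"C": ["B"]}


def record_anonymous(reception_normal: bool) -> str:
    """Record only an indication, without naming the service that sent the heartbeat."""
    return FIRST_INDICATION if reception_normal else SECOND_INDICATION


def record_by_lookup(service: str, reception_normal: bool, correspondence: dict) -> dict:
    """Query the correspondence and record a state for each named service."""
    state = "normal" if reception_normal else "abnormal"
    return {peer: state for peer in correspondence[service]}


print(record_anonymous(False))
print(record_by_lookup("C", False, CORRESPONDENCE))
```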
It can be seen that, in the above-mentioned scheme provided in this embodiment of the present application, a corresponding relationship between each service and at least one other service is established in advance in all services included in a topology structure formed by a plurality of services, and each service receives heartbeat information sent by other services corresponding to the service, and performs anomaly detection on the other services corresponding to the service by determining whether a reception condition of the heartbeat information is abnormal; even if a certain service is abnormal, only the monitoring function of the service corresponding to the abnormal service is affected to be lost, but the monitoring function of the whole topological structure is not lost.
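The difference in monitoring coverage between an independent monitoring service and the scheme above can be checked with a small simulation. This is a sketch under stated assumptions: `monitored_services`, the monitor name `"M"`, and the `central` and `ring` tables are all hypothetical, with the ring taken from the unidirectional example based on fig. 3.

```python
def monitored_services(watches, failed):
    """Return the services still checked by at least one watcher that has not failed."""
    covered = set()
    for watcher, targets in watches.items():
        if watcher not in failed:
            covered.update(targets)
    return covered


services = {"A", "B", "C", "D", "E"}

# Prior-art style: one independent monitoring service "M" watches everything.
central = {"M": set(services)}

# Scheme above: each service watches its corresponding service.
ring = {"A": {"E"}, "B": {"A"}, "C": {"B"}, "D": {"C"}, "E": {"D"}}

# If the independent monitor fails, no service is monitored any more.
print(sorted(monitored_services(central, failed={"M"})))

# If service C fails, only service B loses its watcher; C itself is still
# watched by service D, which can detect that C is abnormal.
print(sorted(monitored_services(ring, failed={"C"})))
```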
Based on the above design, in the service monitoring method provided in the embodiment of the present application, a plurality of services including a first service together form a topology structure, and a corresponding relationship is established between each service included in the topology structure and at least one other service. Each service in the topology structure can therefore determine whether a service corresponding to it in the corresponding relationship is possibly abnormal by determining whether the receiving condition of heartbeat information is abnormal, thereby implementing monitoring of the working states of the plurality of services. Compared with the prior art, the monitoring function of the topology structure formed by the plurality of services does not need to be provided by an independent monitoring service; instead, each service in the topology structure monitors the other services corresponding to it. This avoids complete loss of the monitoring function of the topology structure caused by a single point of failure or the like of an independent monitoring service, and ensures that the monitoring service can be continuously provided for the topology structure.
It should be noted that the topology formed by the plurality of services together may be implemented in various forms.
Optionally, as a possible implementation manner, the topology structure formed by the first service and the plurality of other services together may be, for example, the ring topology structure shown in fig. 3, where each service in the ring topology structure forms a corresponding relationship with the previous service adjacent to it along the first direction in the ring topology structure. For example, in the application scenario shown in fig. 3, taking the counterclockwise direction as the first direction, the service A establishes a corresponding relationship with the service E, the service B with the service A, the service C with the service B, the service D with the service C, and the service E with the service D, so that the services A, B, C, D and E are connected in sequence to form a ring topology structure. In this ring topology structure, along the first direction, the next service of the service A is the service B, the next service of the service B is the service C, the next service of the service C is the service D, the next service of the service D is the service E, and the next service of the service E is the service A. Under the monitoring policy given by the above corresponding relationship, the service B monitors the service A, the service C monitors the service B, the service D monitors the service C, the service E monitors the service D, and the service A monitors the service E.
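The ring correspondence in this example can be derived from the order of the services along the first direction. The sketch below assumes the services are given as a list in that order; the function name `ring_correspondence` is hypothetical.

```python
def ring_correspondence(order):
    """Map each service to the previous adjacent service along the first direction.

    `order` lists the services along the first direction; the service that
    precedes the first element is the last element, which closes the ring.
    """
    if len(order) < 2:
        raise ValueError("a ring needs at least two services")
    n = len(order)
    return {order[i]: order[(i - 1) % n] for i in range(n)}


monitors = ring_correspondence(["A", "B", "C", "D", "E"])
for watcher, watched in monitors.items():
    print(f"service {watcher} monitors service {watched}")
```

Because every service appears exactly once as a watcher and once as a watched service, each service in the ring is monitored by exactly one other service.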
Moreover, as a possible implementation manner, when the system program forms the topology structure, the system program may set a heartbeat monitoring period, that is, every set period T, each service in the topology structure sends heartbeat information to a corresponding service in the corresponding relationship; similarly, each service receives heartbeat information sent by the corresponding service in the corresponding relationship at the set heartbeat detection time point according to the set period T; for example, in the application scenario shown in fig. 3, the service B sends heartbeat information to the service C according to the set period T, and similarly, the service C receives heartbeat information sent by the service B according to the set period T.
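A periodic sending schedule of this kind can be sketched as follows. The record `Heartbeat`, the functions `heartbeat_schedule` and `detection_points`, and the use of integer seconds counted from the moment the topology is formed are all assumptions made for illustration; the application does not define a message format or a clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    sender: str
    receiver: str
    sent_at: int  # seconds since the topology was formed (hypothetical clock)


def heartbeat_schedule(sender, receiver, period, count):
    """Heartbeats that `sender` emits to its corresponding service, one per period T."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [Heartbeat(sender, receiver, k * period) for k in range(1, count + 1)]


def detection_points(period, count):
    """Heartbeat detection time points used by the receiving service."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [k * period for k in range(1, count + 1)]


# Service B sends heartbeat information to service C with T = 30 seconds.
beats = heartbeat_schedule("B", "C", period=30, count=3)
print([b.sent_at for b in beats])
print(detection_points(period=30, count=3))
```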
It should be noted that the set period T may be set based on the intervals at which the services in the topology generate service data or exchange service data. For example, in the application scenario shown in fig. 3, assume that the service A generates one piece of service data every minute, the service B performs service data interaction with the service C every five minutes, the service A performs service data interaction with the service D every 10 minutes, and the service E generates one piece of service data every 3 minutes. The shortest of these intervals is 1 minute, so the set period T may be set to any value less than or equal to 1 minute, for example 30 seconds or 45 seconds. In general, the set period T only needs to be less than or equal to the shortest time interval at which any service in the topology generates service data or performs service data interaction.
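The rule for choosing the set period T in this example amounts to taking the minimum of the listed intervals as an upper bound. A minimal sketch, with hypothetical names `INTERVALS_SECONDS`, `max_period` and `is_valid_period`, and with the intervals from the example converted to seconds:

```python
# Intervals from the example above, in seconds.
INTERVALS_SECONDS = {
    "service A generates data": 60,
    "service B interacts with service C": 300,
    "service A interacts with service D": 600,
    "service E generates data": 180,
}


def max_period(intervals):
    """Upper bound for T: the shortest data generation or interaction interval."""
    return min(intervals.values())


def is_valid_period(period, intervals):
    """T must be positive and no longer than the shortest interval."""
    return 0 < period <= max_period(intervals)


print(max_period(INTERVALS_SECONDS))
for candidate in (30, 45, 90):
    print(candidate, is_valid_period(candidate, INTERVALS_SECONDS))
```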
Optionally, referring to fig. 5, fig. 5 is a schematic flowchart of the sub-steps of S201 in fig. 4, and as a possible implementation manner, when S201 is implemented, the following sub-steps may be included:
S201-1, the first service judges whether heartbeat information sent by the previous service adjacent to the first service along the first direction in the ring topology structure is received at the set heartbeat detection time point; if so, executing S201-2; if not, executing S201-3;
S201-2, the first service determines that the receiving condition of the heartbeat information is normal;
S201-3, the first service determines that the receiving condition of the heartbeat information is abnormal.
Based on the ring topology structure shown in fig. 3, each service in the ring topology structure establishes the corresponding relationship with the previous service adjacent to it along the first direction, and correspondingly each service sends heartbeat information to the next service adjacent to it along the first direction. Therefore, taking the first service as an example, at each heartbeat detection time point set according to the set period T, the first service determines whether heartbeat information sent by the previous service adjacent to it along the first direction has been received; if so, the first service determines that the receiving condition of the heartbeat information is normal; if not, the first service determines that the receiving condition of the heartbeat information is abnormal.
In addition, if the first service determines that the receiving condition of the heartbeat information is abnormal, then when S203 is executed the first service determines that the previous service adjacent to the first service along the first direction in the ring topology structure is abnormal. For example, in the application scenario shown in fig. 3, taking service C as the first service and the counterclockwise direction as the first direction, the previous service adjacent to service C in the counterclockwise direction in the ring topology structure is service B. If service C receives the heartbeat information sent by service B at the set heartbeat detection time point, service C determines that the working state of service B is normal; if service C does not receive the heartbeat information sent by service B at the set heartbeat detection time point, service C determines that the working state of service B is abnormal.
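Sub-steps S201-1 to S201-3, together with the resulting judgment in S203 for the ring case, can be sketched as below. The representation of received heartbeats as a set of sender names and the function name `check_previous_service` are assumptions made for illustration only.

```python
def check_previous_service(received_senders, previous_service):
    """Return the state of the previous service at one detection time point.

    received_senders: names of services whose heartbeat arrived (assumed form).
    previous_service: the service adjacent along the first direction.
    """
    if previous_service in received_senders:
        return "normal"    # S201-2: heartbeat received as expected
    return "abnormal"      # S201-3, then S203: previous service is abnormal


# Service C monitors service B in the fig. 3 ring.
state_when_received = check_previous_service({"B"}, "B")
state_when_missing = check_previous_service(set(), "B")
```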
Based on the above design, in the service monitoring method provided in the embodiment of the present application, the topology structure is set to be ring-shaped, and each service in the ring topology structure establishes a corresponding relationship with the previous service adjacent to it along the first direction, so that each service monitors only that previous service. In this way, the amount of data exchanged during service monitoring can be reduced, and redundancy is reduced.
It is worth noting that the above implementation manner is only an illustration. In some other possible application scenarios in the embodiment of the present application, the topology structure may take other forms. For example, a corresponding relationship may be established between every two of service A, service B, service C, service D and service E, so that the topology structure is a mesh; or bidirectional corresponding relationships may be established sequentially among service A, service B, service C, service D and service E, so that the topology structure is a chain. The form of the topology structure is not limited in the embodiment of the present application, as long as the first service and the plurality of other services together form the topology structure and each service in the topology structure establishes a corresponding relationship with at least one other service.
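The mesh and chain forms mentioned above can be sketched as sets of service pairs. This is a minimal sketch under the assumption that a corresponding relationship can be modelled as an unordered pair; the function names are hypothetical.

```python
from itertools import combinations


def mesh_relations(services):
    """Every two services establish a corresponding relationship."""
    return set(combinations(services, 2))


def chain_relations(services):
    """Adjacent services in the list establish a bidirectional relationship."""
    return set(zip(services, services[1:]))


def every_service_covered(services, relations):
    """Check the only stated requirement: each service has at least one relation."""
    covered = {service for pair in relations for service in pair}
    return covered == set(services)


services = ["A", "B", "C", "D", "E"]
mesh = mesh_relations(services)
chain = chain_relations(services)
```

Both forms satisfy the condition that each service establishes a corresponding relationship with at least one other service, which is the only requirement the embodiment places on the topology structure.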
In some possible application scenarios, the first service may correspond to multiple services in the corresponding relationship; in that case, the first service needs to receive heartbeat information sent by multiple services in the topology structure. For example, in the application scenario shown in fig. 3, taking service C as the first service, assume that service C has established a corresponding relationship with service B and that there is business data interaction between service C and service E. To ensure that the service unit formed by service C and service E can stably provide services for users, a corresponding relationship for detecting abnormal services may additionally be established between service C and service E on the basis of their business data interaction. In this application scenario, service C needs to receive not only the heartbeat information sent by service B but also the heartbeat information sent by service E.
On this basis, please refer to fig. 6, fig. 6 is another schematic flowchart of the sub-step of S201 in fig. 4, and based on the foregoing application scenario, as another possible implementation, S201 may further include the following sub-steps:
S201-5, the first service judges whether the quantity of the heartbeat information received at the set heartbeat detection time point is the same as a set value; if yes, executing S201-6; if not, executing S201-7;
S201-6, the first service determines that the receiving condition of the heartbeat information is normal;
S201-7, the first service determines that the receiving condition of the heartbeat information is abnormal.
As a possible implementation manner, when the system program establishes or updates the topology structure shown in fig. 3, the system program may issue a set value to each service in the topology structure according to the established corresponding relationships, where the set value is the number of pieces of heartbeat information, sent by the corresponding services, that the service needs to receive at the set heartbeat detection time point. For example, in the foregoing application scenario, taking service C as the first service, the system program establishes a corresponding relationship between service C and service B and a corresponding relationship between service C and service E; exemplarily, the system program issues a set value of 2 to service C, indicating that service C needs to receive 2 pieces of heartbeat information at the set heartbeat detection time point. Therefore, when executing S201, the first service may compare the number of pieces of heartbeat information received at the set heartbeat detection time point with the set value; if the two are the same, the first service determines that the receiving condition of the heartbeat information is normal; if the two are different, the first service determines that the receiving condition of the heartbeat information is abnormal.
Taking service C as the first service and a set value of 2 issued by the system program to service C as an example, if service C receives 2 pieces of heartbeat information at the set heartbeat detection time point, which is the same as the set value 2 issued to service C, service C determines that the receiving condition of the heartbeat information is normal; if service C receives 0 pieces of heartbeat information (that is, receives no heartbeat information) or 1 piece of heartbeat information at the set heartbeat detection time point, which differs from the set value 2 issued to service C, service C determines that the receiving condition of the heartbeat information is abnormal.
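Sub-steps S201-5 to S201-7 reduce to a count comparison, sketched below. The function name `reception_status` and the list representation of received heartbeats are assumptions for illustration, and `"urlA"` is only a placeholder string.

```python
def reception_status(received_heartbeats, set_value):
    """Compare the number of heartbeats received with the issued set value."""
    if len(received_heartbeats) == set_value:
        return "normal"    # S201-6
    return "abnormal"      # S201-7


SET_VALUE_FOR_C = 2  # issued by the system program in the example above

both = reception_status([{"flag": [1, 0]}, {"E": "urlA"}], SET_VALUE_FOR_C)
only_one = reception_status([{"E": "urlA"}], SET_VALUE_FOR_C)
none_received = reception_status([], SET_VALUE_FOR_C)
```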
It should be noted that the above is merely an example. The first service may establish the corresponding relationship with other services with which it has business data interaction, but the embodiment of the present application does not limit the condition for establishing the corresponding relationship. For example, in the application scenario shown in fig. 3, even if service C and service A have no business data interaction, a corresponding relationship between service C and service A may still be established, depending on the specific application scenario or the specific configuration of the user.
In addition, in an application scenario shown in fig. 3, for example, taking service C as the first service, and the service C establishes a corresponding relationship with both service B and service E, if the service C determines that the receiving condition of the heartbeat information is abnormal, there may be three conditions: the heartbeat information sent by the service B to the service C is abnormal, the heartbeat information sent by the service E to the service C is abnormal, and the heartbeat information sent by the service B to the service C and the heartbeat information sent by the service E to the service C are both abnormal.
In the foregoing example, however, the relationship between service C and service B is different from the relationship between service C and service E. The corresponding relationship between service C and service B is established by the system program when it establishes the topology structure, whereas the corresponding relationship between service C and service E is established because business data interaction may exist between them: to ensure that service C and service E can work normally, the corresponding relationship is established so that each of them can judge whether the working state of the other is normal.
Therefore, as a possible implementation manner, in the embodiment of the present application, the correspondence relationship established between each service and any other service corresponding to the service is classified into a first-class correspondence relationship or a second-class correspondence relationship, and each service in all the services included in the topology structure establishes a first-class correspondence relationship with one service in the other services corresponding to the service; for the first-class corresponding relation, each service can directly judge whether the service establishing the first-class corresponding relation with the service is abnormal or not; for the second type corresponding relation, each service is used for auxiliary judgment on whether the service establishing the second type corresponding relation with the service is abnormal or not. For example, in the foregoing example, if the correspondence relationship between the service C and the service B is the first-type correspondence relationship, and the correspondence relationship between the service C and the service E is the second-type correspondence relationship, the service C can directly determine whether the working state of the service B is abnormal, but the service C performs the auxiliary determination on whether the working state of the service E is abnormal.
It should be noted that the direct judgment means that one service can directly judge whether the service corresponding to the service is abnormal according to whether heartbeat information sent by the service corresponding to the service is received, and the auxiliary judgment means that one service can only judge whether the corresponding service is suspected to be abnormal according to whether heartbeat information sent by the service corresponding to the service is received.
For example, in the above example, the correspondence relationship established between the service C and the service B is a first-type correspondence relationship, if the service C receives heartbeat information sent by the service B, it may be directly determined that the service B is working normally, and if the heartbeat information sent by the service B is not received, it may be directly determined that the service B is working abnormally; and the corresponding relation established between the service C and the service E is a second type corresponding relation, if the service C receives heartbeat information sent by the service E, the service C determines that the service E is in a normal working state, and if the service C does not receive the heartbeat information sent by the service E, the service C determines that the service E is suspected to be abnormal, and whether the service E works abnormally needs to be further judged.
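The difference between direct judgment and auxiliary judgment can be sketched as follows. The enum `Verdict`, the function `judge`, and the integer encoding of the correspondence class (1 or 2) are hypothetical choices made only for this illustration.

```python
from enum import Enum


class Verdict(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"               # direct judgment (first-class)
    SUSPECTED = "suspected abnormal"    # auxiliary judgment (second-class)


def judge(correspondence_class, heartbeat_received):
    """Judge a peer from one heartbeat observation.

    correspondence_class: 1 for first-class, 2 for second-class (assumed encoding).
    """
    if heartbeat_received:
        return Verdict.NORMAL
    if correspondence_class == 1:
        return Verdict.ABNORMAL
    # A second-class relationship only raises a suspicion that needs
    # further confirmation.
    return Verdict.SUSPECTED


b_missing = judge(1, False)  # service C judging service B
e_missing = judge(2, False)  # service C judging service E
```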
Based on the foregoing embodiment, a specific process of determining whether an abnormal service occurs in other services corresponding to the first service is described next, please refer to fig. 7, fig. 7 is a schematic flowchart of the sub-step of S203 in fig. 4, and as a possible implementation manner, S203 may include the following sub-steps:
S203-1, the first service determines the target heartbeat information, that is, the heartbeat information that is missing from the heartbeat information received at the set heartbeat detection time point when compared with the set heartbeat information;
S203-2, the first service judges, according to the target heartbeat information, whether the service needing to send the target heartbeat information is the service establishing the first-class corresponding relation with the first service; if so, executing S203-3; if not, executing S203-4;
S203-3, the first service determines that the service establishing the first-class corresponding relation with the first service is abnormal;
S203-4, the first service determines that the service establishing the first-class corresponding relation with the first service is not abnormal.
In this embodiment of the application, when the first service monitors all services corresponding to it in the first-class and second-class correspondences, each time the first service determines, at a heartbeat detection time point set according to the set period T, that the receiving condition of the heartbeat information is normal, the first service may record the heartbeat information received at that time point. The recorded heartbeat information then serves as the set heartbeat information for identifying the service whose heartbeat is abnormal the next time the receiving condition is determined to be abnormal. The set heartbeat information includes all heartbeat information that the first service needs to receive at the set heartbeat detection time point.
It should be noted that, because heartbeat detection is generally periodic, each time the first service determines that the receiving condition of the heartbeat information is normal, it may, as a possible implementation manner, overwrite the previously recorded normal heartbeat information with the heartbeat information received at the current time, so that throughout the heartbeat detection process the set heartbeat information recorded by the first service is always the most recently received normal heartbeat information.
In addition, in some other possible application scenarios in the embodiment of the present application, the previously recorded normal heartbeat information may be retained rather than overwritten: each piece of heartbeat information determined to be normal is recorded as set heartbeat information together with a timestamp or a sequence number assigned in chronological order. Any manner is acceptable as long as, when the receiving condition of the heartbeat information is judged to be abnormal, the first service can obtain the set heartbeat information as a reference for determining the missing target heartbeat information. Furthermore, in some other possible application scenarios of the present application, the set heartbeat information may be specified by the system program; that is, when the system program initializes the topology structure, it sends the generated set heartbeat information directly to the first service, so that the first service determines, according to the received set heartbeat information, the target heartbeat information that is missing from the received heartbeat information compared with the set heartbeat information.
Thus, when S203 is executed, the first service compares the heartbeat information received at the set heartbeat detection time point with the set heartbeat information, and obtains the target heartbeat information that is present in the set heartbeat information but missing from the received heartbeat information. As in the above examples, the set heartbeat information may be heartbeat information that the first service keeps updating by overwriting, the most recently recorded heartbeat information determined according to the timestamp, or the heartbeat information with the largest sequence number. In other words, the target heartbeat information obtained by the first service represents the heartbeat information that is missing from the heartbeat information received by the first service at the set heartbeat detection time point.
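The two recording strategies (overwriting versus keeping sequence-numbered history) can be sketched with one small class. The class name `SetHeartbeatStore` and its methods are hypothetical; the embodiment does not prescribe any particular data structure.

```python
class SetHeartbeatStore:
    """Records heartbeat information judged normal, for later comparison."""

    def __init__(self, keep_history=False):
        self.keep_history = keep_history
        self._records = []      # list of (sequence_number, heartbeats)
        self._next_sequence = 0

    def record_normal(self, heartbeats):
        entry = (self._next_sequence, list(heartbeats))
        self._next_sequence += 1
        if self.keep_history:
            self._records.append(entry)   # keep every normal record
        else:
            self._records = [entry]       # overwrite the previous record

    def current(self):
        """Return the set heartbeat information: the latest normal record."""
        if not self._records:
            return None
        return max(self._records, key=lambda entry: entry[0])[1]

    def size(self):
        return len(self._records)


overwriting = SetHeartbeatStore()
history = SetHeartbeatStore(keep_history=True)
for store in (overwriting, history):
    store.record_normal([{"flag": [1, 0]}])
    store.record_normal([{"flag": [1, 0]}, {"E": "urlA"}])
```

Either strategy yields the same set heartbeat information from `current()`; the history variant merely retains the older records as well.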
For example, taking service C as the first service, if the set heartbeat information recorded by service C includes { "flag": [1,0] } and { "E": urlA }, and the heartbeat information received by service C at the set heartbeat detection time point is { "E": urlA }, the missing target heartbeat information determined by service C is { "flag": [1,0] }; or, if the heartbeat information received by service C at the set heartbeat detection time point is { "flag": [1,0] }, the missing target heartbeat information determined by service C is { "E": urlA }.
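Determining the target heartbeat information amounts to a difference between the set heartbeat information and the received heartbeat information. The sketch below assumes each piece of heartbeat information is a one-key dictionary and that the key identifies the sender; the function name `missing_heartbeats` is hypothetical, and `"urlA"` is treated as an opaque placeholder string.

```python
def missing_heartbeats(set_info, received):
    """Return entries of the set heartbeat information that were not received.

    Entries are matched by key rather than by full value, since the values
    carried in a received heartbeat may differ from the recorded ones.
    """
    received_keys = {key for heartbeat in received for key in heartbeat}
    return [hb for hb in set_info if not (hb.keys() & received_keys)]


set_info = [{"flag": [1, 0]}, {"E": "urlA"}]

target_when_b_missing = missing_heartbeats(set_info, [{"E": "urlA"}])
target_when_e_missing = missing_heartbeats(set_info, [{"flag": [1, 0]}])
```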
The first service judges whether the service needing to send the target heartbeat information is a service establishing a first type of corresponding relation with the first service or not based on the obtained target heartbeat information; if so, the first service determines that the service establishing the first-class corresponding relation with the first service is abnormal; if not, the first service determines that the service establishing the first-class corresponding relation with the first service is not abnormal.
In order to implement S203-2, optionally, referring to fig. 8, fig. 8 is a schematic flow chart of the sub-steps of S203-2 in fig. 7, as a possible implementation manner, S203-2 may include the following sub-steps:
S203-2a, the first service judges whether the target heartbeat information contains first identification information used for representing the first-class corresponding relation; if yes, executing S203-2b; if not, executing S203-2c;
S203-2b, the first service determines that the service needing to send the target heartbeat information is the service establishing the first-class corresponding relation with the first service;
S203-2c, the first service determines that the service needing to send the target heartbeat information is not the service establishing the first-class corresponding relation with the first service.
In this embodiment of the present application, first identification information representing a first-class correspondence may be set, so that when each service in the topology receives heartbeat information sent by corresponding other services, whether the service sending the heartbeat information is a service establishing the first-class correspondence with the service may be determined by using the first identification information.
Therefore, the first service judges whether the determined missing target heartbeat information contains the first identification information. If it does, the first service determines that the service needing to send the target heartbeat information is the service establishing the first-class corresponding relationship with the first service, and therefore determines that the service establishing the first-class corresponding relationship with the first service is abnormal; if it does not, the first service determines that the service needing to send the target heartbeat information is not the service establishing the first-class corresponding relationship with the first service, and therefore determines that the service establishing the first-class corresponding relationship with the first service is not abnormal.
For example, in the above example shown in fig. 3, take service C as the first service and service B as the service that establishes the first-class correspondence with service C, and assume that the first identification information is "flag". If the target heartbeat information determined by service C is { "flag": [1,0] }, the target heartbeat information contains the first identification information "flag", which indicates that the service needing to send the target heartbeat information is the service establishing the first-class corresponding relation with service C, that is, service B is abnormal. If the target heartbeat information determined by service C is { "E": urlA }, the target heartbeat information does not contain the first identification information "flag", which indicates that the service needing to send the target heartbeat information is not the service establishing the first-class corresponding relation with service C, that is, service B is not abnormal.
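Sub-steps S203-2a to S203-2c, followed by S203-3 or S203-4, can be sketched as a key lookup. The constant `FIRST_IDENTIFICATION` takes the value "flag" assumed in the example; the function names are hypothetical.

```python
FIRST_IDENTIFICATION = "flag"  # first identification information, as assumed above


def sender_is_first_class(target_heartbeat):
    """S203-2a to S203-2c: does the missing heartbeat carry the first identifier?"""
    return FIRST_IDENTIFICATION in target_heartbeat


def first_class_service_abnormal(target_heartbeat):
    """S203-3 / S203-4: the first-class peer is abnormal only if its
    heartbeat is the one that is missing."""
    return sender_is_first_class(target_heartbeat)


b_abnormal = first_class_service_abnormal({"flag": [1, 0]})
b_abnormal_when_e_missing = first_class_service_abnormal({"E": "urlA"})
```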
In addition, in a monitoring scheme such as that shown in fig. 1, the monitoring service, as a centralized monitoring mechanism, generally needs to monitor a plurality of different service cluster systems, that is, to perform heartbeat monitoring on a large number of services from those systems. Because different services often have different functions, when a certain service fails the monitoring service can only record the failure and raise an alarm to maintenance personnel, who then handle the failed service, for example by restarting it or removing it from the service cluster system.
Therefore, to reduce the operations of the maintenance personnel and save the human resources, please refer to fig. 9, where fig. 9 is another schematic flowchart of the service monitoring method provided in the embodiment of the present application, and if the first service determines that the service establishing the first-type correspondence relationship with the first service is abnormal after performing S203, for example, when the service C determines that the service B is abnormal in the above example, the service monitoring method further includes the following steps:
S204, the first service judges whether the service establishing the first-class corresponding relation with the first service has established a second-class corresponding relation with other services in the topological structure; if yes, executing S205; if not, executing S206;
S205, the first service updates the identification information used for indicating the working state in the target heartbeat information to an abnormal identification, so as to update the set heartbeat information;
S206, the first service judges whether the service establishing the first-class corresponding relation with the first service has been unloaded; if unloaded, executing S207; if not, executing S209;
S207, the first service instructs that the service establishing the first-class corresponding relation with the first service be removed from the topological structure and that the topological structure be reconstructed based on the remaining services;
S208, the first service removes the target heartbeat information from the set heartbeat information;
S209, the first service instructs that the service establishing the first-class corresponding relation with the first service be restarted, and updates the identification information used for indicating the working state in the target heartbeat information to a restart identification, so as to update the set heartbeat information.
In this embodiment of the application, if the first service determines that the service having the first-class correspondence with the first service also has a second-class correspondence with other services in the topology structure, this indicates that the first service is only used for recording the working state of that service. In this case, the first service updates the identification information used for indicating the working state in the target heartbeat information to the abnormal identification, so as to update the set heartbeat information and indicate that the service having the first-class correspondence with the first service is abnormal.
On the contrary, if the first service determines that the service establishing the first-class correspondence with the first service has not established a second-class correspondence with other services in the topology structure, the first service needs to perform a maintenance operation on that service, as follows. If the first service determines that the service has been unloaded, the first service instructs that the service be removed from the topology structure and that the topology structure be reconstructed based on the remaining services, and removes the target heartbeat information from the set heartbeat information. If the first service determines that the service has not been unloaded, the first service instructs that the service be restarted, and updates the identification information used for indicating the working state in the target heartbeat information to a restart identification so as to update the set heartbeat information, thereby indicating that the service establishing the first-class correspondence with the first service in the topology structure is restarting.
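The decision flow of S204 to S209 can be sketched as a single function. This is an illustrative sketch only: it assumes the target heartbeat information has the form `{"flag": [x, y]}`, with x as the working-state value (0 abnormal, 1 normal, 2 restarting) and y set to 1 when a second-class correspondence exists, and it takes the unload query, the topology removal and the restart as caller-supplied callables. All names are hypothetical.

```python
ABNORMAL, NORMAL, RESTARTING = 0, 1, 2


def handle_abnormal_peer(target, set_info, is_unloaded,
                         remove_from_topology, restart):
    """Return the updated set heartbeat information after S204 to S209."""
    others = [hb for hb in set_info if hb != target]
    _, has_second_class = target["flag"]

    if has_second_class == 1:                        # S204 yes -> S205
        return others + [{"flag": [ABNORMAL, has_second_class]}]

    if is_unloaded():                                # S206 yes -> S207, S208
        remove_from_topology()
        return others                                # target record removed

    restart()                                        # S206 no -> S209
    return others + [{"flag": [RESTARTING, has_second_class]}]


calls = []
set_info = [{"flag": [1, 0]}, {"E": "urlA"}]
after_restart = handle_abnormal_peer(
    set_info[0], set_info,
    is_unloaded=lambda: False,
    remove_from_topology=lambda: calls.append("remove"),
    restart=lambda: calls.append("restart"),
)
```

Passing the maintenance operations in as callables reflects the statement that they may be realized either by calling the system program or by a program implanted in the first service.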
It should be noted that, as a possible implementation manner, the first service may implement S206 by calling a system program. For example, the system program records in real time the state of each service in the topology structure, such as abnormal, normal, or unloaded (uninstalled), and the first service may directly query the system program to determine whether the service establishing the first-class correspondence with the first service has been unloaded.
In addition, the system program may be responsible for maintaining the topology structure, such as initializing the topology structure, restarting a service, adding a service to the topology structure, and removing a service from the topology structure. Accordingly, when the first service instructs that the service establishing the first-class correspondence with the first service be removed from the topology structure, it may also do so by calling the system program. It is understood that, in some other possible implementation manners of the embodiment of the present application, a set topology updating program may instead be implanted in the first service in advance, and the first service uses it to remove the service establishing the first-class correspondence with the first service from the topology structure and to update the topology structure. Any manner is acceptable as long as that service can be removed from the topology structure and the topology structure can be reconstructed based on the remaining services.
Similarly, when the first service instructs that the service establishing the first-class correspondence with the first service be restarted, it may also do so by calling the system program. It is also understood that, in some other possible implementation manners of the embodiment of the present application, a set restart instruction program may be implanted in the first service in advance, so that the first service automatically restarts the service establishing the first-class correspondence with the first service; the manner in which each service instructs other services to restart is not limited in the embodiment of the present application. For example, each service may be preset with a restart program for restarting itself; when the first service determines that the service establishing the first-class correspondence with the first service has not been unloaded, the first service sends an activation instruction to the restart program preset in that service, so that the service is restarted automatically.
Exemplarily, it is assumed that the format of the normal heartbeat information of the service corresponding to the first service in the first-class correspondence is { "flag": [x, y] }. In this format, "flag" is the first identification information. x is the working state identifier, whose value indicates the working state of the service: for example, 0 may indicate a failure (that is, the first service does not receive heartbeat information sent by the service that establishes the first-class correspondence with the first service), 1 may indicate a normal state (that is, the first service receives heartbeat information sent by that service), and 2 may indicate a restart state (that is, that service is being restarted). y is the synchronization status identifier, whose value indicates whether the service has a second-class correspondence with other services in the topology structure: for example, 0 may indicate that no second-class correspondence is established with other services, and 1 may indicate that a second-class correspondence is established with other services.
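The exemplary format can be decoded as sketched below, assuming the heartbeat information is held as a dictionary whose "flag" entry is a two-element list `[x, y]`. The enum `WorkState` and the function `parse_first_class_heartbeat` are hypothetical names.

```python
from enum import IntEnum


class WorkState(IntEnum):
    FAULT = 0        # heartbeat not received
    NORMAL = 1       # heartbeat received
    RESTARTING = 2   # service is being restarted


def parse_first_class_heartbeat(heartbeat):
    """Decode {"flag": [x, y]} into (working state, has second-class relation)."""
    x, y = heartbeat["flag"]
    return WorkState(x), bool(y)


state, has_second_class = parse_first_class_heartbeat({"flag": [1, 0]})
```

For `{"flag": [1, 0]}`, the service is in the normal state and has no second-class correspondence with other services.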
According to the above example and with reference to the application scenario shown in fig. 3, it is assumed that service C is the first service and service B is the service corresponding to service C in the first-class correspondence. If the target heartbeat information obtained by service C is { "flag": [1,0] }, service C determines that the abnormal service is service B according to the first identification information "flag" contained in the target heartbeat information { "flag": [1,0] }, and determines that service B has not established a second-class correspondence with other services according to the second element 0 in [1,0]. Service C then calls the system program to judge whether service B has been unloaded. If service C determines that service B has been unloaded, it calls the system program to remove service B from the topology structure and reconstruct the topology structure based on the remaining service A, service C, service D and service E, and removes { "flag": [1,0] } from the set heartbeat information recorded by service C. If service C determines that service B has not been unloaded, service C instructs that service B be restarted, for example by calling the system program to restart service B, and updates the "1" in the target heartbeat information to "2", where "2" is the restart identifier; that is, the target heartbeat information is updated to { "flag": [2,0] }, which indicates that service B is being restarted.
It should be noted that, in the above example, for the target heartbeat information { "flag": [1,0] } recorded by service C, the 1 may be recorded by service C, while the 0 is recorded by service B. When service C normally receives the heartbeat information sent by service B, the content of the received heartbeat information may be { "flag": [x,0] }, where x is an unknown quantity (or may be a known quantity, such as 1 by default); when service C determines that the heartbeat information of service B is received normally, service C updates x to 1, that is, the set heartbeat information recorded by service C is updated to { "flag": [1,0] }. If service C determines that the heartbeat information of service B is received abnormally, that is, service C cannot normally receive the heartbeat information sent by service B, service C updates the "1" in the target heartbeat information to "0", that is, records the heartbeat information of service B as { "flag": [0,0] }, indicating that service B has failed. If service C determines that service B is being restarted, service C updates the "1" in the target heartbeat information to "2", that is, records the heartbeat information of service B as { "flag": [2,0] }, indicating that service B is restarting.
In addition, in the foregoing example, if the target heartbeat information obtained by the service C is { "flag": [1,1] }, which indicates that the service B establishes the second-class correspondence with other services, the service C directly records the heartbeat information of the service B as { "flag": [0,1] }, which indicates that the service B has failed.
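Exemplarily, the handling of a missing first-class heartbeat described above may be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the embodiment: the heartbeat record is assumed to take the list form { "flag": [x, y] }, and the names WorkState and handle_missing_first_class, the returned action strings, and the is_unloaded argument (standing in for the result of the system program call) are hypothetical.

```python
from enum import IntEnum


class WorkState(IntEnum):
    """Working state identifier x, using the example values from the text."""
    FAULT = 0
    NORMAL = 1
    RESTARTING = 2


def handle_missing_first_class(record, is_unloaded):
    """Decide what the first service does when the first-class heartbeat is missing.

    record      -- recorded heartbeat, assumed to look like {"flag": [x, y]}
    is_unloaded -- hypothetical result of asking a system program whether
                   the monitored service has been unloaded
    Returns (action, updated_record); updated_record is None when the record
    is to be removed from the set heartbeat information.
    """
    _, y = record["flag"]
    if y == 1:
        # A second-class correspondence exists: only mark the service as faulty.
        return "mark_fault", {"flag": [int(WorkState.FAULT), y]}
    if is_unloaded:
        # Unloaded: remove it from the topology and drop its heartbeat record.
        return "remove_and_rebuild", None
    # Still installed: restart it and record the restart identifier.
    return "restart", {"flag": [int(WorkState.RESTARTING), y]}
```

With service C as the first service and service B as the monitored service, a record of { "flag": [1,0] } for a service that is still installed leads to the "restart" action and the updated record { "flag": [2,0] }.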
On the other hand, please continue to refer to fig. 9, if the first service determines that the service requiring to send the target heartbeat information is not the service establishing the first-class correspondence with the first service, the service monitoring method further includes the following steps:
S210, the first service determines, according to second identification information contained in the target heartbeat information, that the service needing to send the target heartbeat information is a service suspected to be abnormal, and determines a second service that establishes a first-class correspondence with the service suspected to be abnormal;
S211, the first service queries the second service for the working state of the service suspected to be abnormal; if the first service learns that the service suspected to be abnormal is normal or is restarting, S212 is executed; if the service suspected to be abnormal is working abnormally, S213 is executed;
S212, the first service waits to receive heartbeat information sent by the service suspected to be abnormal at the next heartbeat detection time point after the set heartbeat detection time point;
S213, the first service judges whether the service suspected to be abnormal has been unloaded; if it has been unloaded, S214 is executed; if not, S215 is executed;
S214, the first service indicates to remove the service suspected to be abnormal from the topological structure and to reconstruct the topological structure based on the remaining services;
S208, the first service removes the target heartbeat information from the set heartbeat information;
S215, the first service indicates to restart the service suspected to be abnormal, and sends a restart recording instruction to the second service, so that the second service records that the service suspected to be abnormal is being restarted.
In the embodiment of the present application, the heartbeat information sent between two services that establish the second-class correspondence includes second identification information, where the second identification information is used to indicate the service that establishes the first-class correspondence with the service that sends the heartbeat information. For example, if the heartbeat information sent between two services establishing the second-class correspondence is { "E": urlA }, it indicates that the service sending the heartbeat information is service E, that the second identification information is "urlA", and that urlA (a query address) indicates that the service establishing the first-class correspondence with service E is service A. Therefore, if the first service determines that the service needing to send the target heartbeat information is not the service establishing the first-class correspondence with the first service, the first service determines, according to the second identification information contained in the target heartbeat information, that the service needing to send the target heartbeat information is a service suspected to be abnormal, and determines the second service that establishes the first-class correspondence with the service suspected to be abnormal. For example, in the above example, if service C is the first service and the target heartbeat information determined by service C is { "E": urlA }, service C determines that service E is a service suspected to be abnormal, and that service A, which has the first-class correspondence with service E, is the second service.
On this basis, the first service can query the second service for the working state of the service suspected to be abnormal. On one hand, if the first service learns from the query that the service suspected to be abnormal is normal or is restarting, the first service waits to receive the heartbeat information sent by the service suspected to be abnormal at the next heartbeat detection time point after the set heartbeat detection time point.
On the other hand, if the first service learns from the query that the service suspected to be abnormal is working abnormally, the first service further determines whether that service has been unloaded. If the first service determines that it has been unloaded, the first service concludes that the heartbeat information was not received because the service suspected to be abnormal has been unloaded and no longer belongs to the topological structure; the first service then indicates to remove the service suspected to be abnormal from the topological structure and reconstructs the topological structure based on the remaining services. Otherwise, if the first service determines that the service suspected to be abnormal has not been unloaded, the first service concludes that the heartbeat information was not received because the service is working abnormally; at this moment, the first service indicates to restart the service suspected to be abnormal and sends a restart recording instruction to the second service, so that the second service records that the service suspected to be abnormal is being restarted.
For example, in the example where service C is the first service, if the target heartbeat information determined by service C is { "E": urlA }, it indicates that service A (the second service) has the first-class correspondence with service E (the service suspected to be abnormal), and service C queries service A for the working state of service E. If service C learns that the working state of service E is { "flag": [1,1] } or { "flag": [2,1] }, service E is normal or is restarting, and service C waits to receive the heartbeat information sent by service E at the next heartbeat detection time point after the set heartbeat detection time point. If service C learns that the working state of service E is { "flag": [0,1] }, service E is working abnormally, and service C calls a system program to judge whether service E has been unloaded. If service C determines that service E has been unloaded, service C indicates to remove service E from the topological structure, for example by calling a system program, and the topological structure is reconstructed from the remaining service A, service B, service C and service D. If service C determines that service E has not been unloaded, service C indicates to restart service E so that the abnormal working state of service E is handled automatically, and sends a restart recording instruction to service A to instruct service A to record that service E is being restarted; for example, the working state of service E recorded by service A is updated to { "flag": [2,1] }.
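Exemplarily, steps S210 to S215 may be sketched as follows. This is only a minimal illustration under stated assumptions: the second-class heartbeat is assumed to be a single-entry mapping such as {"E": "urlA"}, the queried working state is assumed to take the form {"flag": [x, y]}, and the function names, the step labels returned, and the is_unloaded argument are hypothetical.

```python
def parse_second_class(heartbeat):
    """Split a second-class heartbeat such as {"E": "urlA"} into
    (suspected service, query address of the second service)."""
    sender, query_address = next(iter(heartbeat.items()))
    return sender, query_address


def handle_suspected(state_from_second, is_unloaded):
    """Return the ordered steps the first service takes for a suspected service.

    state_from_second -- working state obtained from the second service,
                         assumed to look like {"flag": [x, y]}
    is_unloaded       -- hypothetical result of the system program check
    """
    x, _ = state_from_second["flag"]
    if x in (1, 2):
        # Normal or restarting: wait for the next heartbeat detection point (S212).
        return ["wait_next_heartbeat"]
    if is_unloaded:
        # S214 followed by S208.
        return ["remove_from_topology", "rebuild_topology", "drop_target_heartbeat"]
    # S215: restart, and ask the second service to record the restart.
    return ["restart_service", "send_restart_record_to_second"]
```

For the target heartbeat information {"E": "urlA"}, parse_second_class yields service E as the service suspected to be abnormal and urlA as the address at which the second service (service A) can be queried.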
It should be noted that, as a possible implementation manner, the first service may query the second service for the working state of the service suspected to be abnormal in either of the following ways: the first service sends a query instruction to the second service, and the second service directly feeds back the working state of the service suspected to be abnormal to the first service; or the first service obtains from the second service the recorded heartbeat information corresponding to the service suspected to be abnormal, and then analyzes that heartbeat information itself. The embodiment of the present application does not limit the manner of the query, as long as the first service can obtain the working state of the service suspected to be abnormal from the second service. For example, the first service may also invoke an interface of the second service so that the heartbeat information corresponding to the service suspected to be abnormal, which is recorded by the second service, is analyzed locally at the second service to obtain the working state of the service suspected to be abnormal.
Based on the same inventive concept as the above-mentioned service monitoring method provided in the embodiment of the present application, please refer to fig. 10, where fig. 10 is a schematic structural diagram of a service node 300 provided in the embodiment of the present application, the service node 300 runs a first service, and the first service and a plurality of other services together form a topology structure; each service contained in the topological structure is in a corresponding relation with at least one other service, and each service is used for receiving heartbeat information sent by the other service corresponding to the service and carrying out anomaly detection on the other service corresponding to the service; the service node 300 comprises a determining module 301 and a processing module 302; wherein:
the determining module 301 is configured to judge whether the condition in which the first service receives the heartbeat information is abnormal;
the processing module 302 is configured to determine whether an abnormal service occurs in other services corresponding to the first service if it is determined that the condition that the first service receives the heartbeat information is abnormal.
Optionally, as a possible implementation manner, when there are a plurality of other services corresponding to the first service, the determining module 301 is specifically configured to, when determining whether the condition that the first service receives the heartbeat information is abnormal:
judging whether the quantity of the heartbeat information received by the first service at the set heartbeat detection time point is the same as a set value or not, wherein the set value is the quantity of the heartbeat information which needs to be received by the first service at the set heartbeat detection time point;
if they are different, determining that the condition that the first service receives the heartbeat information is abnormal;
and if they are the same, determining that the condition that the first service receives the heartbeat information is normal.
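Exemplarily, the quantity comparison above may be sketched as follows. This is only a minimal illustration: the function name reception_is_abnormal and the representation of the received heartbeat information as a list are assumptions.

```python
def reception_is_abnormal(received_heartbeats, set_value):
    """Compare the quantity of heartbeat information received at the set
    heartbeat detection time point with the set value.

    received_heartbeats -- heartbeat messages collected at the detection point
    set_value           -- quantity of heartbeat messages the first service
                           needs to receive at that detection point
    Returns True when the receiving condition is abnormal.
    """
    return len(received_heartbeats) != set_value
```

For instance, a first service that needs to receive two pieces of heartbeat information but receives only one at the set heartbeat detection time point would treat the receiving condition as abnormal.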
Optionally, as a possible implementation manner, a correspondence relationship established between each service and any other service corresponding to the service is a first-class correspondence relationship or a second-class correspondence relationship, where each service in all services included in the topology structure establishes the first-class correspondence relationship with one service in the other services corresponding to the service, each service directly determines whether the service establishing the first-class correspondence relationship with the service is abnormal, and each service assists in determining whether the service establishing the second-class correspondence relationship with the service is abnormal;
when determining whether an abnormal service occurs in the other services corresponding to the first service, the processing module 302 is specifically configured to:
determining target heartbeat information, namely the heartbeat information that is missing from the heartbeat information received by the first service at the set heartbeat detection time point as compared with set heartbeat information, wherein the set heartbeat information comprises all heartbeat information which needs to be received by the first service at the heartbeat detection time point;
judging whether the service needing to send the target heartbeat information is a service establishing a first type of corresponding relation with the first service or not according to the target heartbeat information;
if yes, determining that the service establishing the first type corresponding relation with the first service is abnormal.
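Exemplarily, determining the target heartbeat information and classifying its sender may be sketched as follows. This is only a minimal illustration under stated assumptions: the set heartbeat information and the received heartbeat information are both assumed to be mappings keyed as in the examples of this application ("flag" for the first-class correspondence, a service name such as "E" for a second-class correspondence), and the names FIRST_ID, missing_heartbeats and is_first_class are hypothetical.

```python
FIRST_ID = "flag"  # first identification information used in the examples


def missing_heartbeats(set_info, received):
    """Return the entries of the set heartbeat information that were not
    received at the set heartbeat detection time point (the target
    heartbeat information)."""
    return {key: value for key, value in set_info.items() if key not in received}


def is_first_class(target_key):
    """True when the target heartbeat carries the first identification
    information, i.e. its sender has the first-class correspondence
    with the first service."""
    return target_key == FIRST_ID


# Service C expects one first-class and one second-class heartbeat.
set_info = {"flag": [1, 0], "E": "urlA"}
received = {"flag": [1, 0]}
target = missing_heartbeats(set_info, received)
```

In this usage, the target heartbeat information is {"E": "urlA"}; because its key is not the first identification information, the sender would be treated as a service suspected to be abnormal rather than directly judged abnormal.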
Optionally, as a possible implementation manner, when determining, according to the target heartbeat information, whether the service that needs to send the target heartbeat information is a service that establishes a first-class correspondence with the first service, the processing module 302 is specifically configured to:
judging whether the target heartbeat information contains first identification information used for representing the first-class corresponding relation;
if so, determining that the service needing to send the target heartbeat information is the service establishing the first-class correspondence with the first service.
Optionally, as a possible implementation manner, if it is determined that a service establishing the first-class correspondence relationship with the first service is abnormal, the processing module 302 is further configured to:
judging whether a service establishing a first type corresponding relation with a first service has a second type corresponding relation with other services in the topological structure;
if so, updating the identification information used for indicating the working state in the target heartbeat information to an abnormal identification, so as to update the set heartbeat information.
Optionally, as a possible implementation manner, the processing module 302 is further configured to:
if the service establishing the first type corresponding relation with the first service does not establish the second type corresponding relation with other services in the topological structure, judging whether the service establishing the first type corresponding relation with the first service is unloaded;
if it has been unloaded, indicating to remove the service establishing the first-class correspondence with the first service from the topological structure and to reconstruct the topological structure based on the remaining services; the first service removes the target heartbeat information from the set heartbeat information;
and if it has not been unloaded, indicating to restart the service establishing the first-class correspondence with the first service, and updating the identification information used for indicating the working state in the target heartbeat information to a restart identification, so as to update the set heartbeat information.
Optionally, as a possible implementation manner, the processing module 302 is further configured to:
if the service needing to send the target heartbeat information is judged not to be the service establishing the first-class corresponding relation with the first service, determining the service needing to send the target heartbeat information to be the suspected abnormal service and determining the second service establishing the first-class corresponding relation with the suspected abnormal service according to second identification information contained in the target heartbeat information, wherein the second identification information is used for indicating the service establishing the first-class corresponding relation with the suspected abnormal service to be the second service;
querying the second service for the working state of the service suspected of being abnormal;
if the service suspected to be abnormal is normal or is restarting, waiting to receive heartbeat information sent by the service suspected to be abnormal at the next heartbeat detection time point after the set heartbeat detection time point;
if the service suspected to be abnormal is working abnormally, judging whether the service suspected to be abnormal has been unloaded;
if it has been unloaded, indicating to remove the service suspected to be abnormal from the topological structure and to reconstruct the topological structure based on the remaining services; removing the target heartbeat information from the set heartbeat information;
if it has not been unloaded, indicating to restart the service suspected to be abnormal, and sending a restart recording instruction to the second service, so that the second service records that the service suspected to be abnormal is being restarted.
Optionally, as a possible implementation manner, the topology is a ring topology; in the ring topology structure, each service forms a correspondence with the previous service adjacent to it along the first direction in the ring topology structure;
when determining whether the condition that the first service receives the heartbeat information is abnormal, the determining module 301 is specifically configured to:
judging whether heartbeat information sent by the previous service adjacent to the first service along the first direction in the ring topology structure is received at the set heartbeat detection time point;
if so, judging that the condition that the first service receives the heartbeat information is normal;
if not, judging that the condition that the first service receives the heartbeat information is abnormal;
if it is determined that the condition that the first service receives the heartbeat information is abnormal, the processing module 302 is specifically configured to:
an exception is determined to occur in a previous service adjacent to the first service along a first direction in the ring topology.
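Exemplarily, the ring-topology check above may be sketched as follows. This is only a minimal illustration under stated assumptions: the ring is assumed to be an ordered list of service names whose order follows the first direction, and the names previous_service and detect_ring_fault are hypothetical.

```python
def previous_service(ring, service):
    """Return the previous service adjacent to `service` along the first
    direction; the ring is an ordered list that wraps around."""
    index = ring.index(service)
    return ring[index - 1]  # index -1 wraps from the first element to the last


def detect_ring_fault(ring, service, senders_received):
    """Return the abnormal service, or None when the previous service's
    heartbeat was received at the set heartbeat detection time point."""
    prev = previous_service(ring, service)
    return None if prev in senders_received else prev


ring = ["A", "B", "C", "D", "E"]
```

In this usage, service C monitors service B and service A monitors service E, so each service only has to track a single heartbeat.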
It should be noted that the determining module 301 and the processing module 302 included in the service node 300 may be functional modules belonging to the first service, for example, a program code included in the first service is executed by the processor 102 in the server 100, so as to implement the service monitoring method described above; in some other possible implementation manners of the embodiment of the present application, the determining module 301 and the processing module 302 included in the service node 300 may also be function modules that are not associated with the first service, and at this time, the first service may implement the service monitoring method in a manner of calling and executing the determining module 301 and the processing module 302.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program codes, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
In summary, according to the service monitoring method, the service node, the server, and the computer-readable storage medium provided in the embodiments of the present application, a plurality of services including a first service together form a topological structure, and each service included in the topological structure has a correspondence with at least one other service. Each service in the topological structure can therefore determine whether a service corresponding to it is abnormal by determining whether the receiving condition of heartbeat information is abnormal, thereby monitoring the working states of the plurality of services. Compared with the prior art, the monitoring function for the topological structure formed by the plurality of services does not need to be provided by an independent monitoring service; instead, each service in the topological structure monitors the other services corresponding to it. This avoids the complete loss of the service monitoring function of the topological structure caused by a single-point failure of an independent monitoring service, so that monitoring can be provided for the topological structure continuously.
In addition, the topological structure is set to be ring-shaped, and each service in the ring topology structure forms a correspondence with the previous service adjacent to it along the first direction, so that each service in the ring topology structure monitors the previous service adjacent to it along the first direction. This can reduce the data amount involved in service monitoring and reduce redundancy.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (11)

1. A service monitoring method is characterized in that the method is applied to a service node running a first service, and the first service and a plurality of other services jointly form a topological structure;
each service contained in the topological structure is in a corresponding relation with at least one other service, and each service is used for receiving heartbeat information sent by the other service corresponding to the service and carrying out anomaly detection on the other service corresponding to the service;
the corresponding relation established between each service and any other corresponding service is a first-class corresponding relation or a second-class corresponding relation, wherein each service in all the services contained in the topological structure establishes the first-class corresponding relation with one of the other corresponding services, each service directly judges whether the service establishing the first-class corresponding relation with the service is abnormal, and each service assists in judging whether the service establishing the second-class corresponding relation with the service is abnormal;
the direct judgment is that the service judges whether other services corresponding to the service are abnormal according to whether heartbeat information sent by other services corresponding to the service is received, and the auxiliary judgment is that the service judges whether other services corresponding to the service are suspected to be abnormal according to whether heartbeat information sent by other services corresponding to the service is received;
the method comprises the following steps:
the first service judges whether the receiving condition of the heartbeat information is abnormal or not;
if the receiving condition of the heartbeat information is determined to be abnormal, the first service judges whether abnormal services exist in other services corresponding to the first service.
2. The method of claim 1, wherein when there are a plurality of other services corresponding to the first service, the determining, by the first service, whether the receiving condition of the heartbeat information is abnormal includes:
the first service judges whether the quantity of the heartbeat information received at a set heartbeat detection time point is the same as a set value, wherein the set value is the quantity of the heartbeat information which needs to be received by the first service at the set heartbeat detection time point;
if they are different, the first service determines that the receiving condition of the heartbeat information is abnormal;
and if they are the same, the first service determines that the receiving condition of the heartbeat information is normal.
3. The method of claim 2, wherein the determining, by the first service, whether the abnormal service occurs in the other services corresponding to the first service comprises:
the first service determines target heartbeat information, namely the heartbeat information that is missing from the heartbeat information received at the set heartbeat detection time point as compared with set heartbeat information, wherein the set heartbeat information comprises all heartbeat information which needs to be received by the first service at the heartbeat detection time point;
the first service judges whether the service needing to send the target heartbeat information is the service establishing the first-class corresponding relation with the first service or not according to the target heartbeat information;
if so, the first service determines that the service establishing the first-class corresponding relation with the first service is abnormal.
4. The method of claim 3, wherein the determining, by the first service according to the target heartbeat information, whether the service that needs to send the target heartbeat information is a service that establishes the first-class correspondence with the first service includes:
the first service judges whether the target heartbeat information contains first identification information used for representing the first-class corresponding relation;
if so, the first service determines that the service needing to send the target heartbeat information is the service establishing the first-class correspondence with the first service.
5. The method according to claim 3 or 4, wherein if the first service determines that the service establishing the first type correspondence with the first service is abnormal, the method further comprises:
the first service judges whether the service establishing the first class corresponding relation with the first service establishes the second class corresponding relation with other services in the topological structure;
if so, the first service updates the identification information used for indicating the working state in the target heartbeat information into an abnormal identification so as to update the set heartbeat information.
6. The method of claim 5, wherein the method further comprises:
if the service establishing the first-class corresponding relationship with the first service does not establish the second-class corresponding relationship with other services in the topological structure, the first service judges whether the service establishing the first-class corresponding relationship with the first service is unloaded;
if it has been unloaded, the first service indicates to remove the service establishing the first-class correspondence with the first service from the topological structure and to reconstruct the topological structure based on the remaining services; the first service removes the target heartbeat information from the set heartbeat information;
and if it has not been unloaded, the first service indicates to restart the service establishing the first-class correspondence with the first service, and updates the identification information used for indicating the working state in the target heartbeat information to a restart identification, so as to update the set heartbeat information.
7. The method of claim 3 or 4, further comprising:
if the first service determines that the service needing to send the target heartbeat information is not the service establishing the first-class corresponding relationship with the first service, the first service determines, according to second identification information contained in the target heartbeat information, that the service needing to send the target heartbeat information is a service suspected to be abnormal and determines a second service establishing the first-class corresponding relationship with the service suspected to be abnormal, wherein the second identification information is used for indicating that the service establishing the first-class corresponding relationship with the service suspected to be abnormal is the second service;
the first service inquires the second service about the working state of the service suspected to be abnormal;
if the first service learns from the inquiry that the service suspected to be abnormal is working normally or is being restarted, the first service waits to receive heartbeat information sent by the service suspected to be abnormal at the heartbeat detection time point following the set heartbeat detection time point;
if the service suspected to be abnormal is working abnormally, the first service judges whether the service suspected to be abnormal is unloaded;
if the service suspected to be abnormal is unloaded, the first service instructs that the service suspected to be abnormal be removed from the topological structure and that the topological structure be reconstructed based on the remaining services; the first service removes the target heartbeat information from the set heartbeat information;
if the service suspected to be abnormal is not unloaded, the first service instructs a restart of the service suspected to be abnormal and sends a restart recording instruction to the second service, so that the second service records that the service suspected to be abnormal is being restarted.
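The decision flow of claim 7 can be sketched as a small function. It is only an illustration under assumptions: `query_state` and `is_unloaded` are hypothetical callbacks standing in for the inquiry to the second service and the unload check, and the returned action strings are invented labels rather than anything defined in the claims.

```python
from typing import Callable, List


def handle_suspect(
    suspect: str,
    direct_monitor: str,
    query_state: Callable[[str, str], str],
    is_unloaded: Callable[[str], bool],
) -> List[str]:
    """Return the actions the first service takes for a suspected service.

    direct_monitor is the "second service", i.e. the service that directly
    judges the suspect. query_state(direct_monitor, suspect) is assumed to
    return "normal", "restarting" or "abnormal".
    """
    state = query_state(direct_monitor, suspect)
    if state in ("normal", "restarting"):
        # Nothing to repair: wait for the next heartbeat detection point.
        return ["wait_next_heartbeat"]
    if is_unloaded(suspect):
        # Abnormal and unloaded: drop it and rebuild the topology.
        return ["remove_from_topology", "rebuild_topology", "drop_heartbeat"]
    # Abnormal but still installed: restart it and tell the direct monitor.
    return ["restart:" + suspect, "record_restart@" + direct_monitor]
```

Keeping the inquiry and the unload check as injected callbacks makes the three branches easy to exercise in isolation without a running cluster.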
8. The method of claim 1, wherein the topology is a ring topology; in the ring topology structure, each service and the previous service adjacent to it along the first direction in the ring topology structure form the corresponding relation;
the first service determining whether the receiving condition of the heartbeat information is abnormal includes:
the first service judges whether heartbeat information sent by a previous service adjacent to the first service in a first direction in the ring topology structure is received at a set heartbeat detection time point;
if so, the first service judges that the receiving condition of the heartbeat information is normal;
if the heartbeat information is not received, the first service judges that the receiving condition of the heartbeat information is abnormal;
if the first service determines that the receiving condition of the heartbeat information is abnormal, the first service determines whether an abnormal service occurs in other services corresponding to the first service, including:
the first service determines that an abnormality has occurred in the previous service adjacent to the first service in the first direction in the ring topology.
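The ring-topology check in claim 8 can be illustrated with a short sketch. It assumes, purely for illustration, that the ring is stored as an ordered list and that the "first direction" runs from lower to higher index, so the previous adjacent service is the element at the preceding index (wrapping around); the function names are hypothetical.

```python
from typing import List, Optional, Set


def previous_in_ring(ring: List[str], service: str) -> str:
    """Return the service adjacent to `service` in the assumed first direction."""
    i = ring.index(service)
    return ring[(i - 1) % len(ring)]  # wraps from the first element to the last


def detect_abnormal(ring: List[str], service: str, received_from: Set[str]) -> Optional[str]:
    """At a heartbeat detection point, return the abnormal neighbour, if any.

    received_from holds the services whose heartbeat arrived at this point.
    """
    prev = previous_in_ring(ring, service)
    return None if prev in received_from else prev
```

With the ring `["A", "B", "C", "D"]`, service `"A"` monitors `"D"`, and `detect_abnormal(ring, "C", set())` reports `"B"` because no heartbeat from `"B"` arrived.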
9. A service node, wherein the service node runs a first service, and the first service and a plurality of other services together form a topology;
each service contained in the topological structure is in a corresponding relation with at least one other service, and each service is used for receiving heartbeat information sent by the other service corresponding to the service and carrying out anomaly detection on the other service corresponding to the service;
the corresponding relation established between each service and any other corresponding service is a first-class corresponding relation or a second-class corresponding relation, wherein each service in all the services contained in the topological structure establishes the first-class corresponding relation with one of the other corresponding services, each service directly judges whether the service establishing the first-class corresponding relation with the service is abnormal, and each service assists in judging whether the service establishing the second-class corresponding relation with the service is abnormal;
the direct judgment is that the service judges whether other services corresponding to the service are abnormal according to whether heartbeat information sent by other services corresponding to the service is received, and the auxiliary judgment is that the service judges whether other services corresponding to the service are suspected to be abnormal according to whether heartbeat information sent by other services corresponding to the service is received;
the service node comprises:
a judging module, configured to judge whether the condition in which the first service receives the heartbeat information is abnormal;
and a processing module, configured to judge, if the condition in which the first service receives the heartbeat information is determined to be abnormal, whether an abnormal service has occurred among the other services corresponding to the first service.
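The split between the judging module and the processing module, together with the distinction between direct and auxiliary judgment, can be sketched as below. This is a hedged illustration only: the `ServiceNode` class, its field names, and the labels "abnormal" and "suspected" are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ServiceNode:
    direct_target: str  # service in a first-class relation (judged directly)
    auxiliary_targets: Set[str] = field(default_factory=set)  # second-class

    def missing(self, received: Set[str]) -> Set[str]:
        """Services whose heartbeat was expected but did not arrive."""
        expected = {self.direct_target} | self.auxiliary_targets
        return expected - received

    def reception_abnormal(self, received: Set[str]) -> bool:
        """Judging module: is the heartbeat reception condition abnormal?"""
        return bool(self.missing(received))

    def classify(self, received: Set[str]) -> Dict[str, str]:
        """Processing module: label each silent service.

        A missing heartbeat from the direct target means "abnormal"; from an
        auxiliary target it only means "suspected".
        """
        return {
            s: ("abnormal" if s == self.direct_target else "suspected")
            for s in sorted(self.missing(received))
        }
```

For a node with direct target `"B"` and auxiliary target `"C"`, receiving no heartbeats yields `{"B": "abnormal", "C": "suspected"}`.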
10. A server, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN201910649658.8A 2019-07-18 2019-07-18 Service monitoring method, service node, server and computer readable storage medium Active CN110417586B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910649658.8A CN110417586B (en) 2019-07-18 2019-07-18 Service monitoring method, service node, server and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910649658.8A CN110417586B (en) 2019-07-18 2019-07-18 Service monitoring method, service node, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110417586A (en) 2019-11-05
CN110417586B (en) 2022-04-08

Family

ID=68361945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910649658.8A Active CN110417586B (en) 2019-07-18 2019-07-18 Service monitoring method, service node, server and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110417586B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651294B (en) * 2020-05-13 2023-07-25 浙江华创视讯科技有限公司 Node abnormality detection method and device
CN112711466B (en) * 2021-03-25 2021-08-10 北京金山云网络技术有限公司 Hanging affair inspection method and device, electronic equipment and storage medium
CN113645102B (en) * 2021-10-14 2022-02-08 腾讯科技(深圳)有限公司 Method and device for determining route convergence time
CN114189464A (en) * 2021-11-24 2022-03-15 国能大渡河瀑布沟发电有限公司 Communication abnormity monitoring and alarming method for power monitoring system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102215123A (en) * 2011-06-07 2011-10-12 南京邮电大学 Multi-ring-network-topology-structure-based large-scale trunking system
CN103763155A (en) * 2014-01-24 2014-04-30 国家电网公司 Multi-service heartbeat monitoring method for distributed type cloud storage system
CN104811325A (en) * 2014-01-24 2015-07-29 华为技术有限公司 Cluster node controller monitoring method, related device and controller
CN107733684A (en) * 2017-08-31 2018-02-23 北京宇航***工程研究所 A kind of multi-controller computing redundancy cluster based on Loongson processor
CN109361525A (en) * 2018-10-25 2019-02-19 珠海派诺科技股份有限公司 Restart method, apparatus, controlling terminal and medium that distributed deployment services more
CN109714183A (en) * 2017-10-26 2019-05-03 阿里巴巴集团控股有限公司 Data processing method and device in a kind of cluster

Also Published As

Publication number Publication date
CN110417586A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110417586B (en) Service monitoring method, service node, server and computer readable storage medium
CN110661659B (en) Alarm method, device and system and electronic equipment
CN110798375B (en) Monitoring method, system and terminal equipment for enhancing high availability of container cluster
US8156378B1 (en) System and method for determination of the root cause of an overall failure of a business application service
CN111459770A (en) Server operation state warning method and device, server and storage medium
CN113778985A (en) Microservice architecture monitoring method, microservice architecture monitoring device, computer equipment and storage medium
CN109218401B (en) Log collection method, system, computer device and storage medium
CN110543512A (en) Information synchronization method, device and system
CN113608839A (en) Cluster alarm method and device, computer equipment and storage medium
CN106021070A (en) Method and device for server cluster monitoring
CN117130730A (en) Metadata management method for federal Kubernetes cluster
JP2003233512A (en) Client monitoring system with maintenance function, monitoring server, program, and client monitoring/ maintaining method
CN111510351B (en) Anomaly detection method and device based on Prometheus monitoring system
CN112882892B (en) Data processing method and device, electronic equipment and storage medium
CN115712521A (en) Cluster node fault processing method, system and medium
CN111309515A (en) Disaster recovery control method, device and system
JP4230946B2 (en) Application monitoring apparatus, program thereof, and recording medium thereof.
CN114816866A (en) Fault processing method and device, electronic equipment and storage medium
CN114143330A (en) Configuration method, device and system of time server
CN114036032A (en) Real-time program monitoring method and device
CN113676356A (en) Alarm information processing method and device, electronic equipment and readable storage medium
CN113835916A (en) Ambari big data platform-based alarm method, system and equipment
CN113381884A (en) Full link monitoring method and device for monitoring alarm system
CN112068935A (en) Method, device and equipment for monitoring deployment of kubernets program
CN110837431A (en) Service control method, service control device, computer equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant