CN109039718B - Online service warning method and system - Google Patents

Info

Publication number
CN109039718B
CN109039718B CN201810799040.5A CN201810799040A CN109039718B CN 109039718 B CN109039718 B CN 109039718B CN 201810799040 A CN201810799040 A CN 201810799040A CN 109039718 B CN109039718 B CN 109039718B
Authority
CN
China
Prior art keywords
alarm
service
information
record information
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810799040.5A
Other languages
Chinese (zh)
Other versions
CN109039718A (en
Inventor
喻志曾
李振
周小溪
马磊
耿直
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Manyun Software Technology Co Ltd
Original Assignee
Jiangsu Manyun Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Manyun Software Technology Co Ltd
Priority to CN201810799040.5A
Publication of CN109039718A
Application granted
Publication of CN109039718B
Active (legal status)
Anticipated expiration (legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/064 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis involving time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)
  • Alarm Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an alarm method and system for online services. The method comprises the following steps: when alarm request information sent by an online service is received, acquiring the current error reporting information of the alarm service, wherein the alarm request information includes the alarm service and the service responsible person; when a preset alarm condition is met, sending alarm prompt information to the service responsible person; and creating, under the alarm service, alarm record information whose state is to be processed, according to the current error reporting information and the alarm request information. The invention sends the alarm prompt information to the relevant service responsible person in real time, so that problems in the online system are discovered as early as possible; the current error reporting information is acquired automatically, so that an engineer can locate the specific fault quickly, which improves resolution efficiency; and the state of the alarm record information can be changed according to feedback from the service responsible person, which facilitates management as well as the monitoring and tracking of alarms.

Description

Online service warning method and system
Technical Field
The invention relates to the field of online alarm, in particular to an online service alarm method and system.
Background
A defining characteristic of the IT industry is that online systems run continuously. System stability is the lifeline of an enterprise, so faults must be located accurately and resolved quickly as soon as they occur.
At present, when an online service reports an error or behaves abnormally, the project owner or team often cannot obtain the corresponding error information immediately. Frequently, customer service learns of the problem only after receiving user feedback and then contacts the relevant personnel, who must troubleshoot and find the cause of the error after being notified, so a great deal of time is lost. Moreover, when the relevant personnel receive the alarm notification they may be away on business, commuting, and so on, with no means or tools for viewing the specific error; the problem therefore cannot be investigated promptly, which further prolongs the time during which the faulty service is unavailable.
Disclosure of Invention
The invention aims to provide an online service alarm method and system that can notify the relevant service responsible person promptly when an online service has a problem, so that the fault can be resolved as soon as possible.
The technical scheme provided by the invention is as follows:
An alarm method for online services comprises the following steps: when alarm request information sent by an online service is received, acquiring the current error reporting information of the alarm service, wherein the alarm request information includes the alarm service and the service responsible person; when a preset alarm condition is met, sending alarm prompt information to the service responsible person; and creating, under the alarm service, alarm record information whose state is to be processed, according to the current error reporting information and the alarm request information.
In the technical scheme, when the online service has a problem, the alarm prompt message can be sent to the service responsible person in time to process the problem in time; and the error reporting information is automatically collected, the labor of an engineer for manually screening the error reporting information is reduced, and the failure solving efficiency is improved.
Further, the alarm request information further includes an alarm type; acquiring the current error reporting information of the alarm service when alarm request information sent by an online service is received specifically comprises: when the alarm type is the abnormal alarm type, acquiring at least one piece of error log information within the current service time period under the alarm service as the current error reporting information; and when the alarm type is the error alarm type, acquiring core data under the alarm service as the current error reporting information.
In this technical scheme, the current error reporting information is acquired differently for different alarm types, so that multiple alarm types can be used and managed.
Further, after the current error reporting information of the alarm service is acquired upon receiving alarm request information sent by an online service, the method further comprises: when alarm record information whose state is to be processed or in process already exists under the alarm service, updating the error reporting information and the update time of that alarm record information according to the current error reporting information and the alarm time in the alarm request information.
In this technical scheme, updating the error reporting information in the alarm record information with the current error reporting information keeps the alarm record information current and accurate, and updating the modification time lets the service responsible person know when the error reporting information in the alarm record information was last updated.
Further, after the current error reporting information of the alarm service is acquired upon receiving alarm request information sent by an online service, the method further comprises: when neither alarm record information whose state is to be processed or in process nor ignored record information exists under the alarm service, considering that the preset alarm condition is met, wherein ignored record information refers to alarm record information whose state is ignored; when no alarm record information whose state is to be processed or in process exists under the alarm service but ignored record information exists, judging whether ignored record information within its valid time range contains the acquired current error reporting information; when ignored record information within its valid time range contains the acquired current error reporting information, not sending alarm prompt information to the service responsible person; and when no ignored record information within its valid time range contains the acquired current error reporting information, considering that the preset alarm condition is met.
In the technical scheme, different processing modes are adopted according to different states of the alarm record information under the alarm service, so that the service responsible person is not disturbed frequently, the service responsible person can be informed in time when problems exist, and the timeliness of the alarm is ensured.
Further, the method also comprises: when in-process response information sent by the service responsible person in reply to the received alarm prompt information is received, updating the state of the corresponding alarm record information under the alarm service to in process; and when solved response information sent by the service responsible person in reply to the received alarm prompt information is received, updating the state of the corresponding alarm record information under the alarm service to solved.
Further, the method also comprises: when ignore response information sent by the service responsible person in reply to the received alarm prompt information is received, updating the state of the corresponding alarm record information under the alarm service to ignored, and recording the valid time range of the alarm record information.
In the technical scheme, the service responsible person can send corresponding response information according to the actual processing condition of the service responsible person, and the state of the corresponding alarm record information is updated, so that the management is facilitated, and meanwhile, the monitoring and tracking of the alarm are facilitated.
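The response handling described above can be pictured as a small state update. The following is only a sketch under stated assumptions: the response keywords (`"in_process"`, `"solved"`, `"ignore"`), the dictionary layout of a record, and the 24-hour default ignore window are all hypothetical, since the text does not say how the valid time range is chosen.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical response keywords mapped to the record states named in the text.
RESPONSE_TO_STATE = {
    "in_process": "in process",
    "solved": "solved",
    "ignore": "ignored",
}
# Assumed default; the text does not specify how long an ignore stays valid.
DEFAULT_IGNORE_WINDOW = timedelta(hours=24)


def apply_response(
    record: Dict,
    response: str,
    received_at: datetime,
    ignore_window: Optional[timedelta] = None,
) -> Dict:
    """Update the state of an alarm record from the responsible person's reply."""
    if response not in RESPONSE_TO_STATE:
        raise ValueError(f"unknown response: {response}")
    record["state"] = RESPONSE_TO_STATE[response]
    if response == "ignore":
        # An ignored record also stores the time range in which the ignore applies.
        window = ignore_window if ignore_window is not None else DEFAULT_IGNORE_WINDOW
        record["valid_from"] = received_at
        record["valid_until"] = received_at + window
    return record


record = {"service": "service A", "state": "to be processed"}
apply_response(record, "ignore", datetime(2018, 6, 28, 19, 0, 0))
```

Keeping the valid time range on the record itself means a later alarm can be checked against it without consulting any other store.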
The invention also provides an alarm system for online services, which comprises: a log acquisition module, used for acquiring the current error reporting information of the alarm service when alarm request information sent by an online service is received, wherein the alarm request information includes the alarm service and the service responsible person; an alarm sending module, used for sending alarm prompt information to the service responsible person when a preset alarm condition is met; and a record creating module, used for creating, under the alarm service, alarm record information whose state is to be processed, according to the current error reporting information and the alarm request information.
In the technical scheme, when the online service has a problem, the alarm prompt message can be sent to the service responsible person in time to process the problem in time; and the error reporting information is automatically collected, the labor of an engineer for manually screening the error reporting information is reduced, and the failure solving efficiency is improved.
Further, the system also comprises: a record updating module, used for updating the error reporting information and the update time of the alarm record information according to the current error reporting information and the alarm time in the alarm request information when alarm record information whose state is to be processed or in process already exists under the alarm service.
Further, the system also comprises: a condition judgment module, used for considering that the preset alarm condition is met when the online service corresponding to the alarm request information has neither alarm record information whose state is to be processed or in process nor ignored record information, wherein ignored record information refers to alarm record information whose state is ignored; for judging, when the alarm service has no alarm record information whose state is to be processed or in process but has ignored record information, whether ignored record information within its valid time range contains the acquired current error reporting information; and for considering that the preset alarm condition is met when no ignored record information within its valid time range contains the acquired current error reporting information. The alarm sending module is further used for not sending alarm prompt information to the service responsible person when ignored record information within its valid time range contains the acquired current error reporting information.
Further, the system also comprises: a state updating module, used for updating the state of the corresponding alarm record information under the alarm service to ignored and recording the valid time range of the alarm record information when ignore response information sent by the service responsible person in reply to the received alarm prompt information is received.
Compared with the prior art, the online service alarm method and system have the following beneficial effects:
The invention sends the alarm prompt information to the relevant service responsible person in real time, so that problems in the online system are discovered as early as possible; the current error reporting information is acquired automatically, so that an engineer can locate the specific fault quickly, which improves resolution efficiency; and the state of the alarm record information can be changed according to feedback from the service responsible person, which facilitates management as well as the monitoring and tracking of alarms.
Drawings
The above features, technical features, advantages and implementations of an online service alerting method and system will be further described in the following detailed description of preferred embodiments in a clearly understandable manner, in conjunction with the accompanying drawings.
FIG. 1 is a flow chart of one embodiment of an alerting method for an online service of the present invention;
FIG. 2 is a flow chart of another embodiment of an alerting method for an online service of the present invention;
FIG. 3 is a schematic diagram of an embodiment of an alert system for online services in accordance with the present invention;
FIG. 4 is a schematic structural diagram of another embodiment of the alarm system for online services according to the present invention.
The reference numbers illustrate:
10: log acquisition module; 20: alarm sending module; 30: record creating module; 40: record updating module; 50: condition judgment module; 60: state updating module.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention, and they do not represent the actual structure of a product. In addition, to keep the drawings concise and understandable, where several components in a drawing have the same structure or function, only one of them is schematically illustrated or labeled. In this document, "one" covers not only "only one" but also "more than one".
In an embodiment of the present invention, as shown in fig. 1, an alarm method for online service includes the following steps:
S101, when alarm request information sent by an online service is received, acquiring the current error reporting information of the alarm service; the alarm request information includes: the alarm service and the service responsible person;
S102, when the preset alarm condition is met, sending alarm prompt information to the service responsible person;
S103, according to the current error reporting information and the alarm request information, creating, under the alarm service, alarm record information whose state is to be processed.
Specifically, the online service includes various running services, servers and other software programs or hardware entities that support normal running of the online system. Problems may occur with each online service (whether hardware or software) and when a problem occurs, an alarm request message is sent.
The alarm request information sent by the online service comprises: the alarm service, i.e., the name of the online service in which the problem occurred, for example a login service or server No. 10; and the service responsible person, i.e., the person responsible for the service in question. The alarm service identifies which online service has the problem; the service responsible person is included in the alarm request information so that the alarm prompt information can be sent to the right person as quickly as possible and the affected online service can be restored to normal as soon as possible.
Optionally, the alarm request information further includes: the alarm time, i.e., the time at which the problem occurred in the online service, for subsequent recording and management; and/or the overall alarm content, i.e., a summary of the problem in the online service. The overall alarm content comprises any one or more of the following: the alarm type, for example an abnormal alarm indicating that the service is in an abnormal condition, or an error alarm indicating a running error of the service. For an abnormal alarm, the actual value and the threshold value can also be placed in the overall alarm content.
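As an illustration, the alarm request information listed above could be modelled as a small record type. This is a sketch, not a format defined by the patent: the class and field names (`AlarmRequest`, `alarm_service`, `responsible_person`, `alarm_time`, `alarm_type`, `actual_value`, `threshold`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class AlarmType(Enum):
    ABNORMAL = "abnormal"  # the service is in an abnormal condition
    ERROR = "error"        # the service hit a running error


@dataclass
class AlarmRequest:
    # Required fields: which service has the problem and who is responsible.
    alarm_service: str
    responsible_person: str
    # Optional fields described in the text.
    alarm_time: Optional[datetime] = None
    alarm_type: Optional[AlarmType] = None
    # Only meaningful for abnormal alarms.
    actual_value: Optional[int] = None
    threshold: Optional[int] = None


request = AlarmRequest(
    alarm_service="login service",
    responsible_person="Zhang San",
    alarm_time=datetime(2018, 6, 28, 18, 30, 26),
    alarm_type=AlarmType.ERROR,
)
```

Making the optional parts default to `None` mirrors the text, in which only the alarm service and the service responsible person are mandatory.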
In actual operation, each time the online service finds a running error, in addition to reporting alarm request information whose alarm type is error alarm, it can also check whether the current number of errors exceeds a threshold. The threshold is set manually according to actual requirements. For example, suppose the threshold for the number of errors in one day is 100 and the current count is 105, which exceeds 100. Then alarm request information of the error alarm type is reported when the running error is found, and alarm request information of the abnormal alarm type is also reported because the count exceeds the threshold. In the abnormal-alarm request, the actual value is 105 and the threshold is 100.
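The reporting rule in this example can be sketched on the service side as follows. The function name `build_alarm_requests` and the dictionary keys are hypothetical; the threshold of 100 and the count of 105 come from the example in the text.

```python
from typing import Dict, List

DAILY_ERROR_THRESHOLD = 100  # value used in the example; set manually in practice


def build_alarm_requests(service: str, owner: str, errors_today: int) -> List[Dict]:
    """Return the alarm requests to report after one more running error.

    `errors_today` is the error count including the error just found.
    """
    # Every running error is reported as an error alarm.
    requests = [{"service": service, "owner": owner, "type": "error"}]
    # An abnormal alarm is reported as well once the count exceeds the threshold.
    if errors_today > DAILY_ERROR_THRESHOLD:
        requests.append({
            "service": service,
            "owner": owner,
            "type": "abnormal",
            "actual_value": errors_today,
            "threshold": DAILY_ERROR_THRESHOLD,
        })
    return requests


requests = build_alarm_requests("service A", "Zhang San", errors_today=105)
```

With a count of 105 the call yields two requests, an error alarm and an abnormal alarm carrying the actual value 105 and the threshold 100; with a count of exactly 100 only the error alarm is produced, because the threshold is not exceeded.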
When receiving the alarm request information sent by the online service, the current error reporting information of the alarm service indicated in the alarm request information can be automatically acquired, thereby facilitating the subsequent recording and management.
The preset alarm condition can be adjusted by an engineer according to the actual situation. In this embodiment, the preset alarm condition may be that no alarm record information whose state is in process or to be processed exists under the alarm service; as long as this holds, the alarm service is considered to meet the preset alarm condition.
In other embodiments, the preset alarm condition may be that no alarm record information whose state is in process, to be processed, or ignored exists. Alternatively, it may be that no alarm record information whose state is in process or to be processed exists, and no alarm record information exists that contains the current error reporting information, is within its valid time range, and is in the ignored state. The subsequent steps can be adapted to the chosen preset alarm condition.
The alarm prompt information can be sent in different ways; the invention does not limit the sending mode, as long as the service responsible person can receive the alarm prompt information. The alarm prompt information may include only the alarm service, the current error reporting information, and the alarm time. The content should be as brief and clear as possible, to avoid visual fatigue caused by too much information. As long as their mobile phone has a network connection, the service responsible person can receive the alarm prompt information anytime and anywhere through a messaging app client such as DingTalk, view the specific error content, and feed back response information to change the state of the alarm record information.
After the alarm prompt information is sent to the service responsible person, a piece of alarm record information whose state is to be processed is created under the alarm service. The alarm record information comprises: the overall alarm content, the service responsible person, the creation time, the modification time, the error reporting information, and the state.
When the alarm record information is created, its creation time and modification time can directly adopt the alarm time in the alarm request information, and the acquired current error reporting information is stored as the error reporting information in the alarm record information. Optionally, the alarm record information further includes the alarm service. Although the alarm record is created directly under the alarm service, the alarm service can also be stored in the record itself for easy viewing.
A specific example: alarm request information is received in which the alarm service is service A, the service responsible person is Zhang San, the alarm time is 2018/6/28 18:30:26, and the overall alarm content is an error alarm; the preset alarm condition is that no alarm record information whose state is to be processed or in process exists under the alarm service. When the alarm request information is received, the current error reporting information of service A is first acquired; it comprises error 1 and error 2. No alarm record information whose state is to be processed or in process is found under service A, so the preset alarm condition is met; alarm prompt information is sent to Zhang San via DingTalk, and a piece of alarm record information whose state is to be processed is created under service A. The alarm record information comprises: error alarm; Zhang San; creation time and modification time both 2018/6/28 18:30:26; error reporting information error 1 and error 2; state to be processed.
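The service A example can be traced with a short sketch of steps S101 to S103. Everything named here is an assumption made for illustration: the `AlarmRecord` class, the `handle_alarm_request` function, and the `fetch_errors` and `notify` callbacks, which stand in for the log platform and the messaging client without calling any real API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional

OPEN_STATES = ("to be processed", "in process")


@dataclass
class AlarmRecord:
    overall_content: str
    responsible_person: str
    created_at: datetime
    modified_at: datetime
    error_info: List[str]
    state: str = "to be processed"


def handle_alarm_request(
    request: Dict,
    records: Dict[str, List[AlarmRecord]],
    fetch_errors: Callable[[str], List[str]],
    notify: Callable[[str, str], None],
) -> Optional[AlarmRecord]:
    service = request["alarm_service"]
    # S101: acquire the current error reporting information of the alarm service.
    current_errors = fetch_errors(service)
    # Preset condition of this embodiment: no open record under the service.
    existing = records.setdefault(service, [])
    if any(r.state in OPEN_STATES for r in existing):
        return None
    # S102: send the alarm prompt information to the responsible person.
    notify(request["responsible_person"], f"{service}: {', '.join(current_errors)}")
    # S103: create a record whose state is "to be processed".
    record = AlarmRecord(
        overall_content=request["overall_content"],
        responsible_person=request["responsible_person"],
        created_at=request["alarm_time"],
        modified_at=request["alarm_time"],
        error_info=current_errors,
    )
    existing.append(record)
    return record


sent = []  # stand-in for the messaging client
records: Dict[str, List[AlarmRecord]] = {}
request = {
    "alarm_service": "service A",
    "responsible_person": "Zhang San",
    "alarm_time": datetime(2018, 6, 28, 18, 30, 26),
    "overall_content": "error alarm",
}
record = handle_alarm_request(
    request,
    records,
    fetch_errors=lambda service: ["error 1", "error 2"],
    notify=lambda person, text: sent.append((person, text)),
)
```

A second call for service A while this record is still open returns `None` and sends nothing, which is the behaviour the preset alarm condition of this embodiment is meant to produce.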
In the embodiment, when the online service has a problem, the alarm prompt message can be sent to the service responsible person in time to process the problem in time; and the error reporting information is automatically collected, the labor of an engineer for manually screening the error reporting information is reduced, and the failure solving efficiency is improved.
In another embodiment of the present invention, in order to avoid redundancy, the same parts as those in the above embodiment are not described repeatedly, and the method includes the following steps:
S201, when alarm request information sent by an online service is received, acquiring the current error reporting information of the alarm service; the alarm request information includes: the alarm service, the service responsible person, and the alarm type.
Step S201 specifically includes:
S211, when the alarm type is the abnormal alarm type, acquiring at least one piece of error log information within the current service time period under the alarm service as the current error reporting information;
S221, when the alarm type is the error alarm type, acquiring core data under the alarm service as the current error reporting information.
Specifically, the way the current error reporting information is acquired may differ between alarm types.
An abnormal alarm type indicates that the number of errors counted by the online service has exceeded the threshold; therefore, when the current error reporting information is acquired, several pieces of error log information within the current service time period of the alarm service are captured from the system monitoring log platform.
The current service time period is determined by the system settings. Some systems use a unit of one hour, so the current service time period is the current hour; some systems use a unit of 3 hours, so the current service time period is the current 3 hours. For example, if the alarm time is 2018/3/21 15:20:32 and the unit is one hour, the current service time period is 2018/3/21 15:00:00-15:59:59.
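The period calculation in this example can be written out as a short sketch. The function name `current_service_period` is hypothetical, and aligning multi-hour periods to midnight is an assumption of this sketch; the text does not say how a 3-hour unit is aligned.

```python
from datetime import datetime, timedelta
from typing import Tuple


def current_service_period(alarm_time: datetime, unit_hours: int = 1) -> Tuple[datetime, datetime]:
    """Return the first and last second of the period containing `alarm_time`.

    Periods are assumed to be aligned to midnight, so `unit_hours`
    should divide 24.
    """
    start_hour = (alarm_time.hour // unit_hours) * unit_hours
    start = alarm_time.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=unit_hours) - timedelta(seconds=1)
    return start, end


start, end = current_service_period(datetime(2018, 3, 21, 15, 20, 32))
print(start, end)  # 2018-03-21 15:00:00 2018-03-21 15:59:59
```

Under the same midnight-alignment assumption, a 3-hour unit for the same alarm time gives 15:00:00 to 17:59:59.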
To keep the collected current error reporting information as concise, relevant, and clear as possible, only important error log information is selected as the current error reporting information; for example, only the error log information that follows a preset keyword is captured. The preset keyword is set according to actual usage.
A practical example: the alarm service is service B, the alarm type is abnormal alarm, the preset keyword is "error", and the alarm time is 2018/3/21 15:20:32. The current error reporting information of service B is acquired as follows: on the error page of service B for 2018/3/21 15:00:00-15:59:59, several pieces of error log information located directly under the keyword "error" are captured, for example the two pieces of error log information shown in Table 1 below: "com. Other error log information on the error page is ignored. The captured pieces of error log information constitute the current error reporting information of service B.
Table 1
Figure BDA0001736665790000081
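The keyword-based screening can be illustrated with a short sketch. The text says only that the log information following the preset keyword is captured; the sketch makes the simplifying assumption that a log line qualifies when it falls inside the current service time period and contains the keyword. The function name `select_error_logs`, the `(timestamp, line)` log shape, and the sample log lines are all invented for illustration and are not the entries of Table 1.

```python
from datetime import datetime
from typing import List, Tuple

LogEntry = Tuple[datetime, str]  # (timestamp, log line); a hypothetical log shape


def select_error_logs(
    entries: List[LogEntry],
    period_start: datetime,
    period_end: datetime,
    keyword: str = "error",
) -> List[str]:
    """Keep log lines inside the service period that contain the keyword."""
    needle = keyword.lower()
    return [
        line
        for timestamp, line in entries
        if period_start <= timestamp <= period_end and needle in line.lower()
    ]


# Invented sample data, for illustration only.
entries = [
    (datetime(2018, 3, 21, 15, 5, 0), "ERROR sample failure one"),
    (datetime(2018, 3, 21, 15, 10, 0), "INFO heartbeat ok"),
    (datetime(2018, 3, 21, 14, 59, 0), "ERROR outside the period"),
]
selected = select_error_logs(
    entries,
    datetime(2018, 3, 21, 15, 0, 0),
    datetime(2018, 3, 21, 15, 59, 59),
)
```

Only the first sample line is kept: the second lacks the keyword and the third falls outside the current service time period.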
An error alarm type indicates that the online service has a running error; in this case the core data of the alarm service is generally acquired as the current error reporting information. For example: the alarm service is server No. 5 and the alarm type is error alarm; the IP address of server No. 5 and the cause of the problem are taken as the current error reporting information of server No. 5.
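Steps S211 and S221 amount to a dispatch on the alarm type, which can be sketched as below. The function `acquire_current_error_info` and the two callbacks are hypothetical stand-ins for the log platform and the core-data source; the IP address is a placeholder from the documentation range, not a value from the patent.

```python
from typing import Callable, Dict, List


def acquire_current_error_info(
    alarm_service: str,
    alarm_type: str,
    fetch_error_logs: Callable[[str], List[str]],
    fetch_core_data: Callable[[str], Dict[str, str]],
) -> List[str]:
    """Choose how to collect the current error information from the alarm type."""
    if alarm_type == "abnormal":
        # S211: error log entries from the current service time period.
        return fetch_error_logs(alarm_service)
    if alarm_type == "error":
        # S221: core data of the service, such as its IP and the problem cause.
        core = fetch_core_data(alarm_service)
        return [f"{key}={value}" for key, value in core.items()]
    raise ValueError(f"unknown alarm type: {alarm_type}")


info = acquire_current_error_info(
    "server No. 5",
    "error",
    fetch_error_logs=lambda service: [],
    # Placeholder core data; 192.0.2.5 is a documentation-only address.
    fetch_core_data=lambda service: {"ip": "192.0.2.5", "cause": "example cause"},
)
```

Passing the two sources in as callbacks keeps the dispatch independent of where logs and core data actually live.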
S202, judging whether alarm record information whose state is to be processed or in process exists under the alarm service; if yes, executing S203; if not, executing S204.
S203, when alarm record information whose state is to be processed or in process already exists under the alarm service, updating the error reporting information and the update time of that alarm record information according to the current error reporting information and the alarm time in the alarm request information.
Specifically, when alarm record information whose state is to be processed or in process exists under the alarm service, the service responsible person has already received the alarm prompt information and does not need to be notified again.
The update time in the existing alarm record information is set to the alarm time in the current alarm request information, and the error reporting information in the alarm record information is replaced with the current error reporting information.
Updating the modification time lets the service responsible person know when the error reporting information in the alarm record information was last updated.
The current error reporting information is acquired in response to the received alarm request information and therefore reflects the latest error situation; updating the error reporting information in the alarm record information with it keeps the alarm record information current and accurate.
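Step S203 can be sketched as an in-place update of the open record. The `AlarmRecord` class and the `update_open_record` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

OPEN_STATES = ("to be processed", "in process")


@dataclass
class AlarmRecord:
    state: str
    error_info: List[str]
    modified_at: datetime


def update_open_record(
    records: List[AlarmRecord],
    current_errors: List[str],
    alarm_time: datetime,
) -> Optional[AlarmRecord]:
    """S203: refresh the first open record; return None if there is none."""
    for record in records:
        if record.state in OPEN_STATES:
            # Replace the stored errors with the latest ones and stamp the update.
            record.error_info = list(current_errors)
            record.modified_at = alarm_time
            return record
    return None


records = [AlarmRecord("in process", ["error 1"], datetime(2018, 6, 28, 18, 30, 26))]
updated = update_open_record(records, ["error 1", "error 3"], datetime(2018, 6, 28, 19, 5, 0))
```

No notification is sent on this path; a `None` result tells the caller that no open record exists and that the check continues with S204.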
S204, judging whether ignored record information exists under the alarm service, wherein ignored record information refers to alarm record information whose state is ignored; if not, executing S205; if so, executing S206.
Specifically, the state of the alarm record information is changed according to the information fed back by the service responsible person. Ignored record information indicates that the corresponding error does not affect the overall situation, or will subsequently be resolved automatically, and can therefore be ignored.
S205, when neither alarm record information whose state is to be processed or in process nor ignored record information exists under the alarm service, considering that the preset alarm condition is met, and jumping to S207.
S206, when no alarm record information whose state is to be processed or in process exists under the alarm service but ignored record information exists, judging whether ignored record information within its valid time range contains the acquired current error reporting information; if not, considering that the preset alarm condition is met and jumping to S207; if so, executing S208.
Specifically, if a piece of ignored record information that contains the current error reporting information already exists and is within its valid time range, the service responsible person has already reviewed the fault, judged that it can be ignored, and set a valid time range. Because the present moment still falls within that range, there is no need to send alarm prompt information, and repeated handling is avoided.
The alarm time can be used to judge whether ignored record information is within its valid time range; because the alarm time accurately reflects when the error was reported, this improves the accuracy of the judgment. Of course, the judgment can also be made directly against the current system time.
S207, when the preset alarm condition is met, sending alarm prompt information to a service responsible person, and creating alarm record information with a state to be processed under the alarm service according to the current error report information and the alarm request information.
The alarm prompt information can be sent in different ways; the invention does not limit the sending mode, as long as the service responsible person can receive the alarm prompt information. The alarm prompt information may include only: the alarm service, the current error reporting information, and the alarm time. The content should be as brief and clear as possible, to prevent visual fatigue caused by excessive information.
As long as the mobile phone is connected to the network, the service responsible person can receive the alarm prompt information anytime and anywhere through a communication app client such as DingTalk, view the specific error content, and feed back response information to change the state of the alarm record information.
S208, when the neglected record information in the valid time range contains the acquired current error reporting information, the warning prompt information is not sent to the service responsible person.
Specifically, this embodiment defines the preset alarm condition concretely as either of the following: (1) under the alarm service there is neither alarm record information whose state is to be processed or in processing, nor neglected record information; or (2) there is no alarm record information whose state is to be processed or in processing, neglected record information exists, but no neglected record within its valid time range contains the acquired current error reporting information. When either condition holds, the preset alarm condition is considered to be met, and alarm prompt information is sent to the service responsible person.
For example: the alarm time is 2016/6/28 04:05:30, the current error reporting information contains 2 pieces of error information, a1 and a2, and there are 3 pieces of neglected record information. The valid time range of the first neglected record is 2016/6/28 02:00:00-03:00:00; judged by the alarm time (which is obtained from the time of the error), that range has already passed, so the first neglected record is not within its valid time range. The valid time range of the second neglected record is 2016/6/28 03:25:00-04:15:00; the alarm time falls within it, so the second record is within its valid time range, but if its error reporting information contains only a1, it does not cover the current error reporting information and cannot suppress the alarm. The valid time range of the third neglected record is 2016/6/28 03:45:45-04:45:00; the alarm time falls within it, and if its error reporting information contains a1, a2 and a3, it completely contains the current error reporting information, so the alarm prompt information need not be sent.
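For illustration only, the judgment of S204 to S208 can be sketched in Python as follows. The class name AlarmRecord, the function name should_send_alarm, and the state strings "pending", "processing" and "ignored" are hypothetical names chosen for this sketch, and the error content of the first neglected record is an assumption, since the example above does not state it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class AlarmRecord:
    state: str                             # "pending", "processing", "resolved" or "ignored"
    errors: Set[str]                       # error reporting information held by the record
    valid_from: Optional[datetime] = None  # set only for ignored records
    valid_to: Optional[datetime] = None


def should_send_alarm(records: List[AlarmRecord],
                      current_errors: Set[str],
                      alarm_time: datetime) -> bool:
    """Return True when the preset alarm condition is met."""
    # An open record means the responsible person has already been notified.
    if any(r.state in ("pending", "processing") for r in records):
        return False
    for r in records:
        if r.state != "ignored" or r.valid_from is None or r.valid_to is None:
            continue
        in_range = r.valid_from <= alarm_time <= r.valid_to
        # S208: a still-valid ignored record covering every current error suppresses the alarm.
        if in_range and current_errors <= r.errors:
            return False
    # S205 / S206: no ignored record, or none that is valid and covers the errors.
    return True


alarm_time = datetime(2016, 6, 28, 4, 5, 30)
records = [
    # Error content of the first record is assumed; the text only gives its time range.
    AlarmRecord("ignored", {"a1", "a2"},
                datetime(2016, 6, 28, 2, 0, 0), datetime(2016, 6, 28, 3, 0, 0)),
    AlarmRecord("ignored", {"a1"},
                datetime(2016, 6, 28, 3, 25, 0), datetime(2016, 6, 28, 4, 15, 0)),
    AlarmRecord("ignored", {"a1", "a2", "a3"},
                datetime(2016, 6, 28, 3, 45, 45), datetime(2016, 6, 28, 4, 45, 0)),
]
print(should_send_alarm(records, {"a1", "a2"}, alarm_time))      # False: third record covers a1 and a2
print(should_send_alarm(records[:2], {"a1", "a2"}, alarm_time))  # True: no valid record covers both
```

The sketch judges the valid time range by the alarm time; passing the current system time instead gives the other variant mentioned in this embodiment.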
Of course, when determining whether the neglected record information is within the valid time range, the current system time may be used for the determination, and the warning time may also be used for the determination, which is determined according to the actual requirement.
In the embodiment, different processing modes are adopted according to different states of the alarm record information under the alarm service, so that the service responsible person is not disturbed frequently, and can be informed in time when problems exist, and the timeliness of the alarm is ensured.
The method for alarming on-line service in the embodiment further comprises the following steps:
S209, when receiving in-processing response information sent by the service responsible person according to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to in processing.
S219, when receiving resolved response information sent by the service responsible person according to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to resolved.
S229, when receiving ignore response information sent by the service responsible person according to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to ignored, and recording the valid time range of the alarm record information.
Specifically, the service responsible person may send corresponding response information according to the actual processing condition of the service responsible person, and update the state of the corresponding alarm record information.
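As a minimal sketch of S209, S219 and S229, the state update can be expressed as a lookup from the response to the new state. The response names, state strings, dictionary keys and the function apply_response are hypothetical and chosen only for illustration; the embodiment does not prescribe a data format.

```python
from datetime import datetime
from typing import Optional, Tuple

# Hypothetical response names mapped to the resulting record state.
RESPONSE_TO_STATE = {
    "processing": "processing",  # S209
    "resolved": "resolved",      # S219
    "ignore": "ignored",         # S229
}


def apply_response(record: dict, response: str,
                   valid_range: Optional[Tuple[datetime, datetime]] = None) -> dict:
    """Return a copy of the record with its state updated by the response."""
    if response not in RESPONSE_TO_STATE:
        raise ValueError(f"unknown response: {response}")
    if response == "ignore" and valid_range is None:
        # S229 requires the valid time range to be recorded with the ignored state.
        raise ValueError("an ignore response must carry a valid time range")
    updated = dict(record)
    updated["state"] = RESPONSE_TO_STATE[response]
    if response == "ignore":
        updated["valid_range"] = valid_range
    return updated
```

Returning a copy rather than modifying the record in place is a design choice of this sketch, so that the previous state remains available for tracking.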
In practical applications, the service responsible person may send the response information in various ways. For example: the alarm prompt information is sent to the service responsible person as a message with 3 buttons, labeled processing, resolved and ignore; clicking a button is regarded as sending the corresponding response information. If the 'ignore' button is clicked, the service responsible person is required to fill in the valid time range, so that it can subsequently be recorded in the alarm record information.
For example: Zhang San receives alarm prompt information about service B. After checking it, he finds that the error was caused by a server change, does not affect the overall situation, and can be ignored. He therefore clicks 'ignore' on the alarm prompt information and sets an ignore time of 2 hours, which is sent as the ignore response information. The state of the corresponding alarm record information under service B is then updated from to be processed to ignored, and the valid time range of this neglected record is calculated.
The starting point of the validity time range may vary from setting to setting, for example: the creation time in the ignored record information is calculated as a starting point, the modification time in the ignored record information is calculated as a starting point, the time when the ignored response information is received is calculated as a starting point, or the starting point and the ending point of the valid time range are included in the ignored response information.
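The first three choices of starting point can be sketched as follows, assuming the ignore response carries only a duration (such as the 2 hours in the example of Zhang San). The function name valid_time_range, the dictionary keys and the start_from values are hypothetical; the dates in the usage lines are made-up sample values.

```python
from datetime import datetime, timedelta
from typing import Tuple


def valid_time_range(record: dict, received_at: datetime, duration: timedelta,
                     start_from: str = "received") -> Tuple[datetime, datetime]:
    """Compute the valid time range of an ignored record from a chosen starting point."""
    starts = {
        "created": record["created"],    # creation time of the record
        "modified": record["modified"],  # modification time of the record
        "received": received_at,         # time the ignore response was received
    }
    start = starts[start_from]
    return start, start + duration


record = {"created": datetime(2018, 3, 21, 10, 0), "modified": datetime(2018, 3, 21, 10, 5)}
received = datetime(2018, 3, 21, 10, 30)
print(valid_time_range(record, received, timedelta(hours=2)))
print(valid_time_range(record, received, timedelta(hours=2), start_from="created"))
```

In the fourth case, where the ignore response already contains both the starting point and the ending point, no calculation is needed and the range is stored as received.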
Optionally, if the service responsible person does not send any response information within a preset response time period (for example, 15 minutes, which may be set according to actual requirements) after the alarm prompt information is first sent, the alarm prompt information may be re-sent continuously at a preset response frequency (for example, once every 3 minutes) until the service responsible person sends response information. Optionally, when the number of times the alarm prompt information has been sent reaches a preset number (for example, 10 times), the reminder may be given directly by voice, for example by placing a DingTalk phone call.
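One possible reading of this re-sending rule is sketched below. The function name next_reminder and its return values are hypothetical, and the sketch assumes that the voice reminder replaces further messages as soon as the send count reaches the preset number; the text leaves that detail open.

```python
from datetime import timedelta


def next_reminder(sent_count: int, since_last: timedelta, *,
                  response_wait: timedelta = timedelta(minutes=15),
                  interval: timedelta = timedelta(minutes=3),
                  max_messages: int = 10) -> str:
    """Decide the next step for an alarm that has been sent but not answered.

    sent_count is the number of prompt messages already sent (at least 1).
    Returns "wait", "message" or "voice".
    """
    if sent_count >= max_messages:
        return "voice"
    # After the first message wait the full response period; afterwards use the repeat interval.
    required = response_wait if sent_count == 1 else interval
    return "message" if since_last >= required else "wait"


print(next_reminder(1, timedelta(minutes=10)))  # wait
print(next_reminder(1, timedelta(minutes=15)))  # message
print(next_reminder(10, timedelta(minutes=0)))  # voice
```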
Optionally, when the state of the alarm record information becomes in processing, the service responsible person may be asked about the processing progress at a preset processing frequency (for example, once every half hour) until the state of the alarm record information becomes resolved or ignored. Changes in the processing progress can be sent in real time to the department group related to the alarm record information, so that the service responsible person handles the problem immediately, and related people can learn of the situation and provide help in time. Optionally, when the processing progress has not changed over several consecutive inquiries, the reminder can be given directly by voice, for example by placing a DingTalk phone call. The specific number of inquiries is user-defined, for example 3 times.
The embodiment sends the alarm prompt information to the relevant service responsible person in real time, so that the problems existing on the line can be found as soon as possible; the current error reporting information is acquired, so that an engineer can conveniently locate a specific fault position at once, and the solution efficiency is improved; the state of the alarm record information can be changed according to the feedback of the service responsible person, so that the management is facilitated, and the monitoring and tracking of the alarm are facilitated.
In one embodiment of the present invention, as shown in fig. 3, an alarm system for online service includes:
the log obtaining module 10 is configured to obtain current error reporting information of an alarm service when receiving alarm request information sent by an online service; the alarm request information includes: the alarm service and the service responsible person;
the alarm sending module 20 is electrically connected with the log obtaining module 10 and is used for sending alarm prompt information to a service responsible person when the preset alarm condition is met;
and the record creating module 30 is electrically connected with the alarm sending module 20 and is used for creating alarm record information with a state of pending under the alarm service according to the current error reporting information and the alarm request information.
Specifically, the online service includes various running services, servers and other software programs or hardware entities that support normal running of the online system. Problems may occur with each online service (whether hardware or software) and when a problem occurs, an alarm request message is sent.
The alarm request information sent by the online service comprises: alarm service, i.e. the name of the on-line service where the problem occurs, for example: a login service, a server number 10, etc. The service person in charge, i.e. the person in charge of the service in question. The alarm service is to determine which on-line service has a problem, and the reason for including the service responsible person in the alarm request message is to send the alarm prompt message to the correct person as soon as possible, so that the on-line service with the problem can be recovered to normal as soon as possible.
Optionally, the alarm request information further includes: the alarm time, i.e. the time at which the problem occurred in the online service, for subsequent recording and management; and/or the overall alarm content, i.e. a summary of the problem with the online service. The overall alarm content includes any one or more of the following: the alarm type, such as an abnormal alarm indicating that the service is in an abnormal condition, or an error alarm indicating a service operation error. For an abnormal alarm, the actual value and the threshold value can also be placed in the overall alarm content. For a specific example, please refer to the corresponding method embodiment, which is not repeated here.
When receiving the alarm request information sent by the online service, the current error reporting information of the alarm service indicated in the alarm request information can be automatically acquired, thereby facilitating the subsequent recording and management.
The preset alarm condition is adjusted by an engineer according to the actual situation. The preset alarm condition in this embodiment may be: the alarm record information of which the state is in process or to be processed does not exist under the alarm service, and the alarm service is considered to be in accordance with the preset alarm condition as long as the condition is met.
In other embodiments, the preset alarm condition may be: there is no alarm record information whose state is in processing, to be processed, or ignored. It may also be: there is no alarm record information whose state is in processing or to be processed, and there is no alarm record information that contains the current error reporting information, is within its valid time range, and is in the ignored state. The subsequent steps can be adapted according to the preset alarm condition chosen.
The alarm prompt information can be sent in different ways; the invention does not limit the sending mode, as long as the service responsible person can receive the alarm prompt information. The alarm prompt information may include only: the alarm service, the current error reporting information, and the alarm time. The content should be as brief and clear as possible, to prevent visual fatigue caused by excessive information. As long as the mobile phone is connected to the network, the service responsible person can receive the alarm prompt information anytime and anywhere through a communication app client such as DingTalk, view the specific error content, and feed back response information to change the state of the alarm record information.
After the alarm prompt information is sent to the service responsible person, an alarm record information with a state to be processed is established under the alarm service. The alarm record information comprises: overall alarm content, service responsible person, creation time, modification time, error reporting information, status.
When the alarm record information is created, its creation time and modification time can directly adopt the alarm time in the alarm request information, and the acquired current error reporting information is stored as the error reporting information in the alarm record information. Optionally, the alarm record information further includes: the alarm service. Although the alarm record is created directly under the alarm service, the alarm service can also be stored in the alarm record for easy viewing. For a specific example, please refer to the corresponding method embodiment, which is not repeated here.
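The record fields listed above can be pictured with the following sketch. The class name AlarmRecord, the function create_record, the field names and the request dictionary keys are hypothetical; the values in the usage lines are sample data, not taken from the embodiment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class AlarmRecord:
    alarm_service: str        # optional field, stored for easy viewing
    overall_content: str      # overall alarm content
    responsible_person: str   # service responsible person
    created: datetime         # creation time
    modified: datetime        # modification time
    errors: List[str]         # error reporting information
    state: str = "pending"    # a new record starts in the to-be-processed state


def create_record(request: dict, current_errors: List[str]) -> AlarmRecord:
    """Create a pending alarm record from the alarm request and the current errors."""
    alarm_time = request["alarm_time"]
    return AlarmRecord(
        alarm_service=request["alarm_service"],
        overall_content=request.get("overall_content", ""),
        responsible_person=request["responsible_person"],
        created=alarm_time,    # creation time adopts the alarm time
        modified=alarm_time,   # modification time adopts the alarm time as well
        errors=list(current_errors),
    )


request = {
    "alarm_service": "login service",
    "responsible_person": "Zhang San",
    "alarm_time": datetime(2018, 3, 21, 15, 20, 32),
}
record = create_record(request, ["a1", "a2"])
print(record.state, record.created == record.modified)
```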
In the embodiment, when the online service has a problem, the alarm prompt message can be sent to the service responsible person in time to process the problem in time; and the error reporting information is automatically collected, the labor of an engineer for manually screening the error reporting information is reduced, and the failure solving efficiency is improved.
In another embodiment of the present invention, as shown in fig. 4, an alarm system for online service includes the following; to avoid redundancy, parts identical to the system embodiment above are not described again:
the log obtaining module 10 is configured to obtain current error reporting information of an alarm service when receiving alarm request information sent by an online service; the alarm request information includes: alarm service, alarm type, and service principal.
The log obtaining module 10 is configured to, when receiving an alarm request message sent by an online service, obtain current error report information of the alarm service, specifically:
the log obtaining module 10 is configured to, when the alarm type is an abnormal alarm type, obtain at least one error report log information in a current service time period under the alarm service as current error report information; and when the alarm type is the error alarm type, acquiring core data under the alarm service as current error reporting information.
Specifically, the way of acquiring the current error reporting information may differ for different alarm types.
An abnormal alarm type indicates that the number of errors currently counted by the online service exceeds a threshold value, so when acquiring the current error reporting information, several pieces of error log information within the current service time period of the alarm service are captured from the system monitoring log platform.
The current service time period is determined by the system settings. Some systems use one hour as the unit, so the current service time period is the current hour; some systems use 3 hours as the unit, so the current service time period is the current 3 hours. For example: with one hour as the unit and an alarm time of 2018/3/21 15:20:32, the current service period is 2018/3/21 15:00:00-15:59:59.
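The calculation of the current service time period can be sketched as follows. The function name current_service_period is hypothetical, and the sketch assumes that periods are aligned to midnight and that the unit divides 24 hours evenly, which the text does not state.

```python
from datetime import datetime, timedelta
from typing import Tuple


def current_service_period(alarm_time: datetime, unit_hours: int = 1) -> Tuple[datetime, datetime]:
    """Return the first and last second of the service period containing alarm_time."""
    # Assumption: periods start at midnight and unit_hours divides 24.
    start_hour = alarm_time.hour - alarm_time.hour % unit_hours
    start = alarm_time.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    end = start + timedelta(hours=unit_hours) - timedelta(seconds=1)
    return start, end


start, end = current_service_period(datetime(2018, 3, 21, 15, 20, 32))
print(start, end)  # 2018-03-21 15:00:00 2018-03-21 15:59:59
```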
In order to make the collected current error information as compact, important and clear as possible, only important error log information is screened as the current error information, for example: only the error log information after the preset keywords is captured. The preset keywords are set according to the actual use condition. For practical examples, please refer to corresponding method embodiments, which are not described herein.
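One way to read "only the error log information after the preset keywords is captured" is to keep, for each log line, the text that follows the first matching keyword. The sketch below follows that reading; the function name extract_errors, the keywords and the log lines are made-up samples, not taken from the embodiment.

```python
from typing import List


def extract_errors(log_lines: List[str], keywords: List[str]) -> List[str]:
    """Keep only the text that follows a preset keyword in each log line."""
    extracted = []
    for line in log_lines:
        for keyword in keywords:
            index = line.find(keyword)
            if index != -1:
                extracted.append(line[index + len(keyword):].strip())
                break  # one match per line is enough
    return extracted


logs = [
    "2018-03-21 15:01:02 INFO service started",
    "2018-03-21 15:02:10 ERROR: database timeout",
    "2018-03-21 15:03:44 Exception: null pointer",
]
print(extract_errors(logs, ["ERROR:", "Exception:"]))  # ['database timeout', 'null pointer']
```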
An error alarm type indicates that the online service has an operation error; in this case the core data of the alarm service is generally acquired and used as the current error reporting information. For example: the alarm service is server No. 5 and the alarm type is the error alarm type, so the IP of server No. 5 and the cause of the problem are taken as the current error reporting information of server No. 5.
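The choice between the two acquisition paths of the log obtaining module 10 can be sketched as a simple dispatch on the alarm type. The function name get_current_errors, the type strings "abnormal" and "error", and the two fetch callbacks are hypothetical; the IP address and problem text in the usage lines are placeholders.

```python
from typing import Callable, List


def get_current_errors(alarm_type: str,
                       fetch_error_logs: Callable[[], List[str]],
                       fetch_core_data: Callable[[], List[str]]) -> List[str]:
    """Acquire the current error reporting information according to the alarm type."""
    if alarm_type == "abnormal":
        # Abnormal alarm: error logs of the current service time period.
        return fetch_error_logs()
    if alarm_type == "error":
        # Error alarm: core data of the alarm service.
        return fetch_core_data()
    raise ValueError(f"unknown alarm type: {alarm_type}")


# Placeholder data sources standing in for the monitoring log platform and the server.
errors = get_current_errors(
    "error",
    fetch_error_logs=lambda: ["database timeout"],
    fetch_core_data=lambda: ["ip=10.0.0.5", "cause=disk full"],
)
print(errors)  # ['ip=10.0.0.5', 'cause=disk full']
```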
The record updating module 40 is electrically connected with the log obtaining module 10 and is used for updating the error reporting information and the modification time of the alarm record information according to the current error reporting information and the alarm time in the alarm request information, when alarm record information in a state of waiting for processing or in processing exists under the alarm service.
Specifically, when alarm record information in a state of waiting for processing or in processing exists under the alarm service, the service responsible person has already received the alarm prompt information and does not need to be notified again.
The modification time in the existing alarm record information is updated with the alarm time in the current alarm request information, and the error reporting information in the alarm record information is updated with the current error reporting information.
The modification time is updated so that the service responsible person can conveniently see when the error reporting information in the alarm record information was last updated.
The current error reporting information is obtained according to the received alarm request information, namely the latest error reporting condition, so that the real-time performance and the accuracy of the alarm record information are ensured by updating the error reporting information in the alarm record information by using the current error reporting information.
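The behavior of the record updating module 40 can be sketched as follows. The function name update_open_record, the dictionary keys and the state strings are hypothetical and chosen only for illustration.

```python
from datetime import datetime
from typing import List


def update_open_record(record: dict, current_errors: List[str], alarm_time: datetime) -> bool:
    """Refresh an open record in place; return False if the record is not open."""
    if record["state"] not in ("pending", "processing"):
        return False
    record["errors"] = list(current_errors)  # keep only the latest error reporting information
    record["modified"] = alarm_time          # modification time takes the new alarm time
    return True


record = {"state": "processing", "errors": ["a1"], "modified": datetime(2018, 3, 21, 15, 0, 0)}
print(update_open_record(record, ["a1", "a2"], datetime(2018, 3, 21, 15, 20, 32)))  # True
print(record["errors"])  # ['a1', 'a2']
```

No alarm prompt information is sent in this branch, because the service responsible person has already been notified about the open record.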
The condition judgment module 50 is electrically connected with the record updating module 40 and is used for considering that the alarm request information conforms to the preset alarm condition when the on-line service does not have the alarm record information with the state of waiting for processing or in processing and does not have the neglected record information; wherein, the neglected record information refers to the alarm record information with the state of neglecting;
and when the alarm record information in the state of waiting for processing or in processing does not exist under the alarm service, but the neglected record information exists, judging whether the neglected record information in the effective time range contains the acquired current error reporting information;
and when the neglected record information in the valid time range does not contain the acquired current error reporting information, the preset alarm condition is considered to be met;
the alarm sending module 20 is electrically connected to the condition determining module 50, and is further configured to not send an alarm prompt message to the service responsible person when there is a piece of neglected record information within the valid time range that includes the obtained current error reporting information.
Specifically, the state of the alarm record information is changed according to the information fed back by the service responsible person. Neglected record information indicates that the corresponding error reporting information does not affect the overall situation, or will be resolved automatically later, and can therefore be ignored.
If a piece of neglected record information within its valid time range contains the current error reporting information, the fault has already been handled by the service responsible person, who judged it negligible and set a valid time range. Since that range has not yet expired, no alarm prompt information needs to be sent, which avoids repeated processing.
The alarm time can be used for judging whether the neglected recording information is in the effective time range, the alarm time accurately reflects the time of error information reporting, and the judgment accuracy is improved. Of course, the determination can also be directly made according to the current time of the system.
The alarm sending module 20 is used for sending alarm prompt information to a service responsible person when the preset alarm condition is met;
and the record creating module 30 is electrically connected with the alarm sending module 20 and is configured to create alarm record information with a state to be processed under the alarm service according to the current error reporting information and the alarm request information.
Specifically, this embodiment defines the preset alarm condition concretely as either of the following: (1) under the alarm service there is neither alarm record information whose state is to be processed or in processing, nor neglected record information; or (2) there is no alarm record information whose state is to be processed or in processing, neglected record information exists, but no neglected record within its valid time range contains the acquired current error reporting information. When either condition holds, the preset alarm condition is considered to be met, and alarm prompt information is sent to the service responsible person.
The alarm prompt information can be sent in different ways; the invention does not limit the sending mode, as long as the service responsible person can receive the alarm prompt information. The alarm prompt information may include only: the alarm service, the current error reporting information, and the alarm time. The content should be as brief and clear as possible, to prevent visual fatigue caused by excessive information.
As long as the mobile phone is connected to the network, the service responsible person can receive the alarm prompt information anytime and anywhere through a communication app client such as DingTalk, view the specific error content, and feed back response information to change the state of the alarm record information.
For a specific example, please refer to the corresponding method embodiment, which is not described herein.
Of course, when determining whether the neglected record information is within the valid time range, the current system time may be used for the determination, and the warning time may also be used for the determination, which is determined according to the actual requirement.
In the embodiment, different processing modes are adopted according to different states of the alarm record information under the alarm service, so that the service responsible person is not disturbed frequently, and can be informed in time when problems exist, and the timeliness of the alarm is ensured.
The alarm system for online service of the present embodiment further includes:
a state updating module 60 electrically connected to the alarm sending module 20, configured to update the state of the corresponding alarm record information under the alarm service to ignore when receiving an ignore response message sent by the service responsible person according to the received alarm prompt message, and record the valid time range of the alarm record information;
when receiving response information in processing sent by the service responsible person according to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service into the processing;
and when receiving the solution response information sent by the service responsible person according to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to solution.
Specifically, the service responsible person may send corresponding response information according to the actual processing condition of the service responsible person, and update the state of the corresponding alarm record information. For a specific example, please refer to the corresponding method embodiment, which is not described herein.
The starting point of the validity time range may vary from setting to setting, for example: the creation time in the ignored record information is calculated as a starting point, the modification time in the ignored record information is calculated as a starting point, the time when the ignored response information is received is calculated as a starting point, or the starting point and the ending point of the valid time range are included in the ignored response information.
Optionally, if the service responsible person does not send any response information within a preset response time period (for example, 15 minutes, which may be set according to actual requirements) after the alarm prompt information is first sent, the alarm prompt information may be re-sent continuously at a preset response frequency (for example, once every 3 minutes) until the service responsible person sends response information. Optionally, when the number of times the alarm prompt information has been sent reaches a preset number (for example, 12 times), the reminder can be given directly by voice, for example by placing a DingTalk phone call.
Optionally, when the state of the alarm record information becomes in processing, the service responsible person may be asked about the processing progress at a preset processing frequency (for example, once every half hour) until the state of the alarm record information becomes resolved or ignored. Changes in the processing progress can be sent in real time to the department group related to the alarm record information, so that the service responsible person handles the problem immediately, and related people can learn of the situation and provide help in time. Optionally, when the processing progress has not changed over several consecutive inquiries, the reminder can be given directly by voice, for example by placing a DingTalk phone call. The specific number of inquiries is user-defined, for example 3 times. The embodiment sends the alarm prompt information to the relevant service responsible person in real time, so that problems existing online can be found as soon as possible; the current error reporting information is acquired, so that an engineer can quickly locate the specific fault position, improving the efficiency of resolution; and the state of the alarm record information can be changed according to the feedback of the service responsible person, which facilitates management as well as monitoring and tracking of the alarm.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention. Those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. An alarm method for on-line service is characterized by comprising the following steps:
when receiving alarm request information sent by an online service, acquiring the current error reporting information of the alarm service; the alarm request information includes: the alarm service and a service responsible person;
when the preset alarm condition is met, sending alarm prompt information to the service responsible person;
according to the current error reporting information and the alarm request information, alarm record information with a state to be processed is established under the alarm service;
when receiving an alarm request message sent by an online service, the method further comprises the following steps after obtaining the current error report message of the alarm service:
when neither alarm record information in a state of waiting for processing or in processing nor neglected record information exists under the alarm service, the preset alarm condition is considered to be met; wherein the neglected record information refers to alarm record information whose state is neglected;
when no alarm record information in a state of waiting for processing or in processing exists under the alarm service but neglected record information exists, judging whether the neglected record information within a valid time range contains the acquired current error reporting information;
when the neglected record information in the valid time range contains the acquired current error reporting information, not sending an alarm prompt message to the service responsible person;
and when the neglected record information in the valid time range does not contain the acquired current error reporting information, the preset alarm condition is considered to be met.
2. The alerting method of an online service of claim 1, wherein:
the alarm request information further includes: a type of alarm;
when receiving an alarm request message sent by an online service, the method for acquiring the current error report message of the alarm service specifically comprises the following steps:
when the alarm type is an abnormal alarm type, acquiring at least one piece of error report log information in the current service time period under the alarm service as the current error report information;
and when the alarm type is an error alarm type, acquiring core data under the alarm service as the current error reporting information.
3. The method for alarming of online service according to claim 1, wherein when receiving the alarm request message sent by the online service, after obtaining the current error report message of the alarm service, the method further comprises the following steps:
and when alarm record information whose state is to be processed or in processing exists under the alarm service, updating the error reporting information and the updating time of the alarm record information according to the current error reporting information and the alarm time in the alarm request information.
4. The alarm method for an online service of claim 1, further comprising the steps of:
when receiving an in-processing response sent by the service responsible person in reply to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to in processing;
and when receiving a resolved response sent by the service responsible person in reply to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to resolved.
5. The alarm method for an online service of claim 1, further comprising the step of:
when receiving an ignore response sent by the service responsible person in reply to the received alarm prompt information, updating the state of the corresponding alarm record information under the alarm service to ignored, and recording the valid time range of that alarm record information.
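The response handling of claims 4 and 5 amounts to a small state machine, sketched below under stated assumptions. The response keywords, the record layout, and in particular the 24-hour default for the valid time range are hypothetical; the claims do not specify how the range is chosen.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class AlarmRecord:
    state: str = "to_be_processed"
    # (start, end) of the period during which an ignored record suppresses alarms
    valid_range: Optional[Tuple[datetime, datetime]] = None


def apply_response(record: AlarmRecord,
                   response: str,
                   now: datetime,
                   ignore_for: timedelta = timedelta(hours=24)) -> AlarmRecord:
    """Update a record according to the service responsible person's reply."""
    if response == "processing":
        record.state = "in_processing"
    elif response == "resolved":
        record.state = "resolved"
    elif response == "ignore":
        record.state = "ignored"
        # Assumption: the range starts at the reply time and lasts ignore_for.
        record.valid_range = (now, now + ignore_for)
    else:
        raise ValueError(f"unknown response: {response!r}")
    return record
```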
6. An alarm system for an online service, comprising:
a log acquisition module, configured to acquire the current error report information of the alarm service when receiving the alarm request information sent by the online service; the alarm request information includes: the alarm service and the service responsible person;
an alarm sending module, configured to send alarm prompt information to the service responsible person when the preset alarm condition is met;
a record creating module, configured to create alarm record information in the to-be-processed state under the alarm service according to the current error report information and the alarm request information;
a condition judgment module, configured to consider that the alarm request information meets the preset alarm condition when the online service corresponding to the alarm request information has neither alarm record information in the to-be-processed or in-processing state nor ignored record information; wherein the ignored record information refers to alarm record information whose state is ignored;
and, when the alarm service has no alarm record information in the to-be-processed or in-processing state but does have ignored record information, to judge whether the ignored record information within its valid time range contains the acquired current error report information;
and, when the ignored record information within the valid time range does not contain the acquired current error report information, to consider the preset alarm condition to be met;
and the alarm sending module is further configured not to send alarm prompt information to the service responsible person when the ignored record information within the valid time range contains the acquired current error report information.
7. The alarm system for an online service of claim 6, further comprising:
a record updating module, configured to update the error report information and the update time of the alarm record information according to the current error report information and the alarm time in the alarm request information when the alarm service has alarm record information in the to-be-processed or in-processing state.
8. The alarm system for an online service of claim 6, further comprising:
a state updating module, configured to update the state of the corresponding alarm record information under the alarm service to ignored, and to record the valid time range of that alarm record information, when receiving an ignore response sent by the service responsible person in reply to the received alarm prompt information.
CN201810799040.5A 2018-07-19 2018-07-19 Online service warning method and system Active CN109039718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810799040.5A CN109039718B (en) 2018-07-19 2018-07-19 Online service warning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810799040.5A CN109039718B (en) 2018-07-19 2018-07-19 Online service warning method and system

Publications (2)

Publication Number Publication Date
CN109039718A CN109039718A (en) 2018-12-18
CN109039718B true CN109039718B (en) 2021-06-25

Family

ID=64644501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810799040.5A Active CN109039718B (en) 2018-07-19 2018-07-19 Online service warning method and system

Country Status (1)

Country Link
CN (1) CN109039718B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111342980A (en) * 2020-02-25 2020-06-26 京东数字科技控股有限公司 Early warning information spreading method, system, computer equipment and storage medium
CN113535759A (en) * 2020-04-14 2021-10-22 ***通信集团上海有限公司 Data labeling method, device, equipment and medium
CN112053050A (en) * 2020-08-27 2020-12-08 北京云迹科技有限公司 Assessment method, device and system suitable for cooperation efficiency between robots

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6766369B1 (en) * 1998-03-09 2004-07-20 Net Zero, Inc. Internet service error tracking
CN101355455A (en) * 2008-09-12 2009-01-28 中兴通讯股份有限公司 Alarm system and method for service management platform
CN104538336A (en) * 2015-01-07 2015-04-22 海太半导体(无锡)有限公司 Alarm recognizing and processing system and method for semiconductor encapsulation equipment
CN104967537A (en) * 2015-06-30 2015-10-07 北京奇艺世纪科技有限公司 Alarm information pushing method and device
CN106713049A (en) * 2017-02-04 2017-05-24 杭州迪普科技股份有限公司 Alarm method and device of monitor
CN106972978A (en) * 2017-05-27 2017-07-21 郑州云海信息技术有限公司 A kind of ALM method for pushing and device
CN107104840A (en) * 2017-05-21 2017-08-29 郑州云海信息技术有限公司 A kind of daily record monitoring method, apparatus and system

Also Published As

Publication number Publication date
CN109039718A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
US10673683B2 (en) Data network notification bar processing system
CN109039718B (en) Online service warning method and system
US7266734B2 (en) Generation of problem tickets for a computer system
CN109308252B (en) Fault positioning processing method and device
JP3805951B2 (en) Image forming apparatus management system and management method
US7574502B2 (en) Early warning of potential service level agreement violations
JP4666482B2 (en) Business management device, business management method, and business management program
US11243830B2 (en) Incident detection and management
CN110535713B (en) Monitoring management system and monitoring management method
US10970150B1 (en) Incident detection and management
CN112631913A (en) Method, device, equipment and storage medium for monitoring operation fault of application program
CN110795264A (en) Monitoring management method and system and intelligent management terminal
US20220156138A1 (en) Incident detection and management
US8631278B2 (en) Diagnostics for remote computer systems
GB2372674A (en) Network management
CN110727555A (en) Service interface management method, device, medium and computer equipment
US20140317459A1 (en) Backup system defect detection
JP2007141007A (en) Support systemization for failure in system operation monitoring
KR100970211B1 (en) Method and Apparatus for Monitoring Service Status Via Special Message Watcher in Authentication Service System
CN110266536B (en) Internet of vehicles resource management method and device
US11106527B2 (en) Hardware error corrections based on policies
CN111983960A (en) Monitoring system and method
CN110557295A (en) Alarm positioning method and device for network equipment
US20030200309A1 (en) Transmission network system
JP2003271238A (en) Remote maintenance method and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20181218

Assignee: Nanjing Manyun Cold Chain Technology Co.,Ltd.

Assignor: JIANGSU MANYUN SOFTWARE TECHNOLOGY Co.,Ltd.

Contract record no.: X2023980038397

Denomination of invention: An alarm method and system for online services

Granted publication date: 20210625

License type: Common License

Record date: 20230724