CN107766208B

CN107766208B - Method, system and device for monitoring business system

Info

Publication number: CN107766208B
Application number: CN201711024783.7A
Authority: CN
Inventors: 胡文彬; 刘祥涛; 赵彦晖; 孙淏添
Original assignee: Shenzhen Zhongrun Sifang Information Technology Co ltd
Current assignee: Shenzhen Zhongrun Sifang Information Technology Co ltd
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2021-01-05
Anticipated expiration: 2037-10-27
Also published as: CN107766208A

Abstract

The invention discloses a method for monitoring service systems, comprising: collecting the log file of each service system; analyzing each log file to obtain the log information of the corresponding log file; judging whether all the log information is normal; and if so, determining that all the service systems are normal, otherwise determining that the service system corresponding to the abnormal log information is abnormal. Because the method collects the log files of all the service systems and analyzes the log information of each one, when a service system within the monitoring range becomes abnormal, it can be accurately located as the service system whose log information is abnormal. The invention also discloses a system and a device for monitoring service systems, which have the same effects.

Description

Method, system and device for monitoring business system

Technical Field

The present invention relates to the field of computers, and in particular, to a method, a system, and an apparatus for monitoring a service system.

Background

When a business system processes business data, its process often becomes "dead" (hung) because of unexpected conditions such as a program bug (fault) or a problem in the operating system environment. In this state the process of the business system still exists, but the business system no longer processes any data. Such a hang seriously affects the progress of the business, so the operating status of the business system needs to be monitored.

In the prior art, the working status of a business system is judged by checking how an interface program is processing requests. For example, if the number of pending requests detected is 100, far above the normal value, the requests are judged to be backlogged and the service system is judged to be abnormal. However, the interface program is processed interactively by a plurality of service systems, so this method effectively monitors those service systems as a whole. When an abnormality occurs, the abnormal service system cannot be accurately located, that is, it cannot be determined which service system is abnormal, and maintenance staff must check the systems one by one, which is very inefficient.

Therefore, how to monitor service systems and accurately locate the abnormal service system when an abnormality occurs is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a method, a system and a device for monitoring service systems, which monitor the service systems and accurately locate the abnormal service system when an abnormality occurs.

In order to solve the above technical problem, the present invention provides a method for monitoring a service system, including:

collecting log files of all service systems;

analyzing each log file to obtain corresponding log information of each log file;

judging whether all the log information is normal or not;

if so, determining that all the service systems are normal, otherwise, determining that the service system corresponding to the abnormal log information is abnormal.

Preferably, the log information is specifically the number of lines in the content of the log file and the update time of the log file.

Preferably, the determining whether all the log information is normal specifically includes:

calculating, for each log file, the update time interval between its update time in the current monitoring period and its update time in the previous monitoring period;

calculating, for each log file, the line-count increment between its number of lines in the current monitoring period and its number of lines in the previous monitoring period;

if all the update time intervals are greater than the update time interval threshold and all the line-count increments are greater than the line-count increment threshold, determining that all the log information is normal;

and if the update time interval of a log file is less than or equal to the update time interval threshold, or its line-count increment is less than or equal to the line-count increment threshold, determining that the log information of that log file is abnormal.

Alternatively, judging whether all the log information is normal specifically includes:

calculating, for each log file, the update time interval between its update time in the current monitoring period and the current system time, and calculating its line-count increment as described above;

if all the update time intervals are less than or equal to the update time interval threshold and all the line-count increments are greater than the line-count increment threshold, determining that all the log information is normal;

and if the update time interval of a log file is greater than the update time interval threshold, or its line-count increment is less than or equal to the line-count increment threshold, determining that the log information of that log file is abnormal.

Preferably, if a service system is abnormal, the method further includes:

sending abnormality warning information,

wherein the abnormality warning information includes information on the abnormal service system.

Preferably, after sending the abnormality warning information, the method further includes:

saving the sent abnormality warning information.

The invention also provides a system for monitoring the service system, which comprises:

a log collection module, configured to collect the log files of all the service systems; and

a log analysis module, configured to analyze each log file to obtain the log information of the corresponding log file, and to judge whether all the log information is normal; if so, all the service systems are determined to be normal, otherwise the service system corresponding to the abnormal log information is determined to be abnormal.

Preferably, the system further comprises:

an early warning sending module, configured to send abnormality warning information.

Preferably, the system further comprises:

a database module, configured to store the sent abnormality warning information.

The invention also provides a device for monitoring a service system, comprising a memory and a processor, wherein the processor is configured to implement the steps of any of the above methods for monitoring a service system when executing a program stored in the memory.

The method for monitoring service systems provided by the invention collects the log file of each service system, analyzes each log file to obtain its log information, and judges whether all the log information is normal; if so, all the service systems are determined to be normal, otherwise the service system corresponding to the abnormal log information is determined to be abnormal. Because the log files of all the service systems are collected and their log information is analyzed individually, when a service system within the monitoring range becomes abnormal, it can be located accurately, unlike in the prior art. The system and the device for monitoring service systems have the same advantages.

Drawings

In order to illustrate the embodiments of the present invention more clearly, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained by those skilled in the art without inventive effort.

Fig. 1 is a flowchart of a method for monitoring a service system according to an embodiment of the present invention;

Fig. 2 is a flowchart of another method for monitoring a service system according to an embodiment of the present invention;

Fig. 3 is a structural diagram of a system for monitoring a service system according to an embodiment of the present invention;

Fig. 4 is a structural diagram of another system for monitoring a service system according to an embodiment of the present invention;

Fig. 5 is a structural diagram of an apparatus for monitoring a service system according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.

In order to make the technical solutions of the present invention better understood, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flowchart of a method for monitoring a service system according to an embodiment of the present invention. As shown in Fig. 1, the method for monitoring a service system includes the following steps:

S10: Collect the log files of all the service systems.

The log files of the service systems within the monitoring range are collected. Specifically, the log files to be analyzed can be collected from the log directory of each service system according to its configuration information, which may include the name, the log directory and the log file name of the corresponding service system. In this way the log files of every service system within the monitoring range can be collected, so that a plurality of service systems are monitored.
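A minimal sketch of one configuration entry, written in Python. The class name `ServiceConfig`, its helper methods, and the `/logmon/data` root (taken from the example `cp` command in this embodiment) are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical record for one entry of the service configuration."""

    name: str      # service system name, also used as its analysis directory name
    log_dir: str   # log directory of the service system
    log_file: str  # log file name

    def source_path(self) -> Path:
        """Path of the log file written by the service system itself."""
        return Path(self.log_dir) / self.log_file

    def analysis_path(self, period: str, root: str = "/logmon/data") -> Path:
        """Path of the collected copy; period is 'cur_log' or 'last_log'."""
        return Path(root) / self.name / period / self.log_file


configs = [ServiceConfig(name="syspro", log_dir="/syspro/log", log_file="file.log")]
for cfg in configs:
    print(cfg.source_path(), "->", cfg.analysis_path("cur_log"))
```

Keeping one record per service system is what lets the monitor later report abnormalities by system name.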

It should be noted that the file attributes of the log files must not be modified during collection, and each collected log file can be placed in its own analysis processing directory. Specifically, the log files of the service systems can be copied to the corresponding analysis processing directories by a command such as:

cp -p /syspro/log/file.log /logmon/data/syspro/cur_log/file.log

Here, the -p option preserves the file's modification time and other attributes. syspro is a directory named after the corresponding service system, so the log files of different service systems go into different directories. cur_log is a subdirectory of syspro that stores the log file of the current monitoring period; correspondingly, last_log is a subdirectory of syspro that stores the log file of the previous monitoring period.
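The collection step might look like the Python sketch below, assuming that each cycle first moves the previous cycle's current copy into last_log and then copies the live log into cur_log. The function name `collect_log` and this rotation order are hypothetical. `shutil.copy2` copies the file together with its metadata, including the modification time, so it plays the role of `cp -p`.

```python
import os
import shutil
from pathlib import Path


def collect_log(source: Path, analysis_dir: Path) -> Path:
    """Hypothetical collection step for one service system.

    Assumes the copy collected in the previous cycle is kept as the
    previous-period file before the new copy is made.
    """
    cur_dir = analysis_dir / "cur_log"
    last_dir = analysis_dir / "last_log"
    cur_dir.mkdir(parents=True, exist_ok=True)
    last_dir.mkdir(parents=True, exist_ok=True)

    cur_copy = cur_dir / source.name
    if cur_copy.exists():
        # The previous cycle's current copy becomes the previous-period copy;
        # a rename keeps its modification time unchanged.
        os.replace(cur_copy, last_dir / source.name)
    # copy2 copies contents plus metadata (including mtime), like `cp -p`.
    shutil.copy2(source, cur_copy)
    return cur_copy
```

Preserving the modification time matters because the later checks read the update time from the collected copy, not from the live log.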

Of course, other commands may also be used to collect the log files of each service system into the corresponding analysis processing directories, which is not described further here.

S11: Analyze each log file to obtain the log information of the corresponding log file.

This step analyzes all the collected log files of all the service systems and obtains the log information of each log file.

S12: Judge whether all the log information is normal.

If so, the process proceeds to step S13, and if not, the process proceeds to step S14.

S13: all the service systems are normal.

If all the log information is normal, all the corresponding service systems are normal.

S14: The service system corresponding to the abnormal log information is abnormal.

If any log information is abnormal, the service system corresponding to that abnormal log information is abnormal.
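Steps S11 to S14 reduce to a loop over the collected logs. In this hypothetical Python sketch, collection (S10) has already produced a mapping from system name to log path, and the callbacks `analyze` and `is_normal` are placeholders standing in for S11 and S12; an empty result corresponds to S13.

```python
from typing import Any, Callable, Dict, List


def find_abnormal_systems(
    collected: Dict[str, str],
    analyze: Callable[[str], Any],
    is_normal: Callable[[Any], bool],
) -> List[str]:
    """Hypothetical driver for S11-S14.

    collected maps each service system name to its collected log (S10 done).
    Returns the names of abnormal service systems; an empty list means S13.
    """
    abnormal: List[str] = []
    for system, log_path in collected.items():
        info = analyze(log_path)        # S11: obtain the log information
        if not is_normal(info):         # S12: judge the log information
            abnormal.append(system)     # S14: locate the abnormal system
    return abnormal
```

Because each log is judged separately, the result names the abnormal systems directly, which is the advantage claimed over the interface-backlog check of the prior art.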

In this embodiment, the log files of all the service systems are collected, each log file is analyzed to obtain its log information, and whether all the log information is normal is judged. If so, all the service systems are determined to be normal; otherwise the service system corresponding to the abnormal log information is determined to be abnormal. Because the log information of every service system is analyzed individually, when a service system within the monitoring range becomes abnormal, it can be located accurately.

On the basis of the above embodiment, in order to judge the operating status of each service system more accurately, as a preferred implementation, the log information is specifically the number of lines in the content of the log file and the update time of the log file.

Each log file is analyzed to obtain its current update time and its current number of content lines. It is then judged whether the update times and line counts of all the current log files are normal; if so, all the service systems are determined to be normal, otherwise the service system corresponding to a log file with an abnormal update time or line count is determined to be abnormal.
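A sketch of obtaining the two pieces of log information in Python, assuming the update time is the file's modification time (as reported by `stat`) and the line count is the number of newline characters (as reported by `wc -l`). `LogInfo` and `read_log_info` are illustrative names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LogInfo:
    """Hypothetical container for the log information of one log file."""

    update_time: float  # modification time in seconds since the epoch (like `stat`)
    line_count: int     # number of newline characters (like `wc -l`)


def read_log_info(path: str) -> LogInfo:
    """Hypothetical helper: read the update time and line count of a log file."""
    update_time = os.stat(path).st_mtime
    line_count = 0
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large logs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            line_count += chunk.count(b"\n")
    return LogInfo(update_time=update_time, line_count=line_count)
```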

On the basis of the foregoing embodiment, in order to determine the operating status of each business system more accurately, as a preferred implementation, step S12 specifically includes:

Calculate, for each log file, the update time interval between its update time in the current monitoring period and its update time in the previous monitoring period.

It should be noted that each currently collected log file is the log file of the current monitoring period; the corresponding log file of the previous monitoring period can be found in the analysis processing directory, and its update time obtained by analysis.

Taking the directory syspro as an example, the update time of the log file in the current monitoring period may be obtained with a command such as:

stat /logmon/data/syspro/cur_log/file.log

Likewise, the update time of the log file in the previous monitoring period may be obtained with a command such as:

stat /logmon/data/syspro/last_log/file.log

The update time interval is then obtained by subtracting the two update times.

Of course, other commands may also be used to obtain the update time of the log file, which is not described further here.

Calculate, for each log file, the line-count increment between its number of lines in the current monitoring period and its number of lines in the previous monitoring period.

Specifically, the log file of the previous monitoring period can be found in the log collection directory, and the number of lines in its content obtained by analysis.

Taking the directory syspro as an example, the number of lines in the log file of the current monitoring period may be obtained with a command such as:

wc -l /logmon/data/syspro/cur_log/file.log

Likewise, the number of lines in the log file of the previous monitoring period may be obtained with a command such as:

wc -l /logmon/data/syspro/last_log/file.log

The line-count increment is then obtained by subtracting the two line counts.

Of course, other commands may also be used to obtain the number of lines in the log file, which is not described further here.

If all the update time intervals are greater than the update time interval threshold and all the line-count increments are greater than the line-count increment threshold, all the log information is determined to be normal.

An update time interval greater than the preset update time interval threshold indicates that the update time interval is normal, and a line-count increment greater than the preset line-count increment threshold indicates that the line-count increment is normal. If both the update time interval and the line-count increment of a log file are normal, the log information of that log file is normal; if they are normal for all the log files, all the log information is determined to be normal.

If the update time interval of a log file is less than or equal to the update time interval threshold, or its line-count increment is less than or equal to the line-count increment threshold, the log information of that log file is determined to be abnormal.

If the update time interval of a log file is less than or equal to the update time interval threshold, the interval is too small: the update time of the log file in the current monitoring period is too close to its update time in the previous monitoring period, and hence too far from the current system time, so the log file has most likely not been updated for a long time. For example, if the preset update time interval threshold is 1 minute and the calculated update time interval of the log file is 0, the log file has not been updated.

If the line-count increment of a log file is less than or equal to the line-count increment threshold, for example, the preset threshold is 10 lines and the calculated increment is 5 lines, the service system may not be processing business data normally, because a service system that processes business data normally would not generate so few log lines within one monitoring period.

Therefore, by using both the update time interval and the line-count increment of each log file between the current and previous monitoring periods, whether the log information is normal can be judged more accurately.
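The first criterion can be sketched as a single Python predicate. The names `Snapshot` and `is_normal_vs_previous` are hypothetical, and times are assumed to be in seconds, so the 1-minute example threshold becomes 60.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical log information of one log file in one monitoring period."""

    update_time: float  # modification time, seconds since the epoch
    line_count: int     # number of lines in the file content


def is_normal_vs_previous(
    current: Snapshot,
    previous: Snapshot,
    interval_threshold: float,
    increment_threshold: int,
) -> bool:
    """First criterion: compare the current period with the previous one."""
    interval = current.update_time - previous.update_time  # update time interval
    increment = current.line_count - previous.line_count   # line-count increment
    # Normal only if both values exceed their thresholds; a value less than
    # or equal to either threshold makes the log information abnormal.
    return interval > interval_threshold and increment > increment_threshold
```

With a 60-second interval threshold and a 10-line increment threshold, an unchanged update time or an increment of only 5 lines both yield `False`, matching the two examples above.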

On the basis of the foregoing embodiment, in order to determine the operating status of each service system more accurately and conveniently, as a preferred implementation, step S12 specifically includes:

Calculate, for each log file, the update time interval between its update time and the current system time.

That is, the update time interval between the update time of each log file in the current monitoring period and the current system time is calculated. It should be noted that this update time interval has a different meaning from the update time interval described in the previous embodiment.

It should be noted that each currently collected log file is the log file of the current monitoring period; the corresponding log file of the previous monitoring period can be found in the analysis processing directory and the number of lines in its content obtained, so that the line-count increment is calculated as before.

If all the update time intervals are less than or equal to the update time interval threshold and all the line-count increments are greater than the line-count increment threshold, all the log information is determined to be normal.

An update time interval less than or equal to the preset update time interval threshold indicates that the update time interval is normal, and a line-count increment greater than the preset line-count increment threshold indicates that the line-count increment is normal. If both are normal for a log file, the log information of that log file is normal; if they are normal for all the log files, all the log information is determined to be normal.

It should be noted that, because the update time interval in this embodiment has a different meaning from the one in the previous embodiment, the update time interval threshold in this embodiment is independent of the earlier threshold; the two may or may not be equal, which is not limited by the present invention.

If the update time interval of a log file is greater than the update time interval threshold, or its line-count increment is less than or equal to the line-count increment threshold, the log information of that log file is determined to be abnormal.

If the update time interval of a log file is greater than the update time interval threshold, the interval is too large: the update time of the log file in the current monitoring period is too far from the current system time, so the log file has most likely not been updated for a long time.

If the line-count increment is less than or equal to the line-count increment threshold, for example, the preset threshold is 10 lines and the calculated increment is 5 lines, the service system may not be processing business data normally, because it would not generate so few log lines within one monitoring period if it were.
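The second criterion differs only in how the interval is measured and in the direction of its comparison. In this hypothetical Python sketch, `now` is an optional parameter that defaults to `time.time()`, so the check can be tested with a fixed clock; `Snapshot` and `is_normal_vs_now` are illustrative names and times are assumed to be in seconds.

```python
import time
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    """Hypothetical log information of one log file in one monitoring period."""

    update_time: float  # modification time, seconds since the epoch
    line_count: int     # number of lines in the file content


def is_normal_vs_now(
    current: Snapshot,
    previous: Snapshot,
    interval_threshold: float,
    increment_threshold: int,
    now: Optional[float] = None,
) -> bool:
    """Second criterion: compare the update time with the current system time."""
    if now is None:
        now = time.time()  # current system time
    interval = now - current.update_time                  # time since last write
    increment = current.line_count - previous.line_count  # line-count increment
    # Normal if the log was written recently enough AND grew by more than the
    # threshold; a stale log or too small an increment is abnormal.
    return interval <= interval_threshold and increment > increment_threshold
```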

On the basis of the above embodiment, in order to notify operation and maintenance staff in time to carry out maintenance after a service system becomes abnormal, as a preferred implementation, if a service system is abnormal, the method further includes sending abnormality warning information, which includes information on the abnormal service system.

It should be noted that the information on the abnormal service system may be its name, its number, or other information, as long as it identifies the service system; the present invention does not limit the specific type of this information.

In addition, the abnormality warning information may be sent to the operation and maintenance staff by e-mail or short message using the stored contact information; of course, it may also be sent in other ways, which are not described further here.
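As one possible way to build the e-mail form of the warning, Python's standard `email.message.EmailMessage` can be used. The function name `build_warning_email`, the subject wording and the addresses are placeholders, and actual delivery (for example through `smtplib`) is only indicated in a comment.

```python
from email.message import EmailMessage


def build_warning_email(system_name: str, detail: str,
                        sender: str, recipient: str) -> EmailMessage:
    """Hypothetical helper: build the warning e-mail for one abnormal system."""
    msg = EmailMessage()
    msg["Subject"] = f"[log monitor] service system {system_name} may be abnormal"
    msg["From"] = sender
    msg["To"] = recipient
    # The body identifies the abnormal service system, as the method requires.
    msg.set_content(f"Service system: {system_name}\nDetail: {detail}\n")
    return msg


# Sending is not shown; with a reachable SMTP server it could be, e.g.:
#   import smtplib
#   with smtplib.SMTP("smtp.example.com") as server:
#       server.send_message(msg)
```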

On the basis of the above embodiment, in order to facilitate subsequent checking or confirmation of the working status of the service system, as a preferred implementation, the method further includes saving the sent abnormality warning information after sending it.

Specifically, the abnormality warning information may be stored in the database module in the form of an early warning information table, which may include fields such as the name of the abnormal service system, detailed warning information, warning sending time, warning sending method, and warning recipient.
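A minimal sketch of the early warning information table using SQLite from the Python standard library. The table name `warning_info`, its column names and the helper `save_warning` are assumptions that mirror the fields listed above.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Hypothetical schema mirroring the fields of the early warning information table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS warning_info (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    system_name TEXT NOT NULL,  -- name of the abnormal service system
    detail      TEXT NOT NULL,  -- detailed warning information
    sent_at     TEXT NOT NULL,  -- warning sending time (ISO 8601, UTC)
    send_method TEXT NOT NULL,  -- e.g. 'mail' or 'sms'
    recipient   TEXT NOT NULL   -- warning recipient
)
"""


def save_warning(conn: sqlite3.Connection, system_name: str, detail: str,
                 send_method: str, recipient: str,
                 sent_at: Optional[str] = None) -> None:
    """Hypothetical helper: record one sent warning for later checking."""
    if sent_at is None:
        sent_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO warning_info "
            "(system_name, detail, sent_at, send_method, recipient) "
            "VALUES (?, ?, ?, ?, ?)",
            (system_name, detail, sent_at, send_method, recipient),
        )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```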

The method for monitoring a service system provided by the present invention is described in detail below, taking the monitoring of the working status of one service system as an example. Referring to Fig. 2, which is a flowchart of another method for monitoring a service system provided by the present invention, the method includes:

S20: Collect the log file to be analyzed from the log directory of the service system.

S21: Obtain the update time of the log file and compare it with the update time of the log file in the previous monitoring period to obtain the file update time interval.

The obtained log file is the log file of the current monitoring period.

S22: Obtain the number of lines in the log file and compare it with the number of lines in the log file of the previous monitoring period to obtain the line-count increment of the file content.

S23: If the file update time interval does not exceed the update time interval threshold, or the line-count increment of the file content does not exceed the line-count increment threshold, the service system may be abnormal, and abnormality warning information is sent.

S24: Save the abnormality warning information for subsequent checking and confirmation.

The method provided in this embodiment monitors the working status of one service system and judges whether it is normal by comparing the update time and number of lines of its log file with those of the previous monitoring period. For a plurality of service systems, each one is monitored in the same way, so that the working status of all of them can be monitored.

The foregoing describes in detail an embodiment of a method for monitoring a service system, and based on the method for monitoring a service system described in the foregoing embodiment, an embodiment of the present invention provides a system for monitoring a service system corresponding to the method. Since the embodiment of the system part corresponds to the embodiment of the method part, the embodiment of the system part is described with reference to the embodiment of the method part, and is not described in detail here.

Fig. 3 is a structural diagram of a system for monitoring a service system according to an embodiment of the present invention. As shown in Fig. 3, the system includes:

a log collection module 30, configured to collect the log files of all the service systems; and

a log analysis module 31, configured to analyze each log file to obtain the log information of the corresponding log file, and to judge whether all the log information is normal; if so, all the service systems are determined to be normal, otherwise the service system corresponding to the abnormal log information is determined to be abnormal.

In the system for monitoring a service system provided by this embodiment, the log collection module collects the log file of each service system, and the log analysis module analyzes each log file to obtain its log information and judges whether all the log information is normal; if so, all the service systems are determined to be normal, otherwise the service system corresponding to the abnormal log information is determined to be abnormal. Because the log information of each service system is analyzed individually, when a service system within the monitoring range becomes abnormal, it can be located as the one whose log information is abnormal.

On the basis of the foregoing embodiment, in order to determine the operating status of each service system more accurately, as a preferred embodiment, the log information obtained by the log analysis module 31 is specifically the number of lines in the content of the log file and the update time of the log file.

On the basis of the foregoing embodiment, in order to more accurately determine the working condition of each service system, as a preferred implementation, the log analysis module 31 specifically includes:

an obtaining submodule, configured to analyze each log file to obtain the number of lines in its content and its update time;

a calculating submodule, configured to calculate, for each log file, the update time interval between its update time in the current monitoring period and in the previous monitoring period, and the line-count increment between its number of lines in the current monitoring period and in the previous monitoring period; and

an analysis submodule, configured to determine that all the log information is normal, and hence that all the service systems are normal, if all the update time intervals are greater than the update time interval threshold and all the line-count increments are greater than the line-count increment threshold; and to determine that the log information of a log file is abnormal, and hence that the corresponding service system is abnormal, if the update time interval of that log file is less than or equal to the update time interval threshold or its line-count increment is less than or equal to the line-count increment threshold.

On the basis of the foregoing embodiment, as a preferred implementation manner, in order to more accurately and conveniently judge the working condition of each service system, the log analysis module 31 specifically includes:

a calculating submodule, configured to calculate, for each log file, the update time interval between its update time and the current system time, and the line-count increment between its number of lines in the current monitoring period and in the previous monitoring period; and

an analysis submodule, configured to determine that all the log information is normal, and hence that all the service systems are normal, if all the update time intervals are less than or equal to the update time interval threshold and all the line-count increments are greater than the line-count increment threshold; and to determine that the log information of a log file is abnormal, and hence that the corresponding service system is abnormal, if the update time interval of that log file is greater than the update time interval threshold or its line-count increment is less than or equal to the line-count increment threshold.

Referring to fig. 4, fig. 4 is a block diagram of another system for monitoring a service system according to an embodiment of the present invention.

On the basis of the foregoing embodiment, in order to notify operation and maintenance staff in time to carry out maintenance after a service system becomes abnormal, as a preferred implementation, the system further includes an early warning sending module 40, configured to send abnormality warning information that includes information on the abnormal service system.

On the basis of the above embodiment, in order to facilitate subsequent checking or confirmation of the working status of the service system, as shown in Fig. 4, as a preferred embodiment, the system further includes a database module 41 for storing the sent abnormality warning information.

Preferably, besides the early warning information table, the database module 41 may also store a service configuration table, an early warning threshold table, an early warning sending table, and the like. The service configuration table stores the configuration information of the service systems and specifies the log monitoring objects; it may include the service system name, log directory, log file name, and so on. The early warning threshold table stores threshold information, including the file update time interval threshold and the file line-count increment threshold. The early warning sending table stores information related to sending warnings and specifies the warning recipients; it includes the warning sending type, warning recipient, recipient e-mail address, recipient mobile phone number, and so on.
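The remaining tables could be laid out as follows in SQLite, again as a Python sketch under assumptions: the table and column names are invented, and thresholds are stored per service system, which the description does not require. The hypothetical query `load_monitor_targets` shows how the configuration and threshold tables together could drive one monitoring cycle.

```python
import sqlite3

# Hypothetical schema for the service configuration, threshold and sending tables.
SCHEMA = """
CREATE TABLE service_config (       -- which logs are monitored
    system_name TEXT PRIMARY KEY,
    log_dir     TEXT NOT NULL,
    log_file    TEXT NOT NULL
);
CREATE TABLE warning_threshold (    -- assumed: one row per service system
    system_name         TEXT PRIMARY KEY REFERENCES service_config(system_name),
    interval_threshold  REAL NOT NULL,    -- file update time interval threshold, s
    increment_threshold INTEGER NOT NULL  -- file line-count increment threshold
);
CREATE TABLE warning_sending (      -- who is warned, and how
    recipient TEXT NOT NULL,
    send_type TEXT NOT NULL,            -- e.g. 'mail' or 'sms'
    email     TEXT,
    mobile    TEXT
);
"""


def load_monitor_targets(conn: sqlite3.Connection):
    """Hypothetical query: (system, log path, interval thr., increment thr.)."""
    return conn.execute(
        """SELECT c.system_name, c.log_dir || '/' || c.log_file,
                  t.interval_threshold, t.increment_threshold
           FROM service_config AS c
           JOIN warning_threshold AS t ON c.system_name = t.system_name
           ORDER BY c.system_name"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```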

Embodiments of the method for monitoring a service system have been described in detail above. Based on that method, an embodiment of the present invention further provides a corresponding device for monitoring a service system. Since the apparatus embodiment corresponds to the method embodiment, reference may be made to the description of the method embodiment, and the details are not repeated here.

Fig. 5 is a structural diagram of an apparatus for monitoring a service system according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes:

a memory 50 and a processor 51.

The memory 50 is configured to store a computer program.

The processor 51, when executing the computer program stored in the memory 50, may implement the following steps:

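Collecting log files of each service system.

Analyzing each log file to obtain the log information of each corresponding log file.

Judging whether all log information is normal; if so, determining that all service systems are normal, otherwise determining that the service system corresponding to the abnormal log information is abnormal.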

In some embodiments of the present invention, the processor 51 may be further configured to execute the computer program in the memory 50 to implement the following steps:

Analyzing each log file to obtain the number of content lines of the log file and its update time.

Judging whether the number of content lines and the update time of every log file are normal.

If so, determining that all service systems are normal; otherwise, determining that the service system whose log file has an abnormal number of content lines or an abnormal update time is abnormal.

Sending abnormal early warning information, where the abnormal early warning information includes information about the abnormal service system.

Saving the sent abnormal early warning information.
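The parsing step performed by the processor 51 (obtaining the number of content lines and the update time of each log file) can be approximated with the Python standard library, as in the sketch below. The helper names `read_log_stats` and `analyze` are hypothetical, and the sketch assumes the caller stores the line count from the previous check; the update time interval is taken as the time elapsed since the file's last modification.

```python
import os
import tempfile
import time
from typing import Dict, Optional, Tuple


def read_log_stats(path: str) -> Tuple[int, float]:
    """Return (number of content lines, last modification time) of a log file."""
    with open(path, "rb") as f:
        lines = sum(1 for _ in f)
    return lines, os.path.getmtime(path)


def analyze(path: str, previous_lines: Optional[int],
            now: Optional[float] = None) -> Dict[str, float]:
    """Compute the update time interval and line-count increment for one check.

    previous_lines is the line count recorded at the previous check (None on the
    first check, in which case the increment is taken as the full line count).
    """
    lines, mtime = read_log_stats(path)
    now = time.time() if now is None else now
    increment = lines - (previous_lines or 0)
    return {"lines": lines, "update_interval": now - mtime, "line_increment": increment}


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
        f.write("a\nb\nc\n")
        path = f.name
    stats = analyze(path, previous_lines=1)
    print(stats["lines"], stats["line_increment"])
    os.remove(path)
```

A log rotation between checks would make the increment negative, which the threshold rule would report as abnormal; a production monitor would likely need to detect rotation separately.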

In the apparatus for monitoring service systems provided in this embodiment, the processor, when executing the computer program in the memory, collects the log files of each service system, analyzes each log file to obtain its log information, and judges whether all log information is normal; if so, it determines that all service systems are normal, and otherwise it determines that the service system corresponding to the abnormal log information is abnormal. The apparatus can therefore accurately locate an abnormal service system when one exists.

The method, system and apparatus for monitoring a service system provided by the present invention are described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another.

It should be noted that those skilled in the art may make various improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present invention.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A method of monitoring a business system, comprising:

collecting log files of all service systems;

analyzing each log file to obtain the log information of each log file;

judging whether all the log information is normal or not;

if so, determining that all the service systems are normal, otherwise, determining that the service system corresponding to the abnormal log information is abnormal;

the log information is specifically the number of content lines of the log file and the update time of the log file;

the specific steps of judging whether all the log information is normal are as follows:

if the update time interval of the log file is greater than the update time interval threshold or the line-count increment is less than or equal to the line-count increment threshold, determining that the log information of the log file is abnormal;

or the judging whether all the log information is normal specifically includes:

2. The method of claim 1, further comprising, if a service system is abnormal:

sending abnormal early warning information;

3. The method of claim 2, further comprising, after sending the abnormal early warning information:

saving the sent abnormal early warning information.

4. A system for monitoring a business system, comprising:

the log acquisition module is used for collecting log files of all the service systems;

the log analysis module is used for analyzing each log file to obtain the corresponding log information of each log file, the log information being used to judge whether all log information is normal; if so, all the service systems are determined to be normal, otherwise the service systems corresponding to the abnormal log information are determined to be abnormal;

or the judging whether all the log information is normal specifically includes:

5. The system of claim 4, further comprising:

6. The system of claim 5, further comprising:

7. An apparatus for monitoring a business system, comprising a processor for implementing the steps of the method for monitoring a business system according to any one of claims 1 to 3 when executing a program stored in a memory.