CN112612679A - System running state monitoring method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN112612679A
Authority
CN
China
Prior art keywords
state
matching
service request
data
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011602580.3A
Other languages
Chinese (zh)
Inventor
潘鸿
郭安福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiping Finance Technology Services Shanghai Co ltd
Original Assignee
Taiping Finance Technology Services Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiping Finance Technology Services Shanghai Co ltd
Priority to CN202011602580.3A
Publication of CN112612679A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a system running state monitoring method and device, computer equipment and a storage medium. The method comprises the following steps: acquiring an operation file corresponding to a system, and extracting state data from the operation file; matching the state data with a preset state matching library to obtain a matching result; when the matching result corresponds to successful matching, determining the running state of the system according to a state matching library which is successfully matched; and when the matching result corresponds to a matching failure, analyzing the service request data in the state data to determine the operating state of the system. By adopting the method, the monitoring capability of the running state of the application system can be improved.

Description

System running state monitoring method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for monitoring a system operating state, a computer device, and a storage medium.
Background
With the rapid development of the internet, and especially the rise of microservice architecture, container technology and the like, modern application systems have become larger and more complex. The fault symptoms and causes of an application system are increasingly diverse, which makes judging the health state of the system more complicated.
In the face of the diversity and complexity of application system fault symptoms, the conventional scheme uses existing application performance monitoring (APM) software, such as the open source tools Zipkin, SkyWalking and Pinpoint, and commercial software such as Dynatrace, auspice, and the like, to monitor the health state of the application system.
However, conventional tools or software can only monitor the health state of the application system according to completed requests. The health state is therefore monitored with a delay, and a failure of the application system cannot be found in time, which results in relatively low monitoring accuracy for the running state of the system.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device and a storage medium for monitoring an operating state of an application system, which can improve the capability of monitoring the operating state of the application system.
A system operation state monitoring method comprises the following steps:
acquiring an operation file corresponding to a system, and extracting state data from the operation file;
matching the state data with a preset state matching library to obtain a matching result;
when the matching result corresponds to the matching success, determining the running state of the system according to the state matching library which is matched successfully;
and when the matching result corresponds to the matching failure, analyzing the service request data in the state data to determine the operating state of the system.
In one embodiment, extracting state data from the run file comprises:
acquiring an analysis algorithm corresponding to the running file;
analyzing the operation file according to an analysis algorithm to obtain an operation analysis file;
and extracting state data corresponding to the preset state features from the operation analysis file.
In one embodiment, parsing the operation file according to a parsing algorithm to obtain an operation parsing file includes:
extracting identifiers from the running file according to an analysis algorithm;
dividing the running file into an identifier and a field according to the identifier;
and obtaining the operation analysis file according to the identifier and the field.
In one embodiment, the state matching library includes a failure matching library and a normal matching library; generating a state matching library, comprising:
acquiring a corresponding historical operating file of the system operating in a preset time period;
determining initial response time according to actual response time corresponding to each service request in the historical operating file;
when the actual response time is longer than the initial response time, judging that the request state of the service request is a fault state;
otherwise, judging the request state of the service request as a normal state;
and generating a normal matching library according to the service request in the normal state, and generating a fault matching library according to the service request in the fault state.
In one embodiment, after the normal matching library is generated according to the service request in the normal state and the fault matching library is generated according to the service request in the fault state, the method further includes:
acquiring corresponding fault data in a fault matching library within a preset time period and corresponding normal data in a normal matching library within the preset time period;
determining the fault misjudgment rate according to the relative value of the fault data and the normal data;
and when the fault misjudgment rate is greater than the preset misjudgment threshold value, adjusting the initial response time so that the fault misjudgment rate obtained according to the adjusted initial response time is less than the preset misjudgment threshold value.
In one embodiment, determining the initial response time according to the actual response time corresponding to each service request in the history running file includes:
extracting actual response time corresponding to each service request from a historical operating file;
sequencing the service requests according to the actual response time corresponding to each service request;
and taking the actual response time corresponding to the service request at the preset sequencing position as the initial response time.
In one embodiment, the step of determining the matching result includes:
when the state data is successfully matched with any one of the preset state matching libraries, judging that the matching result corresponds to a matching success;
and when the state data fails to match every preset state matching library, judging that the matching result corresponds to a matching failure.
In one embodiment, matching the state data with a preset state matching library to obtain a matching result, and, when the matching result corresponds to a matching success, determining the running state of the system according to the successfully matched state matching library, comprises:
matching the state data with a preset fault matching library;
when the state data is successfully matched with the fault matching library, judging that the matching result corresponds to a matching success and that the running state of the system is a fault state; otherwise, matching the state data with a preset normal matching library;
when the state data is successfully matched with the normal matching library, judging that the matching result corresponds to a matching success and that the running state of the system is a healthy state; otherwise, judging that the matching result corresponds to a matching failure.
In one embodiment, when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system, including:
when the matching result is matching failure, extracting corresponding service request data in a preset time period from the state matching library;
acquiring a service request state corresponding to each service request data;
classifying the service request data according to the service request state;
acquiring the quantity variation trend of each category of service request data in a preset time period;
and determining the running state of the system according to the quantity change trend of the service request data of each category.
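The trend analysis in this embodiment can be sketched as follows. This is only an illustrative sketch: the source does not specify how the preset time period is divided or which trend indicates a fault, so the fixed-width windows, the state labels "normal" and "fault", and the rule "the fault count rises in every window" are all assumptions, as are the function names.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def counts_per_window(
    events: Iterable[Tuple[float, str]],
    start: float,
    window: float,
    n_windows: int,
) -> List[Counter]:
    """Count service requests per request state in consecutive time windows.

    events: (timestamp in seconds, request state) pairs; the state labels
    "normal" and "fault" are assumed for illustration.
    """
    buckets = [Counter() for _ in range(n_windows)]
    for timestamp, state in events:
        index = int((timestamp - start) // window)
        if 0 <= index < n_windows:  # ignore events outside the period
            buckets[index][state] += 1
    return buckets


def fault_trend_rising(buckets: List[Counter]) -> bool:
    """Assumed rule: the fault count strictly increases from window to window."""
    series = [bucket["fault"] for bucket in buckets]
    return len(series) >= 2 and all(a < b for a, b in zip(series, series[1:]))
```

A rising fault count could then be treated as a sign that the system is entering a fault state, while a flat or falling count could be treated as healthy; other rules fit the same structure.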
In one embodiment, when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system, including:
when the matching result corresponds to a matching failure, extracting the service request in an unfinished state from the state data;
determining the current response time corresponding to the service request in an uncompleted state;
comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result;
determining the request state of the service request in an unfinished state according to the comparison result;
and determining the running state of the system according to the request state.
In one embodiment, the current response time is compared with the initial response time corresponding to the current time to obtain a comparison result; determining the request state of the service request in the unfinished state according to the comparison result, wherein the request state comprises the following steps:
if the current response time is longer than the initial response time corresponding to the current time, judging that the comparison result is a comparison failure and the request state of the service request in the unfinished state is a fault state;
otherwise, judging that the comparison result is a comparison success and the request state of the service request in the unfinished state is a normal state.
In one embodiment, after determining the request status of the service request in the incomplete status according to the comparison result, the method further includes:
adding the service request in the normal state to a normal matching library;
and adding the service request in the fault state to a fault matching library.
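The handling of unfinished service requests can be sketched as below. The sketch assumes, consistently with the library generation embodiment, that a request whose elapsed time already exceeds the initial response time is treated as a fault; the function name, the use of seconds, and the dictionary of start times are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_unfinished(
    start_times: Dict[str, float],
    now: float,
    initial_response_time: float,
) -> Tuple[Set[str], Set[str]]:
    """Split unfinished requests into (normal, fault) sets of request ids.

    start_times maps a request id to its start time in seconds. The current
    response time of an unfinished request is taken as now - start time.
    """
    normal: Set[str] = set()
    fault: Set[str] = set()
    for request_id, started in start_times.items():
        current_response_time = now - started
        if current_response_time > initial_response_time:
            fault.add(request_id)  # already slower than the threshold
        else:
            normal.add(request_id)
    return normal, fault
```

The two returned sets correspond to the requests that would be added to the normal matching library and the fault matching library respectively.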
A system operating condition monitoring apparatus, the apparatus comprising:
the acquisition module is used for acquiring an operation file corresponding to the system and extracting state data from the operation file;
the matching module is used for matching the state data with a preset state matching library to obtain a matching result;
the first determining module is used for determining the running state of the system according to the state matching library which is successfully matched when the matching result corresponds to the matching success;
and the second determining module is used for analyzing the service request data in the state data to determine the running state of the system when the matching result corresponds to the matching failure.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method when the processor executes the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The system running state monitoring method and device, the computer equipment and the storage medium acquire the running file generated while the system runs; extract state data from the running file; match the state data with a preset state matching library to obtain a matching result; when the matching result corresponds to a matching success, determine the running state of the system according to the successfully matched state matching library; and when the matching result corresponds to a matching failure, analyze the service request data in the state data to determine the running state of the system. Because the running state is judged from the running file produced while the system is running, it can be judged in time from in-progress data rather than only from the data of completed requests. Fault data arising while the system runs can therefore be found in time, which improves the monitoring accuracy of the running state of the system.
Drawings
FIG. 1 is a diagram of an application environment of the system running state monitoring method in one embodiment;
FIG. 2 is a flow chart illustrating a method for monitoring the running state of a system according to an embodiment;
FIG. 3 is a flow chart of determining the running state of a system using a state matching library in one embodiment;
FIG. 4 is a flow chart of generating a state matching library according to an embodiment;
FIG. 5 is a flow chart of the system state discrimination process when matching against the state matching library fails, according to an embodiment;
FIG. 6 is a schematic diagram of analyzing service request data in the state data to determine the running state of the system in one embodiment;
FIG. 7 is a schematic diagram of analyzing service request data in the state data to determine the running state of the system in another embodiment;
FIG. 8 is a schematic diagram of a running file acquisition method provided in an embodiment;
FIG. 9 is a block diagram of a system running state monitoring apparatus;
FIG. 10 is a diagram showing the internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The system running state monitoring method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 110 communicates with the server 120 through a network. The server 120 acquires an operation file corresponding to the system and extracts state data from the operation file; matching the state data with a preset state matching library to obtain a matching result; when the matching result corresponds to the matching success, determining the running state of the system according to the state matching library which is matched successfully; and when the matching result corresponds to the matching failure, analyzing the service request data in the state data to determine the operating state of the system. Further, the server 120 may transmit the operation state to the terminal 110 and display it. The terminal 110 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 120 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a method for monitoring the operating status of a system is provided, which is described by taking the method as an example applied to the server 120 in fig. 1, and includes the following steps:
step S202, acquiring an operation file corresponding to the system, and extracting state data from the operation file.
The system may be a system that can execute a specific service, for example, the system may include one or more servers, and the servers in the system may be classified into a load server, a service server, or an application server according to the function of each server in the system. In particular, the load server may be a central server in a distributed deployment environment, and the central server may monitor the operational data of the node servers of other distributed deployments. The service server may be a server capable of executing a specific service, such as a personnel management server. And each server may receive or generate a service request and perform service processing on the service request.
The running file is a data file generated in the running process of the system, and may specifically be a log file generated in the running process of the system, which is not limited herein. Specifically, the running file records running data corresponding to the system in the running process, for example, the running file may include a log file corresponding to the system when processing the received service request, and the log file may include time for receiving the service request, time for completing the service request, a request address corresponding to the received service request, a server identifier corresponding to the execution service request, and the like.
Specifically, the running file includes state data and non-state data generated while the system runs. In a specific implementation, the computer device may extract the state data from the running file according to a preset state extraction algorithm. The state data refers to data capable of representing a specific state, and may specifically include a unique service request identifier, a service request address, a service request method, a service request start time, a service request completion flag, and a service request completion time.
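As a rough illustration of extracting state data from a running file, the sketch below parses one log line into the fields listed above. The log line format, the regular expression and the names `StateData` and `extract_state_data` are assumptions made for this example; the source does not define a concrete log format or extraction algorithm.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical log line format (not taken from the source):
#   "<request_id> <METHOD> <address> start=<epoch_ms> [end=<epoch_ms>]"
LINE_RE = re.compile(
    r"^(?P<request_id>\S+) (?P<method>[A-Z]+) (?P<address>\S+) "
    r"start=(?P<start>\d+)(?: end=(?P<end>\d+))?$"
)


@dataclass
class StateData:
    request_id: str
    method: str
    address: str
    start_ms: int
    end_ms: Optional[int]  # None while the request is still unfinished

    @property
    def completed(self) -> bool:
        """Completion flag derived from the presence of an end time."""
        return self.end_ms is not None


def extract_state_data(line: str) -> Optional[StateData]:
    """Return the state data in a log line, or None for non-state lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    end = match.group("end")
    return StateData(
        request_id=match.group("request_id"),
        method=match.group("method"),
        address=match.group("address"),
        start_ms=int(match.group("start")),
        end_ms=int(end) if end is not None else None,
    )
```

Lines that do not carry request state, such as heartbeat messages, yield `None` and can be skipped as non-state data.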
And step S204, matching the state data with a preset state matching library to obtain a matching result.
The state matching library is preset, for example, the computer can store the state data in the matching library in advance to obtain the state matching library. The number of the state matching libraries can be one or more, and the data stored in different state matching libraries are different, for example, the state matching library can be a fault matching library in which fault data is stored, and for example, the state matching library can also be a normal matching library in which normal data is stored. The fault data refers to data corresponding to a system fault, and the normal data refers to data corresponding to a system in normal operation.
In specific implementation, the computer device matches the state data extracted from the running file with one or more preset state matching libraries to obtain a matching result. When the number of the state matching libraries is multiple, the method also comprises the step of setting the same or different priority orders for different state matching libraries, so that the computer can respectively match the state data with the state matching libraries according to the preset order.
And step S206, when the matching result corresponds to successful matching, determining the running state of the system according to the state matching library which is successfully matched.
The successful matching means that the matching degree of the state data and the data files in the state matching library reaches a preset threshold value. Specifically, when the number of the state matching banks is multiple, the matching success may represent that the state data is successfully matched with any one or more of the state matching banks.
In one embodiment, the step of determining the matching result includes: when the state data is successfully matched with any one of the preset state matching libraries, judging that the matching result corresponds to a matching success; and when the state data fails to match every preset state matching library, judging that the matching result corresponds to a matching failure.
Specifically, the state matching library comprises a fault matching library and a normal matching library. When the state data is successfully matched with the normal matching library, the running state of the system is determined according to the normal matching library; for example, the system may be judged to be running normally. When the state data is successfully matched with the fault matching library, the running state of the system is determined according to the fault matching library; for example, the system may be judged to be running with a fault.
Referring to fig. 3, fig. 3 is a flow chart of determining the running state of the system using a state matching library according to an embodiment. In one embodiment, matching the state data with a preset state matching library to obtain a matching result, and, when the matching result corresponds to a matching success, determining the running state of the system according to the successfully matched state matching library, comprises: matching the state data with a preset fault matching library; when the state data is successfully matched with the fault matching library, judging that the matching result corresponds to a matching success and that the running state of the system is a fault state; otherwise, matching the state data with a preset normal matching library; when the state data is successfully matched with the normal matching library, judging that the matching result corresponds to a matching success and that the running state of the system is a healthy state; otherwise, judging that the matching result corresponds to a matching failure.
When the computer device judges that the system is in a fault running state, it can send early warning information in time, for example an alarm or a warning message, so that the fault running state of the system is reported promptly and the faulty system can be handled in time.
And step S208, when the matching result corresponds to a matching failure, analyzing the service request data in the state data to determine the running state of the system.
When the state data cannot be matched with any preset state matching library, the data prestored in the preset state matching libraries is not sufficient to judge the current running state of the system, and the computer device judges that the matching result corresponds to a matching failure. The computer device then analyzes the service request data in the state data to determine the running state of the system.
The service request data may be data obtained by processing a service request received by the system within a preset time period, or may also be data obtained by identifying the service request, for example, the service request data may include information such as a time for receiving the service request, a time when the service request is completed, and a request address for receiving the service request. In particular, the computer device may also analyze the service request data to determine an operational state of the system. For example, the request time of the service request may be analyzed to determine whether the service request is overtime, the request address of the service request may be analyzed to determine whether the request address of the service request is an illegal address, and the service request data in a period of time may be statistically analyzed to determine whether the service request in the period of time has a fault or not.
The system running state monitoring method acquires the running file generated while the system runs; extracts state data from the running file; matches the state data with a preset state matching library to obtain a matching result; when the matching result corresponds to a matching success, determines the running state of the system according to the successfully matched state matching library; and when the matching result corresponds to a matching failure, analyzes the service request data in the state data to determine the running state of the system. Because the running state is judged from the running file produced while the system is running, it can be judged in time from in-progress data rather than only from the data of completed requests. Fault data arising while the system runs can therefore be found in time, which improves the monitoring accuracy of the running state of the system.
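The overall decision flow of steps S204 to S208 can be sketched as follows. The sketch makes a simplifying assumption: a "match" is modelled as exact membership of a (request method, request address) key in a library, whereas the source describes matching as a matching degree reaching a preset threshold. The names `RunState`, `determine_state` and `analyze_requests` are hypothetical.

```python
from enum import Enum
from typing import Callable, Set, Tuple

# Assumed matching key: (request method, request address).
Key = Tuple[str, str]


class RunState(Enum):
    FAULT = "fault"
    HEALTHY = "healthy"


def determine_state(
    key: Key,
    fault_library: Set[Key],
    normal_library: Set[Key],
    analyze_requests: Callable[[Key], RunState],
) -> RunState:
    """Check the fault library first, then the normal library.

    If neither library matches (matching failure), fall back to analyzing
    the service request data via the supplied callback.
    """
    if key in fault_library:
        return RunState.FAULT
    if key in normal_library:
        return RunState.HEALTHY
    return analyze_requests(key)
```

Checking the fault library before the normal library mirrors the order in fig. 3, so that a known fault pattern is reported without waiting for further analysis.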
Specifically, a fault matching library and a normal matching library are established first, and application state data is then extracted from a running file such as a log file. The application state data is compared with the data in the fault matching library and the normal matching library to analyze the health state of the system, and functions such as alarming are built around the health state of the system.
In one embodiment, the state matching library includes a failure matching library and a normal matching library; generating a state matching library, comprising: acquiring a corresponding historical operating file of the system operating in a preset time period; determining initial response time according to actual response time corresponding to each service request in the historical operating file; when the actual response time is longer than the initial response time, judging that the request state of the service request is a fault state; otherwise, judging the request state of the service request as a normal state; and generating a normal matching library according to the service request in the normal state, and generating a fault matching library according to the service request in the fault state.
The history running file is a file corresponding to a period of history, for example, the history running file may be data corresponding to a day or a month, and the like, which is not limited herein. The actual response time is the response time of the system to the received service request in the historical time. The initial response time is extracted from the actual response time, for example, it may be a time randomly determined from the actual response time, or it may be a time determined from the actual response time according to other logic, or it may be a statistical time obtained after analyzing the actual response time, and the like, which is not limited herein.
Specifically, the selected initial response time is used as a time threshold. The computer device compares the actual response time corresponding to each service request in the historical running file with the determined initial response time. When the actual response time is greater than the initial response time, that is, greater than the preset time threshold, the system may have experienced a response timeout, so the request state of the service request is judged to be a fault state; otherwise, the request state of the service request is judged to be a normal state. A normal matching library is then generated from the service requests in the normal state, and a fault matching library from the service requests in the fault state.
In one embodiment, determining the initial response time according to the actual response time corresponding to each service request in the history running file includes: extracting actual response time corresponding to each service request from a historical operating file; sequencing the service requests according to the actual response time corresponding to each service request; and taking the actual response time corresponding to the service request at the preset sequencing position as the initial response time.
Specifically, the computer device may sort the actual response times from small to large and use the actual response time corresponding to the service request at the preset sorting position as the initial response time. For example, the preset sorting position may be the 80% position, in which case the actual response time sorted at the 80% position is used as the initial response time. In other embodiments, the preset sorting position may be adjusted according to the specific scenario, which is not limited herein.
As shown in fig. 4, which is a schematic diagram of the generation flow of a state matching library in an embodiment, the computer device obtains the historical running files of the system for the past n days, with n = 30, and obtains from them the actual response time r corresponding to one or more service requests. The actual response times are sorted from small to large, and the actual response time at position m (0 < m < 1) is taken as the time threshold R, where m is initially set to 0.9 based on experience and can also be set to other values. The computer device then divides the service requests in the historical service data into different state matching libraries according to the relationship between the actual response time and the initial response time of each service request. Specifically, when r is less than or equal to R, the corresponding service request is synchronized to the normal matching library; otherwise, the service request is synchronized to the fault matching library.
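The library generation flow of fig. 4 can be sketched as below. The source does not say how the position m maps to an index in the sorted list, so the nearest-rank rule used here is an assumption, and the function names and the dictionary of response times are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def initial_response_time(times: List[float], m: float = 0.9) -> float:
    """Return the response time at position m of the ascending sorted list.

    Uses the nearest-rank rule (index ceil(m * n) - 1), which is an
    assumption; the source only says "the actual response time at position m".
    """
    if not times:
        raise ValueError("no response times available")
    if not 0 < m < 1:
        raise ValueError("m must satisfy 0 < m < 1")
    ordered = sorted(times)
    index = max(math.ceil(m * len(ordered)) - 1, 0)
    return ordered[index]


def build_libraries(
    response_times: Dict[str, float], m: float = 0.9
) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Split requests into (threshold, normal library, fault library).

    response_times maps a request id to its actual response time.
    Requests at or below the threshold are normal; the rest are faults.
    """
    threshold = initial_response_time(list(response_times.values()), m)
    normal = {k: t for k, t in response_times.items() if t <= threshold}
    fault = {k: t for k, t in response_times.items() if t > threshold}
    return threshold, normal, fault
```

With ten requests whose response times are 1 s to 10 s and m = 0.9, this rule picks 9 s as the threshold, so only the 10 s request lands in the fault matching library.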
In one embodiment, after the normal matching library is generated according to the service request in the normal state and the fault matching library is generated according to the service request in the fault state, the method further includes: acquiring corresponding fault data in a fault matching library within a preset time period and corresponding normal data in a normal matching library within the preset time period; determining the fault misjudgment rate according to the relative value of the fault data and the normal data; and when the fault misjudgment rate is greater than the preset misjudgment threshold value, adjusting the initial response time so that the fault misjudgment rate obtained according to the adjusted initial response time is less than the preset misjudgment threshold value.
In order to ensure the accuracy of the data in the fault matching library and the normal matching library, the method further comprises adjusting the initial response time, so as to ensure the accuracy of the state matching libraries divided according to the initial response time. Specifically, the computer device analyzes data in the fault matching library, for example, counts the number E of system faults that occurred in the past n days, analyzes data in the normal matching library to count the number e of false alarms in the past n days, and calculates the false alarm rate as rate = e / (E + e). The computer device then compares the false alarm rate with a preset misjudgment threshold. When the calculated false alarm rate is greater than the preset misjudgment threshold, the probability of a wrong division is high when the state matching libraries are divided using the current initial response time, so the initial response time needs to be adjusted until the fault misjudgment rate obtained according to the adjusted initial response time is less than the preset misjudgment threshold. It should be noted that, in a specific embodiment, the initial values n = 30 and m = 0.9 may be set empirically, and the computer device may also dynamically adjust the values of n and m as the history running files accumulate, for example, so that the false alarm rate does not exceed a required 1%.
In a specific application scenario, the fault matching library and the non-fault matching library are generated as follows: the computer device obtains the corresponding running file within 10 s. The service requests in the running file are sorted by their request times, and the request time at the 90% position is set as the initial response time, for example, 2 s. Then, when a new service request is received, the request time of the new service request is compared with 2 s; when the request time is less than 2 s, the service request is classified as a normal request, otherwise the service request is classified as a fault request. In this way, an initial fault matching library and an initial normal matching library can be obtained.
The fault matching library and the normal matching library are then updated: the computer device analyzes the fault requests in the fault matching library again to obtain a false alarm rate, which may be, for example, 10%, 5%, or 1%. If the preset false alarm rate is within 1% and the current false alarm rate is 5%, the current value of 2 s does not meet the standard, and the value needs to be dynamically adjusted until the false alarm rate is controlled within 1%, at which point the initial response time is considered appropriate. When a request is received again, the determination is made according to the newly determined initial response time.
As shown in fig. 5, fig. 5 provides a flow chart of system state discrimination when matching against the state matching library fails. In one embodiment, when the matching result corresponds to a matching failure, analyzing the service request data in the state data to determine the operating state of the system includes: further analyzing the state data to judge whether the state data is a latent fault feature, and when the state data is a latent fault feature, judging that the state data comprises fault data and that the running state of the system is a fault state. Conversely, when the state data is not a latent fault feature, it is determined that fault data is not included in the state data and that the operating state of the system is a healthy state.
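The false alarm rate and the threshold adjustment can be sketched as follows. The embodiment does not state how the initial response time is adjusted, so the step-wise increase of the sorting position used here is only one assumed strategy; the function names, the integer-percent representation of m, and the labelled samples are likewise hypothetical.

```python
from typing import List, Tuple


def false_alarm_rate(true_faults: int, false_alarms: int) -> float:
    """rate = e / (E + e), with E true faults and e false alarms."""
    total = true_faults + false_alarms
    return false_alarms / total if total else 0.0


def threshold_at(times: List[float], m_percent: int) -> float:
    """Response time at the m_percent position of the ascending order."""
    ordered = sorted(times)
    # Integer arithmetic for ceil(m_percent * count / 100), avoiding float rounding.
    index = max((m_percent * len(ordered) + 99) // 100 - 1, 0)
    return ordered[index]


def adjust_threshold(
    samples: List[Tuple[float, bool]],
    m_percent: int = 90,
    limit: float = 0.01,
    step: int = 1,
) -> Tuple[int, float, float]:
    """Raise the sorting position until the false alarm rate is within the limit.

    Each sample is (response_time, was_real_fault); the label is assumed to come
    from later verification of the flagged requests.
    """
    times = [t for t, _ in samples]
    while True:
        R = threshold_at(times, m_percent)
        flagged = [real for t, real in samples if t > R]
        E = sum(flagged)        # flagged requests that were real faults
        e = len(flagged) - E    # flagged requests that were false alarms
        rate = false_alarm_rate(E, e)
        if rate <= limit or m_percent + step >= 100:
            return m_percent, R, rate
        m_percent += step


# Hypothetical data: 100 requests, only those slower than 95 s were real faults.
samples = [(float(t), t > 95) for t in range(1, 101)]
result = adjust_threshold(samples)
```

In this hypothetical run the position moves from 90% to 95%, at which point no flagged request is a false alarm.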
In one embodiment, when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system, including: when the matching result corresponds to a matching failure, extracting the service request in an unfinished state from the state data; determining the current response time corresponding to the service request in an uncompleted state; comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result; determining the request state of the service request in an unfinished state according to the comparison result; and determining the running state of the system according to the request state.
The state data includes service requests in a completed state and service requests in an uncompleted state. When the state data fails to match the preset state matching library, the request state of the service requests in the state data cannot be judged using the current preset state matching library, and the health state of the system cannot be judged either. Therefore, the computer device can extract the uncompleted service requests from the state data and analyze them in real time to judge how the system is processing the service requests in the uncompleted state.
Specifically, the computer device compares the current response time corresponding to the service request in the uncompleted state with the initial response time to obtain a comparison result. In one embodiment, comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result, and determining the request state of the service request in the unfinished state according to the comparison result, includes: if the current response time is longer than the initial response time corresponding to the current time, judging that the comparison result is a comparison failure and that the request state of the service request in the unfinished state is a fault state; otherwise, judging that the comparison result is a comparison success and that the request state of the service request in the unfinished state is a normal state.
As shown in fig. 6, fig. 6 is a schematic diagram of analyzing service request data in state data to determine an operating state of the system according to an embodiment. Specifically, the computer device extracts the service request in the incomplete state from the state data, and calculates the current response time r' corresponding to the service request in the incomplete state (the time the incomplete request has consumed so far); specifically, the current response time is equal to the difference between the current time and the request start time. If r' > R, the service request in the incomplete state has timed out, so the incomplete service request can be synchronized to the fault matching library; at the same time, it can be determined that the system is possibly in the fault state, and an alarm is triggered. Otherwise, if r' ≤ R, the service request in the incomplete state has not timed out, so the service request can be synchronized to the normal matching library. The alarm of the system is implemented based on the open-source Alertmanager, which can connect to various notification channels, such as short messages, WeChat, DingTalk, and email. On the one hand, it is responsible for grouping, noise reduction, alarm suppression, and the like; on the other hand, it is responsible for interfacing with a wide variety of receivers, such as email, DingTalk, WeChat, and short messages. When Alertmanager receives an alarm, it routes the alarm to the correct receiver, and the receiver sends the alarm to the relevant personnel.
Further, when the computer device determines that r' > R, whether the system is in a real fault state can be verified again manually. If the system is confirmed to be in a real fault state, the incomplete service request is synchronized to the fault matching library, the system is determined to be possibly in the fault state, and an alarm is triggered. If the manual check shows that the system is not in a real fault state, the incomplete service request is marked as a misjudged fault request and is synchronized to the normal matching library.
With the rapid development of the internet, and especially the rise of micro-service architecture, container technology, and the like, modern application systems are becoming larger and more complex, the numbers of nodes and service requests grow exponentially, and the stability of the system is increasingly difficult to guarantee. In the face of the diversity and complexity of faults, intelligent monitoring approaches such as panoramic monitoring, intelligent diagnosis, and diverse data acquisition are adopted to achieve early fault discovery and rapid diagnosis. In this embodiment, the service requests in the incomplete state in the state data are analyzed and processed, so that the operating state of the system can be found in time. In particular, when the system has a fault, the health state of the system is monitored through the service requests in the incomplete state, instead of being judged only after the system finishes the service requests, which improves the effect and timeliness of health-state monitoring and further improves the monitoring efficiency of the system.
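The elapsed-time check on unfinished requests can be sketched as follows. This is a minimal illustration under stated assumptions: the `ServiceRequest` record, its field names, and the function names are hypothetical, the manual verification step is omitted, and alert delivery is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServiceRequest:
    request_id: str
    start_time: float                         # request start time, in seconds
    completion_time: Optional[float] = None   # None while the request is unfinished


def classify_unfinished(
    requests: List[ServiceRequest], threshold: float, now: float
) -> Dict[str, List[str]]:
    """Classify unfinished requests by elapsed time r' = now - start_time."""
    result: Dict[str, List[str]] = {"normal": [], "fault": []}
    for req in requests:
        if req.completion_time is not None:
            continue  # completed requests are not part of this check
        elapsed = now - req.start_time
        key = "fault" if elapsed > threshold else "normal"  # r' > R means timed out
        result[key].append(req.request_id)
    return result


def system_state(classified: Dict[str, List[str]]) -> str:
    """Report a possible fault as soon as any unfinished request has timed out."""
    return "fault" if classified["fault"] else "healthy"


# Hypothetical snapshot taken at now = 100.0 s with R = 2.0 s.
snapshot = [
    ServiceRequest("a", start_time=99.0),
    ServiceRequest("b", start_time=95.0),
    ServiceRequest("c", start_time=90.0, completion_time=91.0),
]
classified = classify_unfinished(snapshot, threshold=2.0, now=100.0)
state = system_state(classified)
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the sketch deterministic and easy to check.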
In one embodiment, when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system, including: when the matching result is matching failure, extracting corresponding service request data in a preset time period from the state matching library; acquiring a service request state corresponding to each service request data; classifying the service request data according to the service request state; acquiring the quantity variation trend of each category of service request data in a preset time period; and determining the running state of the system according to the quantity change trend of the service request data of each category.
Specifically, the computer device extracts service request data corresponding to a preset time period from the fault matching library and the normal matching library, and performs category division on the service request data according to state data corresponding to each service request, so that service requests (unfinished service requests) in normal processing, finished service requests and the like can be obtained. And also can count the corresponding factor indexes in the preset time period, such as the newly added service request quantity, the total request quantity of the currently completed service request and the request quantity of the currently processed service request.
Specifically, Table 1 is a rule table for determining the state of the service system, and fig. 7 is a schematic diagram of analyzing service request data in state data to determine the operating state of the system according to another embodiment. The rules for judging the health state of the system according to the statistically analyzed factor indexes are as follows. When there are new service requests in the preset time period, the number of uncompleted service requests is increasing, and the number of completed service requests is unchanged, it is determined that request processing in the system is likely to be blocked and needs to be handled immediately. If the computer device determines that there are new service requests within the preset time period, there are unfinished service requests, and the number of finished service requests is also increasing, then the judgment can be made according to the relationship between the current response time of the unfinished service requests in the state data and the preset response time. If there is no new service request in the preset time period, the number of uncompleted service requests is decreasing, and the number of completed service requests is increasing, the uncompleted service requests are still being processed normally, and it is determined that the system is in a healthy state. If there are no new service requests and no unfinished service requests within the preset time period, the computer device determines that the system may be in an idle state at this time.
Table 1: Rule table for determining the state of the service system
[Table image BDA0002869702920000141]
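The four rules described in the text for the preset time period can be sketched as a small decision function. The `WindowStats` record, its field names, and the returned labels are assumptions introduced for illustration; combinations not covered by the described rules are reported as undetermined here rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    new_requests: int       # requests that arrived during the window
    unfinished_start: int   # unfinished requests at the start of the window
    unfinished_end: int     # unfinished requests at the end of the window
    completed_start: int    # completed requests at the start of the window
    completed_end: int      # completed requests at the end of the window


def judge_system_state(stats: WindowStats) -> str:
    """Apply the trend rules to one observation window."""
    has_new = stats.new_requests > 0
    unfinished_delta = stats.unfinished_end - stats.unfinished_start
    completed_delta = stats.completed_end - stats.completed_start

    if has_new and unfinished_delta > 0 and completed_delta == 0:
        return "blocked"               # requests pile up and none complete
    if has_new and stats.unfinished_end > 0 and completed_delta > 0:
        return "check_response_time"   # fall back to the r' versus R comparison
    if not has_new and unfinished_delta < 0 and completed_delta > 0:
        return "healthy"               # the backlog is draining normally
    if not has_new and stats.unfinished_end == 0:
        return "idle"
    return "undetermined"              # not covered by the described rules


blocked = judge_system_state(WindowStats(5, 2, 7, 10, 10))
draining = judge_system_state(WindowStats(0, 4, 1, 10, 13))
idle = judge_system_state(WindowStats(0, 0, 0, 10, 10))
```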
In one embodiment, after determining the request status of the service request in the incomplete status according to the comparison result, the method further includes: adding the service request in the normal state to a normal matching library; and adding the service request in the fault state to a fault matching library.
In one embodiment, extracting state data from the run file comprises: acquiring an analysis algorithm corresponding to the running file; analyzing the operation file according to an analysis algorithm to obtain an operation analysis file; and extracting state data corresponding to the preset state features from the operation analysis file.
Specifically, the operation files may come from multiple types of service systems, the service operation files corresponding to different service systems may be of multiple types, and different types of service operation files may correspond to different or identical formats and parsing algorithms. As shown in fig. 8, fig. 8 is a schematic diagram of a run file obtaining method provided in an embodiment. In fig. 8, in the collection process of the operation files, a plurality of operation file collectors collect the operation files corresponding to the distributed system, the proxy service system, and the application service system, and the operation files are then uniformly aggregated into the operation file aggregation library.
The sources of the operation files can comprise a distributed business system, a proxy service business system, a business application business system and the like. Specifically, the operation file collector is used for respectively collecting service operation files from different types of service systems, and then the collected service operation files are integrated to obtain an operation file aggregation library, wherein the operation file aggregation library has operation files corresponding to system operation.
The operating file comprises state data corresponding to the system operation and non-state data corresponding to the system operation, and operating files from different sources may correspond to different data types, so in specific implementation, the method further comprises the steps of obtaining the data types corresponding to the operating files from different sources, then obtaining an analysis algorithm matched with the data types, analyzing the operating files according to the matched analysis algorithm to obtain analysis files, and then extracting the state data corresponding to the preset state features from the analysis files.
Specifically, incremental analysis is performed on the running file aggregation library in the computer device (for example, data is counted once every preset time and data analysis is performed), if the analysis result includes information such as a request unique identifier, a request address, a request method, a request start time, a request completion flag, and a request completion time, the information is regarded as state data, and the extracted state data is stored in the state matching library. And if the analysis result is not the state data, filtering and not processing.
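The filtering step can be sketched as a simple key check: a parsed record is treated as state data only if it carries all of the listed request attributes. The key names below are hypothetical stand-ins for the request unique identifier, request address, request method, request start time, request completion flag, and request completion time.

```python
from typing import Dict, Iterable, List

# Assumed key names for the six attributes listed in the text.
REQUIRED_KEYS = {
    "request_id",
    "address",
    "method",
    "start_time",
    "completed",
    "completion_time",
}


def is_state_data(record: Dict[str, object]) -> bool:
    """A record counts as state data when every required key is present."""
    return REQUIRED_KEYS.issubset(record)


def filter_state_data(records: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep state data and drop everything else without further processing."""
    return [record for record in records if is_state_data(record)]


parsed = [
    {
        "request_id": "r1",
        "address": "/api/orders",
        "method": "GET",
        "start_time": 10.0,
        "completed": True,
        "completion_time": 10.4,
    },
    {"message": "cache warmed"},  # non-state data, filtered out
]
state_records = filter_state_data(parsed)
```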
In one embodiment, parsing the operation file according to a parsing algorithm to obtain an operation parsing file includes: extracting identifiers from the running file according to an analysis algorithm; dividing the running file into an identifier and a field according to the identifier; and obtaining the operation analysis file according to the identifier and the field.
Specifically, the running file may use identifiers to separate different fields, that is, the identifiers are used to divide the running file into fields to obtain the running parsing file. For example, the first line of the running file corresponding to the distributed running system may be: URL, space, status code, space, response time, space, browser version, and so on, where the space is the identifier and the other data are fields. Therefore, the computer device can split the running file using the space as the identifier according to the corresponding data format to obtain a plurality of fields, and then match the preset state data against each field, so that the corresponding state data is extracted from the fields.
Based on the above scheme, early discovery of hidden faults in the system is achieved. In the face of the diversity and complexity of fault characteristics, and based on application state feature monitoring, a log aggregation library, a fault matching library, and the like are collected and built to support the judgment of initial fault characteristics and the early discovery of hidden faults. A preliminary intelligent diagnosis function uses an intelligent fault analyzer to quickly match fault characteristics, intelligently diagnose and preliminarily analyze fault causes, and reduce the cost of communication and coordination in fault handling. A joint diagnosis function judges the health state of the system from the dimension of the requests being processed, which makes up for the shortcomings of existing monitoring software or tools in monitoring and judging the health state of the system. Through multi-dimensional performance index analysis, the capability of rapid troubleshooting and accurate problem location is improved, and the system monitoring efficiency is improved.
It should be understood that although the steps in the flow charts of fig. 2-8 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2-8 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
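Splitting a line on the identifier can be sketched as follows. The field order follows the example in the text (URL, status code, response time, browser version); the field names, the sample line, and the choice of which fields count as preset state data are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence

# Assumed field order, taken from the example line format in the text.
FIELD_NAMES = ["url", "status_code", "response_time", "browser_version"]


def parse_line(line: str, identifier: str = " ") -> Optional[Dict[str, str]]:
    """Split one line on the identifier and label the resulting fields."""
    parts = line.strip().split(identifier)
    if len(parts) != len(FIELD_NAMES):
        return None  # the line does not follow the expected format
    return dict(zip(FIELD_NAMES, parts))


def extract_state_data(
    record: Dict[str, str],
    wanted: Sequence[str] = ("url", "status_code", "response_time"),
) -> Dict[str, str]:
    """Keep only the fields that match the preset state features."""
    return {name: record[name] for name in wanted}


record = parse_line("/api/orders 200 0.35 Chrome/87.0")
state = extract_state_data(record) if record is not None else {}
```

Lines with an unexpected number of fields are rejected here instead of being partially parsed, which keeps malformed entries out of the extracted state data.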
In one embodiment, as shown in fig. 9, fig. 9 provides a system operation status monitoring apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes:
an obtaining module 902, configured to obtain an operation file corresponding to a system, and extract state data from the operation file;
a matching module 904, configured to match the state data with a preset state matching library to obtain a matching result;
a first determining module 906, configured to determine, when the matching result corresponds to a successful matching, an operating state of the system according to the state matching library for which the matching is successful;
and a second determining module 908, configured to analyze the service request data in the status data to determine an operating status of the system when the matching result corresponds to a matching failure.
In an embodiment, the obtaining module 902 is further configured to obtain an analysis algorithm corresponding to the running file; analyzing the operation file according to an analysis algorithm to obtain an operation analysis file; and extracting state data corresponding to the preset state features from the operation analysis file.
In one embodiment, the obtaining module 902 is further configured to extract the identifier from the running file according to a parsing algorithm; dividing the running file into an identifier and a field according to the identifier; and obtaining the operation analysis file according to the identifier and the field.
In one embodiment, the system operation state monitoring device further comprises a matching library generation module, configured to obtain a historical operation file corresponding to the system operating within a preset time period; determining initial response time according to actual response time corresponding to each service request in the historical operating file; when the actual response time is longer than the initial response time, judging that the request state of the service request is a fault state; otherwise, judging the request state of the service request as a normal state; and generating a normal matching library according to the service request in the normal state, and generating a fault matching library according to the service request in the fault state.
In one embodiment, the system running state monitoring device is further configured to obtain fault data corresponding to a preset time period in the fault matching library and normal data corresponding to a preset time period in the normal matching library; determining the fault misjudgment rate according to the relative value of the fault data and the normal data; and when the fault misjudgment rate is greater than the preset misjudgment threshold value, adjusting the initial response time so that the fault misjudgment rate obtained according to the adjusted initial response time is less than the preset misjudgment threshold value.
In one embodiment, the system running state monitoring device is further configured to extract actual response time corresponding to each service request from the historical running file; sequencing the service requests according to the actual response time corresponding to each service request; and taking the actual response time corresponding to the service request at the preset sequencing position as the initial response time.
In one embodiment, the system running state monitoring device is further configured to determine that the matching result corresponds to a successful matching when the state data is successfully matched with any one of the preset state matching libraries; and when the state data fails to match every preset state matching library, judging that the matching result corresponds to a matching failure.
In one embodiment, the system running state monitoring device is further configured to match the state data with a preset fault matching library; when the state data is successfully matched with the fault matching library, judging that the matching result is successfully matched correspondingly, and judging that the running state of the system is a fault state; otherwise, matching the state data with a preset normal matching library; when the state data is successfully matched with the normal matching library, judging that the matching result is successfully matched and the running state of the system is a healthy state; otherwise, judging that the matching result is corresponding to matching failure.
In one embodiment, the second determining module 908 is further configured to, when the matching result corresponds to a matching failure, extract corresponding service request data within a preset time period from the state matching library; acquiring a service request state corresponding to each service request data; classifying the service request data according to the service request state; acquiring the quantity variation trend of each category of service request data in a preset time period; and determining the running state of the system according to the quantity change trend of the service request data of each category.
In one embodiment, the second determining module 908 is further configured to extract the service request in an incomplete state from the state data when the matching result corresponds to a matching failure; determining the current response time corresponding to the service request in an uncompleted state; comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result; determining the request state of the service request in an unfinished state according to the comparison result; and determining the running state of the system according to the request state.
In an embodiment, the second determining module 908 is further configured to determine that the comparison result is a comparison failure and the request state of the service request in the uncompleted state is a fault state if the current response time is greater than the initial response time corresponding to the current time; otherwise, the comparison result is judged to be a comparison success, and the request state of the service request in the unfinished state is judged to be a normal state.
In one embodiment, the system running state monitoring device is further configured to add the service request in the normal state to the normal matching library; and adding the service request in the fault state to a fault matching library.
For specific limitations of the system operation state monitoring device, reference may be made to the above limitations of the system operation state monitoring method, which are not described herein again. The modules in the system operation state monitoring device can be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, and can also be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing system running state monitoring data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a system operation state monitoring method.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program implementing: acquiring an operation file corresponding to a system, and extracting state data from the operation file; matching the state data with a preset state matching library to obtain a matching result; when the matching result corresponds to the matching success, determining the running state of the system according to the state matching library which is matched successfully; and when the matching result corresponds to the matching failure, analyzing the service request data in the state data to determine the operating state of the system.
In one embodiment, the processor when executing the computer program is further operable when extracting the state data from the run file to: acquiring an analysis algorithm corresponding to the running file; analyzing the operation file according to an analysis algorithm to obtain an operation analysis file; and extracting state data corresponding to the preset state features from the operation analysis file.
In one embodiment, the processor, when executing the computer program, further performs the following steps when parsing the run file according to the parsing algorithm to obtain the run parse file: extracting identifiers from the running file according to an analysis algorithm; dividing the running file into an identifier and a field according to the identifier; and obtaining the operation analysis file according to the identifier and the field.
In one embodiment, the computer program when executed by the processor is further operable to: acquiring a corresponding historical operating file of the system operating in a preset time period; determining initial response time according to actual response time corresponding to each service request in the historical operating file; when the actual response time is longer than the initial response time, judging that the request state of the service request is a fault state; otherwise, judging the request state of the service request as a normal state; and generating a normal matching library according to the service request in the normal state, and generating a fault matching library according to the service request in the fault state.
In one embodiment, the processor, when executing the computer program, implements generating a normal matching library according to the normal state service request, and after generating a failure matching library according to the failure state service request, is further configured to: acquiring corresponding fault data in a fault matching library within a preset time period and corresponding normal data in a normal matching library within the preset time period; determining the fault misjudgment rate according to the relative value of the fault data and the normal data; and when the fault misjudgment rate is greater than the preset misjudgment threshold value, adjusting the initial response time so that the fault misjudgment rate obtained according to the adjusted initial response time is less than the preset misjudgment threshold value.
In one embodiment, the processor, when executing the computer program, is further configured to determine the initial response time according to the actual response time corresponding to each service request in the historical operating file: extracting actual response time corresponding to each service request from a historical operating file; sequencing the service requests according to the actual response time corresponding to each service request; and taking the actual response time corresponding to the service request at the preset sequencing position as the initial response time.
In one embodiment, the computer program when executed by the processor is further operable to: when the state data is successfully matched with any preset state matching library, judging that the matching result is successfully matched correspondingly; and when the state data is failed to be matched with any preset state matching library, judging that the matching result is corresponding to the matching failure.
In one embodiment, when executing the computer program to match the state data with a preset state matching library to obtain a matching result and, when the matching result corresponds to a matching success, to determine the running state of the system according to the state matching library which is successfully matched, the processor is further configured to: matching the state data with a preset fault matching library; when the state data is successfully matched with the fault matching library, judging that the matching result corresponds to a successful matching and that the running state of the system is a fault state; otherwise, matching the state data with a preset normal matching library; when the state data is successfully matched with the normal matching library, judging that the matching result corresponds to a successful matching and that the running state of the system is a healthy state; otherwise, judging that the matching result corresponds to a matching failure.
In one embodiment, the processor, when executing the computer program, implements that when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system is further configured to: when the matching result is matching failure, extracting corresponding service request data in a preset time period from the state matching library; acquiring a service request state corresponding to each service request data; classifying the service request data according to the service request state; acquiring the quantity variation trend of each category of service request data in a preset time period; and determining the running state of the system according to the quantity change trend of the service request data of each category.
In one embodiment, the processor, when executing the computer program, is further configured to, when the matching result corresponds to a matching failure, analyze the service request data in the status data to determine an operating status of the system: when the matching result corresponds to a matching failure, extracting the service request in an unfinished state from the state data; determining the current response time corresponding to the service request in an uncompleted state; comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result; determining the request state of the service request in an unfinished state according to the comparison result; and determining the running state of the system according to the request state.
In one embodiment, when the processor executes the computer program, the current response time is compared with the initial response time corresponding to the current time to obtain a comparison result; and when determining the request state of the service request in the unfinished state according to the comparison result, the method is further used for: if the current response time is longer than the initial response time corresponding to the current time, judging that the comparison result is a comparison failure and the request state of the service request in the unfinished state is a normal state; otherwise, the comparison result is judged to be successful, and the request state of the service request in the unfinished state is judged to be a fault state.
In one embodiment, the processor, when executing the computer program, after determining the request status of the service request in the incomplete status according to the comparison result, is further configured to: adding the service request in the normal state to a normal matching library; and adding the service request in the fault state to a fault matching library.
In one embodiment, a computer readable storage medium is provided, storing a computer program that when executed by a processor implements: acquiring an operation file corresponding to a system, and extracting state data from the operation file; matching the state data with a preset state matching library to obtain a matching result; when the matching result corresponds to the matching success, determining the running state of the system according to the state matching library which is matched successfully; and when the matching result corresponds to the matching failure, analyzing the service request data in the state data to determine the operating state of the system.
In one embodiment, the computer program, when executed by the processor, further implements: acquiring a parsing algorithm corresponding to the operation file; parsing the operation file according to the parsing algorithm to obtain a parsed operation file; and extracting state data corresponding to preset state features from the parsed operation file.
In one embodiment, when parsing the operation file according to the parsing algorithm to obtain the parsed operation file, the computer program, when executed by the processor, further implements: extracting identifiers from the operation file according to the parsing algorithm; dividing the operation file into identifiers and fields according to the identifiers; and obtaining the parsed operation file from the identifiers and the fields.
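The parsing and extraction steps can be sketched as follows. This is a minimal illustration only: the text does not fix a file layout, so the "identifier=field" log format, the regular expression, and the names `parse_operation_file` and `extract_state_data` are all assumptions introduced here.

```python
import re

# Assumed layout (not specified in the text): each line of the operation
# file is a run of "identifier=field" pairs separated by whitespace.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_operation_file(lines):
    """Split every line into an {identifier: field} record."""
    parsed = []
    for line in lines:
        record = dict(PAIR.findall(line))
        if record:  # skip lines that carry no identifier at all
            parsed.append(record)
    return parsed


def extract_state_data(parsed, state_features):
    """Keep only the fields named by the preset state features."""
    return [{key: rec[key] for key in state_features if key in rec}
            for rec in parsed]


sample = [
    "request_id=42 status=done response_ms=180",
    "request_id=43 status=pending response_ms=950",
]
parsed = parse_operation_file(sample)
state_data = extract_state_data(parsed, ["status", "response_ms"])
```

Keeping the parser separate from the feature filter mirrors the two embodiments above: a different file type would swap in a different parsing algorithm while the extraction of preset state features stays unchanged.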
In one embodiment, the computer program, when executed by the processor, further implements: acquiring a historical operating file generated while the system ran during a preset time period; determining an initial response time according to the actual response time corresponding to each service request in the historical operating file; when the actual response time is longer than the initial response time, determining that the request state of the service request is a fault state; otherwise, determining that the request state of the service request is a normal state; and generating a normal matching library from the service requests in the normal state and a fault matching library from the service requests in the fault state.
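A minimal sketch of the library generation step is given below. The text does not say what a library entry contains, so representing each historical request as a `(signature, actual_ms)` pair and each library as a set of signatures is an assumption, as are the function name and the sample data.

```python
def build_matching_libraries(history, initial_response_ms):
    """Classify historical requests against the initial response time.

    `history` is an iterable of (signature, actual_ms) pairs; both the
    pair layout and the idea of a string "signature" are hypothetical.
    """
    normal_library, fault_library = set(), set()
    for signature, actual_ms in history:
        if actual_ms > initial_response_ms:
            # Longer than the initial response time: fault state.
            fault_library.add(signature)
        else:
            # Otherwise: normal state.
            normal_library.add(signature)
    return normal_library, fault_library


history = [
    ("GET /quote 200", 120),
    ("GET /quote 200", 140),
    ("POST /claim 504", 3100),
    ("POST /claim 200", 310),
]
normal, fault = build_matching_libraries(history, initial_response_ms=300)
```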
In one embodiment, after generating the normal matching library from the service requests in the normal state and the fault matching library from the service requests in the fault state, the computer program, when executed by the processor, further implements: acquiring the fault data in the fault matching library that corresponds to a preset time period and the normal data in the normal matching library that corresponds to the same preset time period; determining a fault misjudgment rate according to the relative value of the fault data and the normal data; and when the fault misjudgment rate is greater than a preset misjudgment threshold, adjusting the initial response time so that the fault misjudgment rate obtained with the adjusted initial response time is less than the preset misjudgment threshold.
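The adjustment loop might look like the sketch below. The text only speaks of a "relative value" of fault data and normal data, so taking the rate as the share of requests classified as faults is an assumption, as are the fixed step size and the function names.

```python
def fault_rate(response_times, initial_response_ms):
    """Assumed definition: fraction of requests classified as faults."""
    faults = sum(1 for t in response_times if t > initial_response_ms)
    return faults / len(response_times)


def adjust_initial_response_time(response_times, initial_response_ms,
                                 max_rate, step_ms=50):
    """Raise the initial response time until the rate drops below max_rate."""
    if not response_times or max_rate <= 0:
        raise ValueError("need at least one sample and a positive max_rate")
    # The rate reaches 0 once the threshold passes the slowest request,
    # so the loop terminates for any positive max_rate.
    while fault_rate(response_times, initial_response_ms) >= max_rate:
        initial_response_ms += step_ms
    return initial_response_ms


times = [100, 120, 150, 400, 900]
adjusted = adjust_initial_response_time(times, 130, max_rate=0.3)
```

With these sample values the starting threshold of 130 ms classifies three of five requests as faults, and the threshold is raised in 50 ms steps until only the 900 ms request remains above it.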
In one embodiment, when determining the initial response time according to the actual response time corresponding to each service request in the historical operating file, the computer program, when executed by the processor, further implements: extracting the actual response time corresponding to each service request from the historical operating file; sorting the service requests by their actual response times; and taking the actual response time of the service request at a preset sorting position as the initial response time.
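Choosing the initial response time from a sorted list can be sketched as below. Expressing the preset sorting position as an integer percentile is an assumption made for illustration; the text only requires some preset position in the sorted order.

```python
def initial_response_time(response_times, percent=95):
    """Return the response time at the given percentile position.

    `percent` (0-100) stands in for the preset sorting position; the
    percentile form is an assumption, not taken from the text.
    """
    if not response_times:
        raise ValueError("no response times to rank")
    ordered = sorted(response_times)
    # Integer arithmetic avoids floating-point surprises in the index.
    index = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[index]


times = [120, 90, 300, 150, 110, 95, 130, 2000, 140, 100]
threshold = initial_response_time(times, percent=80)
```

Taking a rank below the maximum keeps a single extreme outlier (the 2000 ms request in the sample) from becoming the threshold.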
In one embodiment, the computer program, when executed by the processor, further implements: when the state data matches any preset state matching library, determining that the matching result is a matching success; and when the state data matches none of the preset state matching libraries, determining that the matching result is a matching failure.
In one embodiment, when matching the state data with a preset state matching library to obtain a matching result and, when the matching result is a matching success, determining the running state of the system according to the successfully matched state matching library, the computer program, when executed by the processor, further implements: matching the state data with a preset fault matching library; when the state data matches the fault matching library, determining that the matching result is a matching success and that the running state of the system is a fault state; otherwise, matching the state data with a preset normal matching library; when the state data matches the normal matching library, determining that the matching result is a matching success and that the running state of the system is a healthy state; otherwise, determining that the matching result is a matching failure.
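The fault-first matching order can be sketched as follows. Treating "matching" as set membership is an assumption; the text leaves the matching criterion open, and the names `SystemState` and `match_state` are introduced here for illustration.

```python
from enum import Enum


class SystemState(Enum):
    FAULT = "fault"
    HEALTHY = "healthy"


def match_state(state_data, fault_library, normal_library):
    """Return (matched, state); state is None on a matching failure."""
    # The fault library is consulted first, so an entry present in both
    # libraries is reported as a fault.
    if state_data in fault_library:
        return True, SystemState.FAULT
    if state_data in normal_library:
        return True, SystemState.HEALTHY
    return False, None


fault_library = {"POST /claim 504"}
normal_library = {"GET /quote 200"}
result = match_state("GET /quote 200", fault_library, normal_library)
```

A `(False, None)` result corresponds to the matching failure branch, where the service request data is analyzed instead.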
In one embodiment, when analyzing the service request data in the state data to determine the running state of the system in the case of a matching failure, the computer program, when executed by the processor, further implements: when the matching result is a matching failure, extracting the service request data corresponding to a preset time period from the state matching library; acquiring the service request state corresponding to each item of service request data; classifying the service request data according to the service request state; acquiring the trend in the quantity of service request data of each category over the preset time period; and determining the running state of the system according to the quantity trend of each category of service request data.
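One way to picture the trend analysis is the sketch below. The text does not define how a trend is measured or how it maps to a running state, so comparing counts in the two halves of the window, and reporting a fault when the fault count is rising, are both assumptions made purely for illustration.

```python
from collections import Counter


def category_trends(requests, window_start, window_end):
    """Per-state change in request count, second half minus first half.

    `requests` is an iterable of (timestamp, state) pairs; the halving
    of the window is a hypothetical trend measure.
    """
    midpoint = (window_start + window_end) / 2
    earlier, later = Counter(), Counter()
    for timestamp, state in requests:
        if window_start <= timestamp < midpoint:
            earlier[state] += 1
        elif midpoint <= timestamp <= window_end:
            later[state] += 1
    return {s: later[s] - earlier[s] for s in set(earlier) | set(later)}


def state_from_trends(trends):
    """Hypothetical rule: a rising fault count means a fault state."""
    return "fault" if trends.get("fault", 0) > 0 else "healthy"


requests = [(0, "normal"), (1, "normal"), (2, "fault"),
            (6, "fault"), (7, "fault"), (8, "normal")]
trends = category_trends(requests, window_start=0, window_end=10)
running_state = state_from_trends(trends)
```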
In one embodiment, when analyzing the service request data in the state data to determine the running state of the system in the case of a matching failure, the computer program, when executed by the processor, further implements: when the matching result is a matching failure, extracting the service requests in an unfinished state from the state data; determining the current response time of each unfinished service request; comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result; determining the request state of the unfinished service request according to the comparison result; and determining the running state of the system according to the request state.
In one embodiment, when comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result and determining the request state of the unfinished service request according to the comparison result, the computer program, when executed by the processor, further implements: if the current response time is longer than the initial response time corresponding to the current time, determining that the comparison result is a comparison failure and that the request state of the unfinished service request is a normal state; otherwise, determining that the comparison result is a comparison success and that the request state of the unfinished service request is a fault state.
In one embodiment, after determining the request state of the unfinished service request according to the comparison result, the computer program, when executed by the processor, further implements: adding the service requests in the normal state to the normal matching library; and adding the service requests in the fault state to the fault matching library.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of the present specification.
The above examples express only several embodiments of the present application. Their description is relatively specific and detailed, but is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (15)

1. A method for monitoring system operating conditions, the method comprising:
acquiring an operation file corresponding to a system, and extracting state data from the operation file;
matching the state data with a preset state matching library to obtain a matching result;
when the matching result corresponds to successful matching, determining the running state of the system according to a state matching library which is successfully matched;
and when the matching result corresponds to a matching failure, analyzing the service request data in the state data to determine the operating state of the system.
2. The method of claim 1, wherein extracting state data from the operation file comprises:
acquiring a parsing algorithm corresponding to the operation file;
parsing the operation file according to the parsing algorithm to obtain a parsed operation file;
and extracting state data corresponding to preset state features from the parsed operation file.
3. The method of claim 2, wherein parsing the operation file according to the parsing algorithm to obtain a parsed operation file comprises:
extracting identifiers from the operation file according to the parsing algorithm;
dividing the operation file into identifiers and fields according to the identifiers;
and obtaining the parsed operation file according to the identifiers and the fields.
4. The method of claim 1, wherein the state matching library comprises a fault matching library and a normal matching library; the generating step of the state matching library comprises:
acquiring a corresponding historical operating file of the system operating in a preset time period;
determining initial response time according to actual response time corresponding to each service request in the historical operating file;
when the actual response time is longer than the initial response time, determining that the request state of the service request is a fault state;
otherwise, judging the request state of the service request to be a normal state;
and generating a normal matching library according to the service request in the normal state, and generating a fault matching library according to the service request in the fault state.
5. The method of claim 4, wherein after the generating the normal matching library according to the normal service request and the generating the fault matching library according to the fault service request, the method further comprises:
acquiring corresponding fault data in a preset time period in the fault matching library and corresponding normal data in a preset time period in the normal matching library;
determining a fault misjudgment rate according to the relative value of the fault data and the normal data;
and when the fault misjudgment rate is greater than a preset misjudgment threshold value, adjusting the initial response time so that the fault misjudgment rate obtained according to the adjusted initial response time is smaller than the preset misjudgment threshold value.
6. The method of claim 4, wherein determining an initial response time according to an actual response time corresponding to each of the service requests in the historical operating file comprises:
extracting actual response time corresponding to each service request from the historical operating file;
sequencing the service requests according to the actual response time corresponding to each service request;
and taking the actual response time corresponding to the service request at the preset sequencing position as the initial response time.
7. The method of claim 1, wherein the step of determining the matching result comprises:
when the state data matches any preset state matching library, determining that the matching result corresponds to successful matching;
and when the state data matches none of the preset state matching libraries, determining that the matching result corresponds to a matching failure.
8. The method according to claim 1, wherein matching the state data with a preset state matching library to obtain a matching result, and, when the matching result corresponds to successful matching, determining the running state of the system according to the state matching library which is successfully matched, comprises:
matching the state data with a preset fault matching library;
when the state data matches the fault matching library, determining that the matching result corresponds to successful matching and that the running state of the system is a fault state; otherwise, matching the state data with a preset normal matching library;
when the state data matches the normal matching library, determining that the matching result corresponds to successful matching and that the running state of the system is a healthy state; otherwise, determining that the matching result corresponds to a matching failure.
9. The method according to claim 1, wherein when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system includes:
when the matching result is matching failure, extracting corresponding service request data in a preset time period from the state matching library;
acquiring a service request state corresponding to each service request data;
classifying the service request data according to the service request state;
acquiring the quantity variation trend of the service request data of each category in the preset time period;
and determining the operation state of the system according to the quantity change trend of the service request data of each category.
10. The method according to claim 1, wherein when the matching result corresponds to a matching failure, analyzing the service request data in the status data to determine the operating status of the system includes:
when the matching result corresponds to a matching failure, extracting a service request in an unfinished state from the state data;
determining the current response time corresponding to the service request in an uncompleted state;
comparing the current response time with the initial response time corresponding to the current time to obtain a comparison result;
determining the request state of the service request in an uncompleted state according to the comparison result;
and determining the running state of the system according to the request state.
11. The method according to claim 10, wherein the current response time is compared with an initial response time corresponding to the current time to obtain a comparison result; determining the request state of the service request in an uncompleted state according to the comparison result, wherein the request state comprises the following steps:
if the current response time is longer than the initial response time corresponding to the current time, judging that the comparison result is a comparison failure and the request state of the service request in an uncompleted state is a normal state;
otherwise, judging that the comparison result is successful, and the request state of the service request in an uncompleted state is a fault state.
12. The method according to claim 10, wherein after determining the request status of the service request in an uncompleted state according to the comparison result, the method further comprises:
adding the service request in a normal state to a normal matching library;
and adding the service request in a fault state to a fault matching library.
13. A system operational status monitoring apparatus, the apparatus comprising:
the acquisition module is used for acquiring an operation file corresponding to a system and extracting state data from the operation file;
the matching module is used for matching the state data with a preset state matching library to obtain a matching result;
the first determining module is used for determining the running state of the system according to the state matching library which is successfully matched when the matching result corresponds to successful matching;
and the second determining module is used for analyzing the service request data in the state data to determine the running state of the system when the matching result corresponds to matching failure.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202011602580.3A 2020-12-29 2020-12-29 System running state monitoring method and device, computer equipment and storage medium Pending CN112612679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011602580.3A CN112612679A (en) 2020-12-29 2020-12-29 System running state monitoring method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112612679A true CN112612679A (en) 2021-04-06

Family

ID=75249246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011602580.3A Pending CN112612679A (en) 2020-12-29 2020-12-29 System running state monitoring method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112612679A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756304A (en) * 2022-04-28 2022-07-15 中国银行股份有限公司 Configuration method of application version package for dynatrace monitoring
CN116340433A (en) * 2023-05-31 2023-06-27 中国水利水电第七工程局有限公司 Construction monitoring information storage calculation method, storage medium, equipment and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105071946A (en) * 2015-07-03 2015-11-18 北京奇虎科技有限公司 System monitoring method and device
CN110377445A (en) * 2019-06-28 2019-10-25 苏州浪潮智能科技有限公司 Failure prediction method and device
CN110750425A (en) * 2019-10-25 2020-02-04 上海中通吉网络技术有限公司 Database monitoring method, device and system and storage medium
CN111444072A (en) * 2020-03-26 2020-07-24 世纪龙信息网络有限责任公司 Client abnormality identification method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756304A (en) * 2022-04-28 2022-07-15 中国银行股份有限公司 Configuration method of application version package for dynatrace monitoring
CN114756304B (en) * 2022-04-28 2024-06-21 中国银行股份有限公司 Configuration method of application version package for dynatrace monitoring
CN116340433A (en) * 2023-05-31 2023-06-27 中国水利水电第七工程局有限公司 Construction monitoring information storage calculation method, storage medium, equipment and system
CN116340433B (en) * 2023-05-31 2023-07-28 中国水利水电第七工程局有限公司 Construction monitoring information storage calculation method, storage medium, equipment and system

Similar Documents

Publication Publication Date Title
US10540358B2 (en) Telemetry data contextualized across datasets
US11226975B2 (en) Method and system for implementing machine learning classifications
US8676965B2 (en) Tracking high-level network transactions
CN109587008B (en) Method, device and storage medium for detecting abnormal flow data
US20180365085A1 (en) Method and apparatus for monitoring client applications
US8352790B2 (en) Abnormality detection method, device and program
WO2017101606A1 (en) System and method for collecting and analyzing data
CN112631913B (en) Method, device, equipment and storage medium for monitoring operation faults of application program
CN109918279B (en) Electronic device, method for identifying abnormal operation of user based on log data and storage medium
US20070130330A1 (en) System for inventing computer systems and alerting users of faults to systems for monitoring
CN111143163B (en) Data monitoring method, device, computer equipment and storage medium
CN110362473B (en) Test environment optimization method and device, storage medium and terminal
CN111240876B (en) Fault positioning method and device for micro-service, storage medium and terminal
CN109726091B (en) Log management method and related device
CN113672475B (en) Alarm processing method and device, computer equipment and storage medium
CN113254255B (en) Cloud platform log analysis method, system, device and medium
CN111078513A (en) Log processing method, device, equipment, storage medium and log alarm system
CN113505044B (en) Database warning method, device, equipment and storage medium
CN112612679A (en) System running state monitoring method and device, computer equipment and storage medium
CN111444072A (en) Client abnormality identification method and device, computer equipment and storage medium
CN106294406B (en) Method and equipment for processing application access data
CN113326064A (en) Method for dividing business logic module, electronic equipment and storage medium
CN108111328B (en) Exception handling method and device
CN116662127A (en) Method, system, equipment and medium for classifying and early warning equipment alarm information
CN115529219A (en) Alarm analysis method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination