CN111353890A - Application log-based application anomaly detection method and device - Google Patents

Application log-based application anomaly detection method and device


Publication number
CN111353890A
Authority
CN
China
Prior art keywords
time period
data
anomaly detection
application
characteristic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010236180.9A
Other languages
Chinese (zh)
Inventor
程鹏
任政
武文轩
吴冕冠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202010236180.9A
Publication of CN111353890A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Medical Informatics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides an application log-based application anomaly detection method and device. The method comprises the following steps: acquiring transaction characteristic data to be subjected to anomaly detection, wherein the transaction characteristic data comprises: the response time of the application in the current time period, the response time in the same time period yesterday, the response time in the same time period last week, the success rate in the current time period, the success rate in the same time period yesterday, the success rate in the same time period last week, the service return codes in the current time period, the service return codes in the same time period yesterday, the service return codes in the same time period last week, same-period comparison data and period-over-period comparison data; and performing anomaly detection on the application according to the transaction characteristic data and a pre-generated vertically integrated algorithm model. The invention can effectively avoid the overfitting of a supervised learning algorithm caused by extreme imbalance between positive and negative samples, and can greatly improve detection precision.

Description

Application log-based application anomaly detection method and device
Technical Field
The invention relates to the field of information technology, and in particular to an application log-based application anomaly detection method and device.
Background
In the prior art, as more and more applications move to the cloud, the types of application alarms multiply, which brings new problems. For example, a traditional fixed-threshold alarm uses a fixed threshold as the basis for alarming, that is, an alarm is raised whenever an index falls outside the threshold range. In daily financial transactions, however, it is normal for an index to fall outside the threshold range during certain time periods, so a fixed-threshold alarm produces false alarms. Conversely, when an index stays within the threshold range during a certain time period but the situation is in fact abnormal, a fixed threshold raises no alarm and the anomaly is missed. An anomaly detection algorithm based on time characteristics can better compensate for the false alarms and missed alarms of fixed thresholds and realize dynamic-threshold alarming. Anomaly detection algorithms include unsupervised learning algorithms, which do not require samples to be labeled manually, and supervised learning algorithms, which do. A supervised learning algorithm requires the ratio of positive samples to negative samples to be close to 1:1, whereas in anomaly detection based on application logs the positive and negative samples are often extremely imbalanced, so applying a supervised learning algorithm directly often gives poor results.
In multi-dimensional anomaly detection, the root cause attribute that caused the anomaly cannot be read directly from the detection result. For example, suppose the anomaly detection attributes of a certain transaction include the response time, the transaction success rate and the service return codes. A dynamic-threshold alarm realized by an algorithm can accurately identify that the transaction is abnormal, but the detection result does not indicate which attribute is the root cause of the anomaly.
Disclosure of Invention
In view of the problems in the prior art, the application log-based application anomaly detection method and device provided by the invention use the output of an unsupervised learning algorithm (the isolation forest algorithm) as the input of a supervised learning algorithm (the logistic regression algorithm). The parameters of the isolation forest algorithm are adjusted so that the ratio of positive to negative samples in its output is close to 1:1, and this output is then input into the logistic regression algorithm, thereby improving the anomaly detection precision of the supervised learning algorithm.
In order to solve the above technical problems, the invention provides the following technical solutions:
In a first aspect, the present invention provides an application log-based application anomaly detection method, including:
acquiring transaction characteristic data to be subjected to anomaly detection, wherein the transaction characteristic data comprises: the response time of the application in the current time period, the response time in the same time period yesterday, the response time in the same time period last week, the success rate in the current time period, the success rate in the same time period yesterday, the success rate in the same time period last week, the service return codes in the current time period, the service return codes in the same time period yesterday, the service return codes in the same time period last week, same-period comparison data and period-over-period comparison data;
and performing anomaly detection on the application according to the transaction characteristic data and a pre-generated vertically integrated algorithm model.
In one embodiment, the step of generating the vertically integrated algorithm model comprises:
acquiring the transaction characteristic data within a preset time;
labeling the transaction characteristic data;
inputting the labeled transaction characteristic data into an isolation forest algorithm model to generate training data;
inputting the training data into a logistic regression algorithm model for training, so as to generate the vertically integrated algorithm model.
In an embodiment, inputting the labeled transaction characteristic data into the isolation forest algorithm model to generate training data includes:
adjusting the contamination parameter of the isolation forest algorithm model so that the ratio of normal samples to abnormal samples is 1:1.
In one embodiment, the application log-based application anomaly detection method further includes: preprocessing the transaction characteristic data.
In a second aspect, the present invention provides an application log-based application anomaly detection apparatus, including:
a transaction characteristic data extraction unit, configured to acquire transaction characteristic data to be subjected to anomaly detection, the transaction characteristic data comprising: the response time of the application in the current time period, the response time in the same time period yesterday, the response time in the same time period last week, the success rate in the current time period, the success rate in the same time period yesterday, the success rate in the same time period last week, the service return codes in the current time period, the service return codes in the same time period yesterday, the service return codes in the same time period last week, same-period comparison data and period-over-period comparison data;
and an anomaly detection unit, configured to perform anomaly detection on the application according to the transaction characteristic data and the pre-generated vertically integrated algorithm model.
In one embodiment, the application log-based application anomaly detection apparatus further includes a vertical model generation unit, configured to generate the vertically integrated algorithm model; the vertical model generation unit includes:
a characteristic data acquisition module, configured to acquire the transaction characteristic data within a preset time;
a characteristic data labeling module, configured to label the transaction characteristic data;
a training data generation module, configured to input the labeled transaction characteristic data into the isolation forest algorithm model to generate training data;
and a vertical model generation module, configured to input the training data into a logistic regression algorithm model for training, so as to generate the vertically integrated algorithm model.
In an embodiment, the anomaly detection unit is specifically configured to adjust the contamination parameter of the isolation forest algorithm model so that the ratio of normal samples to abnormal samples is 1:1.
In one embodiment, the application log-based application anomaly detection apparatus further includes a characteristic data preprocessing unit, configured to preprocess the transaction characteristic data.
In a third aspect, the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the application log-based application anomaly detection method when executing the program.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of an application log-based application anomaly detection method.
As can be seen from the above description, the application log-based application anomaly detection method and apparatus provided by the embodiments of the present invention realize multi-dimensional anomaly detection of an application from application log data, using a machine learning algorithm in which an isolation forest and logistic regression are vertically integrated. The method effectively avoids the overfitting of a supervised learning algorithm caused by imbalance between positive and negative samples and greatly improves detection precision. In addition, on the basis of the logistic regression algorithm, the invention provides a feature-importance-based root cause analysis method for multi-dimensional index anomalies. The method takes the feature importance in the parameters of the logistic regression classification result as a weight, calculates the weighted value of each normalized attribute, and uses the sorted weighted values as the basis for comparing root causes, thereby realizing simple multi-dimensional root cause analysis. The invention has three main beneficial effects:
First, through the vertical integration of the isolation forest algorithm and the logistic regression algorithm, the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, which effectively avoids the overfitting caused by using a supervised learning algorithm and greatly improves detection precision compared with using either algorithm alone (the logistic regression algorithm or the isolation forest algorithm).
Second, dynamic-threshold detection and multi-dimensional application anomaly detection are realized by extracting the same-period and period-over-period ratio features and using the logistic regression algorithm.
Third, simple multi-dimensional attribute root cause analysis is realized by using the feature importance of the logistic regression algorithm as the attribute weight.
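The root cause ranking described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the attribute names, coefficient values and normalized attribute values are hypothetical, and the weighted value is assumed to be the product of the absolute coefficient (the feature importance) and the normalized attribute value.

```python
def rank_root_causes(coefficients, normalized_attributes):
    """Rank attributes by |coefficient| * normalized attribute value, highest first."""
    weighted = {
        name: abs(coefficients[name]) * normalized_attributes[name]
        for name in coefficients
    }
    return sorted(weighted.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients of a trained logistic regression model.
coefficients = {
    "response_time_change": 2.0,
    "success_rate_change": -3.0,
    "return_code_changed": 1.0,
}
# Hypothetical normalized attribute values (0..1) for one abnormal time period.
sample = {
    "response_time_change": 0.9,
    "success_rate_change": 0.2,
    "return_code_changed": 1.0,
}

ranking = rank_root_causes(coefficients, sample)
for name, weighted_value in ranking:
    print(f"{name}: {weighted_value:.2f}")
```

In this toy case the attribute with the largest weighted value is listed first and would be taken as the most likely root cause.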
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a first flowchart illustrating an application log-based application anomaly detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating step 300 of an application log-based application anomaly detection method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating step 303 of an application log-based application anomaly detection method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a second method for detecting an application anomaly based on an application log according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for detecting application anomalies based on application logs in an exemplary application of the present invention;
FIG. 6 is a diagram illustrating the application log-based application anomaly detection method according to an embodiment of the present invention;
FIG. 7 is a first schematic diagram illustrating an application log-based application anomaly detection apparatus according to an embodiment of the present invention;
FIG. 8 is a second schematic structural diagram of an application log-based application anomaly detection apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a vertical model generation unit in an embodiment of the present invention;
FIG. 10 is a third schematic diagram of an application log-based application anomaly detection apparatus according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a specific implementation of an application log-based application anomaly detection method. Referring to FIG. 1, the method specifically includes the following steps:
Step 100: acquiring transaction characteristic data to be subjected to anomaly detection, wherein the transaction characteristic data comprises: the response time of the application in the current time period, the response time in the same time period yesterday, the response time in the same time period last week, the success rate in the current time period, the success rate in the same time period yesterday, the success rate in the same time period last week, the service return codes in the current time period, the service return codes in the same time period yesterday, the service return codes in the same time period last week, same-period comparison data and period-over-period comparison data.
It is understood that anomaly detection uses data mining means to identify abnormal points in data, that is, data far from the other, normal observations; for example, application anomaly detection needs to detect anomalies of an application. In addition, the transaction characteristic data is obtained from the transaction log of quick payment.
Step 200: performing anomaly detection on the application according to the transaction characteristic data and a pre-generated vertically integrated algorithm model.
Step 200 is specifically implemented as follows: the output of an unsupervised learning algorithm (the isolation forest algorithm) is used as the input of a supervised learning algorithm (the logistic regression algorithm). The parameters of the isolation forest algorithm are adjusted so that the ratio of positive to negative samples in its output is close to 1:1, and this output is then input into the logistic regression algorithm, thereby improving the precision of application anomaly detection based on application logs.
As can be seen from the above description, the application log-based application anomaly detection method provided by the embodiment of the invention realizes multi-dimensional anomaly detection of an application from application log data, using a machine learning algorithm in which an isolation forest and logistic regression are vertically integrated. The method effectively avoids the overfitting of a supervised learning algorithm caused by imbalance between positive and negative samples and greatly improves detection precision. In addition, on the basis of the logistic regression algorithm, the invention provides a feature-importance-based root cause analysis method for multi-dimensional index anomalies. The method takes the feature importance in the parameters of the logistic regression classification result as a weight, calculates the weighted value of each normalized attribute, and uses the sorted weighted values as the basis for comparing root causes, thereby realizing simple multi-dimensional root cause analysis.
In one embodiment, referring to FIG. 2, the step of generating the vertically integrated algorithm model comprises:
Step 301: acquiring the transaction characteristic data within a preset time.
Step 302: labeling the transaction characteristic data.
Step 303: inputting the labeled transaction characteristic data into the isolation forest algorithm model to generate training data.
Step 304: inputting the training data into a logistic regression algorithm model for training, so as to generate the vertically integrated algorithm model.
In steps 301 to 304, the response time, the success rate and the service return codes of the current time period, of the same time period yesterday and of the same time period last week, together with their same-period and period-over-period ratios, are extracted as features. Transaction logs of a specific time span are selected as training data, and positive and negative samples can be labeled manually by business personnel (that is, which transactions are abnormal and which are normal). The response time, the success rate and the service return codes whose counts rank within a preset range are then used as the attribute indices for anomaly detection. The extracted features are input into the isolation forest algorithm, and the contamination parameter of the isolation forest algorithm is adjusted so that its detection result contains all labeled abnormal samples and the ratio of normal to abnormal samples is close to 1:1. The detection result of the isolation forest algorithm is then input into the logistic regression algorithm to train it. Because the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, the model largely avoids overfitting and achieves high detection precision. For comparison, when the model precision is verified on data from another time period, the F1-score of the model reaches 96.6%, a clear improvement over the 86.7% F1-score of the isolation forest algorithm alone.
In one embodiment, referring to FIG. 3, step 303 further comprises:
Step 3031: adjusting the contamination parameter of the isolation forest algorithm model so that the ratio of normal samples to abnormal samples is 1:1.
It is understood that, in step 3031, the detection result of the isolation forest algorithm should also contain all labeled abnormal samples, and the ratio of normal samples to abnormal samples only needs to be close to 1:1 (not necessarily exactly 1:1).
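The parameter adjustment in step 3031 can be illustrated with a small sketch. It assumes that an anomaly score per sample is already available (higher meaning more anomalous) and searches for the smallest top-scoring fraction that covers every labeled abnormal sample; the scores, the labels and the function name `smallest_covering_fraction` are hypothetical, and an actual implementation may tune the parameter differently.

```python
def smallest_covering_fraction(scores, labels):
    """Return (fraction, flagged_indices) for the smallest top-scoring set
    that contains every sample labeled abnormal (label 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    remaining = sum(labels)  # labeled abnormal samples not yet covered
    flagged = []
    for idx in order:
        if remaining == 0:
            break
        flagged.append(idx)
        remaining -= labels[idx]
    return len(flagged) / len(scores), flagged


# Hypothetical anomaly scores and manual labels (1 = abnormal, 0 = normal).
scores = [0.81, 0.40, 0.75, 0.42, 0.60, 0.38, 0.70, 0.66, 0.41, 0.64]
labels = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

fraction, flagged = smallest_covering_fraction(scores, labels)
abnormal = sum(labels[i] for i in flagged)
normal = len(flagged) - abnormal
print(f"flag top {fraction:.0%}: {normal} normal vs {abnormal} abnormal samples")
```

Here the flagged set contains all three labeled abnormal samples and three normal ones, so the data passed on for supervised training is balanced even though the full data set is not.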
In an embodiment, referring to FIG. 4, the application log-based application anomaly detection method further includes:
Step 400: preprocessing the transaction characteristic data.
Specifically, in the preprocessing stage, records in which the response time, the success rate, or the service return codes whose counts rank within a preset range are empty are removed, together with other invalid data. The service return codes are then digitized: because the service return codes are sensitive to period-over-period change, the period-over-period value is set to 1 when the top-five ranked service return codes have changed and to 0 when they have not. The response time and the success rate are each normalized, and their same-period and period-over-period ratios are calculated. This finally yields several features, for example: the response time same-period ratio, the response time period-over-period ratio, the success rate same-period ratio, the success rate period-over-period ratio, and the period-over-period value of the top-five ranked service return codes.
To further illustrate the solution, the present invention provides a specific application example of the application log-based application anomaly detection method, which includes the following contents; see FIG. 5 and FIG. 6.
In this specific application example, the technical terms are explained as follows:
Unsupervised learning algorithm: learns from unlabeled training samples in order to discover structural knowledge in them.
Supervised learning algorithm: learns from labeled training samples so as to label data outside the training set as accurately as possible.
Root cause analysis: analyzes the associated data to obtain the possible root cause of a problem.
Feature importance: generally taken to be the absolute value of a feature's coefficient, i.e., the weight of the independent variable.
Isolation forest algorithm: an ensemble-based, unsupervised, fast anomaly detection method suitable for continuous data, proposed by Professor Zhou Zhihua et al. Unlike other anomaly detection methods, which describe the degree of separation between samples through quantitative indices such as distance and density, the isolation forest algorithm detects outliers by isolating sample points. It has linear time complexity and high precision, and is a state-of-the-art algorithm that meets the requirements of big-data processing.
Logistic regression algorithm: a classical supervised learning algorithm. Logistic regression assumes that the data follow a Bernoulli distribution, uses the maximum likelihood method with gradient descent to solve for the parameters and obtain classification probabilities, and performs classification by thresholding those probabilities. The algorithm requires the proportions of positive and negative samples to be balanced; samples with imbalanced proportions easily lead to overfitting.
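The logistic regression definition above (Bernoulli likelihood, gradient descent, threshold on the predicted probability) can be sketched in a few lines of plain Python. This is a minimal sketch rather than the patented implementation: the learning rate, the number of epochs, the 0.5 threshold and the toy data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the negative log-likelihood."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Gradient of the per-sample loss with respect to the linear score.
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Classify as abnormal (1) when the predicted probability reaches the threshold."""
    prob = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    return 1 if prob >= threshold else 0


# Balanced toy data (hypothetical): [response-time change, return-code flag].
rows = [[0.1, 0.0], [0.2, 0.0], [0.15, 0.0], [0.8, 1.0], [0.9, 1.0], [0.85, 1.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(rows, labels)
print([predict(weights, bias, x) for x in rows])
```

The absolute values of the fitted `weights` are what the feature-importance definition above refers to.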
S0: acquiring transaction characteristic data to be subjected to anomaly detection.
The transaction log of quick payment is used as the supporting data. Two weeks of transaction logs are selected as training data, and business personnel manually label the positive and negative samples, that is, identify which transactions are abnormal and which are normal. According to feedback from the business personnel, the response time, the success rate and the service return codes whose counts rank in the top five are used as the attribute indices for anomaly detection. In addition, the transaction characteristic data includes: the response time of the application in the current time period, the response time in the same time period yesterday, the response time in the same time period last week, the success rate in the current time period, the success rate in the same time period yesterday, the success rate in the same time period last week, the service return codes in the current time period, the service return codes in the same time period yesterday, the service return codes in the same time period last week, same-period comparison data and period-over-period comparison data.
S1: preprocessing the transaction characteristic data.
In the preprocessing stage, records in which the response time, the success rate or the top-five ranked service return codes are empty are first removed, together with other invalid data. The service return codes are then digitized: because the service return codes are sensitive to period-over-period change, the period-over-period value is set to 1 when the top-five ranked service return codes have changed and to 0 when they have not. The response time and the success rate are each normalized and their ratios are calculated, yielding five features: the response time same-period ratio, the response time period-over-period ratio, the success rate same-period ratio, the success rate period-over-period ratio, and the period-over-period value of the top-five ranked service return codes.
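The preprocessing step can be sketched as follows. Several details are assumptions made only for illustration, because the text does not fix them: the record field names, the use of relative change `(current - reference) / reference` as the ratio, comparing against last week for the same-period ratio and against yesterday for the period-over-period ratio, and min-max scaling as the normalization (shown as a separate helper).

```python
def relative_change(current, reference):
    """Relative change of current against reference; 0.0 when the reference is 0."""
    if reference == 0:
        return 0.0
    return (current - reference) / reference


def min_max_normalize(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


REQUIRED = ("rt_now", "rt_yesterday", "rt_last_week",
            "sr_now", "sr_yesterday", "sr_last_week",
            "top5_codes_now", "top5_codes_yesterday")


def extract_features(record):
    """Build the five features for one time period, or None if a field is empty."""
    if any(record.get(key) is None for key in REQUIRED):
        return None  # invalid record, removed during preprocessing
    return [
        relative_change(record["rt_now"], record["rt_last_week"]),   # same-period
        relative_change(record["rt_now"], record["rt_yesterday"]),   # period-over-period
        relative_change(record["sr_now"], record["sr_last_week"]),   # same-period
        relative_change(record["sr_now"], record["sr_yesterday"]),   # period-over-period
        1 if record["top5_codes_now"] != record["top5_codes_yesterday"] else 0,
    ]


# Hypothetical record: response times in ms, success rates as fractions.
record = {
    "rt_now": 300.0, "rt_yesterday": 200.0, "rt_last_week": 150.0,
    "sr_now": 0.9, "sr_yesterday": 1.0, "sr_last_week": 0.9,
    "top5_codes_now": ("0000", "E101", "E205", "E300", "E404"),
    "top5_codes_yesterday": ("0000", "E101", "E205", "E300", "E500"),
}
features = extract_features(record)
print(features)
print(min_max_normalize([100.0, 200.0, 300.0]))
```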
S2: inputting the five extracted transaction features into an isolation forest algorithm model to generate training data.
The five extracted features are input into the isolation forest algorithm model, and the contamination parameter of the isolation forest algorithm is adjusted (in this practice, the parameter value is 0.09) so that the detection result of the isolation forest algorithm contains all labeled abnormal samples and the ratio of normal to abnormal samples is close to 1:1.
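To make step S2 more concrete, the following is a simplified, standard-library-only sketch of isolation forest scoring followed by flagging the top-scoring fraction of samples. It is not the implementation used in the example above: the tree count, subsample size, seed and toy data are assumptions, the fraction argument is named `contamination` here by assumption (following common library naming), and it expects at least two data points.

```python
import math
import random


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random cut value."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return ("leaf", len(points))
    cut = rng.uniform(low, high)
    left = [p for p in points if p[feature] < cut]
    right = [p for p in points if p[feature] >= cut]
    return ("node", feature, cut,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaf size."""
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, feature, cut, left, right = tree
    return path_length(left if point[feature] < cut else right, point, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=64, seed=0):
    """Score in (0, 1]; points isolated after few splits score close to 1."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(size, 2)))
    trees = [build_tree(rng.sample(points, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(size)
    return [2.0 ** (-sum(path_length(t, p) for t in trees) / (n_trees * norm))
            for p in points]


def flag_top_fraction(scores, contamination):
    """Indices of the highest-scoring fraction of samples (at least one)."""
    k = max(1, round(contamination * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


# Toy data (hypothetical): 40 clustered points plus one obvious outlier at index 40.
points = [[0.01 * (i % 7), 0.01 * (i % 5)] for i in range(40)] + [[5.0, 5.0]]
scores = anomaly_scores(points)
flagged = flag_top_fraction(scores, contamination=0.09)
print(flagged)
```

The flagged samples, together with their manual labels, would then form the training data for the supervised stage.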
S3: inputting the training data into a logistic regression algorithm model for training, so as to generate the vertically integrated algorithm model.
Specifically, the detection result of the isolation forest algorithm is input into the logistic regression algorithm to train it. Because the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, the model largely avoids overfitting and achieves high detection precision. For comparison, when the model precision is verified on data from another week, the F1-score of the model (the vertically integrated algorithm model) reaches 96.6%, a clear improvement over the 86.7% F1-score obtained by using the isolation forest algorithm directly.
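For reference, the F1-score used in this comparison is the harmonic mean of precision and recall on the abnormal class. The sketch below computes it from true and predicted labels; the label vectors are hypothetical and are not the evaluation data behind the figures quoted above.

```python
def f1_score(true_labels, predicted_labels):
    """F1 = 2 * precision * recall / (precision + recall), with 1 = abnormal."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed anomalies
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for eight time periods.
true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted_labels = [1, 1, 0, 0, 1, 0, 1, 0]
f1 = f1_score(true_labels, predicted_labels)
print(f"F1-score: {f1:.3f}")
```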
From the above description, the application log-based application anomaly detection method provided by the embodiment of the invention is a multidimensional application anomaly detection method that, based on application log data, uses a machine learning algorithm vertically integrating an isolation forest with logistic regression. It effectively avoids the overfitting of a supervised learning algorithm caused by the imbalance between positive and negative samples and greatly improves detection precision, making it an effective integrated algorithm with high detection precision. In addition, on the basis of the logistic regression algorithm, the invention provides a feature-importance-based root cause analysis method for multi-dimensional index anomalies: the feature importance in the logistic regression classification result parameters is taken as the weight, the weighted value of each normalized attribute is calculated, and the sorted weighted values serve as the basis for root cause comparison, which realizes simple multi-dimensional root cause analysis. The invention has three main beneficial effects:
First, by vertically integrating the isolation forest algorithm and the logistic regression algorithm, the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, which effectively avoids the overfitting caused by using a supervised learning algorithm and greatly improves detection precision compared with using either algorithm (logistic regression or isolation forest) alone.
Second, dynamic threshold detection and multi-dimensional application anomaly detection are realized by extracting the same-period ratio and ring ratio features and using the logistic regression algorithm.
Third, simple multidimensional attribute root cause analysis can be achieved by using the feature importance of the logistic regression algorithm as the attribute weight.
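The root cause ranking described above can be sketched as follows. This is an illustration under assumptions: the text does not define "feature importance", so the absolute value of each logistic regression coefficient is used here as a stand-in, and the function and feature names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_root_causes(names: Sequence[str],
                     normalized_values: Sequence[float],
                     coefficients: Sequence[float]) -> List[Tuple[str, float]]:
    """Weight each normalized attribute by its importance and sort in descending order.

    Assumption: importance = |logistic regression coefficient|.
    """
    if not (len(names) == len(normalized_values) == len(coefficients)):
        raise ValueError("names, values, and coefficients must have equal length")
    weighted = [(name, abs(c) * abs(v))
                for name, v, c in zip(names, normalized_values, coefficients)]
    return sorted(weighted, key=lambda item: item[1], reverse=True)
```

The attribute at the head of the returned list is the leading root cause candidate for the detected anomaly.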
Based on the same inventive concept, an embodiment of the present application further provides an application log-based application anomaly detection apparatus, which can be used to implement the methods described in the above embodiments, as described in the following embodiments. Because the apparatus solves the problem on a principle similar to that of the application log-based application anomaly detection method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
An embodiment of the present invention provides a specific implementation manner of an application log-based application anomaly detection apparatus capable of implementing an application log-based application anomaly detection method, and referring to fig. 7, the application log-based application anomaly detection apparatus specifically includes the following contents:
a transaction characteristic data extraction unit 10, configured to obtain transaction characteristic data to be subjected to anomaly detection, where the transaction characteristic data includes: the response time of the application in the current time period, in the same time period yesterday, and in the same time period last week; the success rate in the current time period, in the same time period yesterday, and in the same time period last week; the service return codes in the current time period, in the same time period yesterday, and in the same time period last week; and the same-period ratio data and the ring ratio data;
an anomaly detection unit 20, configured to perform anomaly detection on the application according to the transaction characteristic data and the pre-generated vertical integration algorithm model.
In one embodiment, referring to fig. 8, the application log-based application anomaly detection apparatus further includes: a vertical model generation unit 30 for generating a vertical integration algorithm model; referring to fig. 9, the vertical model generation unit 30 includes:
a characteristic data obtaining module 301, configured to obtain the transaction characteristic data within a preset time;
a characteristic data labeling module 302, configured to label the transaction characteristic data;
a training data generation module 303, configured to input the labeled transaction characteristic data to the isolation forest algorithm model to generate training data;
a vertical model generation module 304, configured to input the training data to a logistic regression algorithm model for training to generate the vertical integration algorithm model.
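The way the units and modules fit together can be sketched as plain composition. This is only an illustrative wiring: the class and field names are hypothetical, each module is represented by an injected callable, and the patent does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = List[float]
Model = Callable[[Row], int]  # returns 1 for abnormal, 0 for normal


@dataclass
class VerticalModelGenerationUnit:
    """Mirrors unit 30: modules 301 to 304 run in sequence."""
    acquire: Callable[[], List[Row]]                                                   # module 301
    label: Callable[[List[Row]], List[int]]                                            # module 302
    make_training_data: Callable[[List[Row], List[int]], Tuple[List[Row], List[int]]]  # module 303
    train: Callable[[List[Row], List[int]], Model]                                     # module 304

    def generate(self) -> Model:
        rows = self.acquire()
        labels = self.label(rows)
        train_rows, train_labels = self.make_training_data(rows, labels)
        return self.train(train_rows, train_labels)


@dataclass
class AnomalyDetectionApparatus:
    """Mirrors units 10 and 20: extract features, then apply the generated model."""
    extract: Callable[[], Row]  # unit 10
    model: Model                # generated by unit 30, applied by unit 20

    def detect(self) -> bool:
        return self.model(self.extract()) == 1
```

Passing the modules in as callables keeps each one replaceable, for example swapping the training step without touching feature acquisition.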
In an embodiment, the anomaly detection unit is specifically configured to adjust a normalization parameter of the isolation forest algorithm model so that the ratio of normal samples to abnormal samples is 1:1.
In one embodiment, referring to fig. 10, the application log-based application anomaly detection apparatus further includes: a characteristic data preprocessing unit 40, configured to preprocess the transaction characteristic data.
From the above description, the application log-based application anomaly detection apparatus provided by the embodiment of the invention, based on application log data, performs multidimensional application anomaly detection using a machine learning algorithm that vertically integrates an isolation forest with logistic regression. It effectively avoids the overfitting of a supervised learning algorithm caused by the imbalance between positive and negative samples and greatly improves detection precision, making it an effective integrated algorithm with high detection precision. In addition, on the basis of the logistic regression algorithm, the invention provides a feature-importance-based root cause analysis method for multi-dimensional index anomalies: the feature importance in the logistic regression classification result parameters is taken as the weight, the weighted value of each normalized attribute is calculated, and the sorted weighted values serve as the basis for root cause comparison, which realizes simple multi-dimensional root cause analysis. The invention has three main beneficial effects:
First, by vertically integrating the isolation forest algorithm and the logistic regression algorithm, the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, which effectively avoids the overfitting caused by using a supervised learning algorithm and greatly improves detection precision compared with using either algorithm (logistic regression or isolation forest) alone.
Second, dynamic threshold detection and multi-dimensional application anomaly detection are realized by extracting the same-period ratio and ring ratio features and using the logistic regression algorithm.
Third, simple multidimensional attribute root cause analysis can be achieved by using the feature importance of the logistic regression algorithm as the attribute weight.
An embodiment of the present application further provides a specific implementation manner of an electronic device, which is capable of implementing all steps in the application log-based application anomaly detection method in the foregoing embodiment, and referring to fig. 11, the electronic device specifically includes the following contents:
a processor 1201, a memory 1202, a communication interface 1203, and a bus 1204;
the processor 1201, the memory 1202, and the communication interface 1203 communicate with one another through the bus 1204; the communication interface 1203 is configured to implement information transmission between related devices, such as a server-side device, an interface device, and a client device.
The processor 1201 is configured to call the computer program in the memory 1202, and the processor executes the computer program to implement all the steps in the application log-based application anomaly detection method in the above-described embodiment, for example, to implement the following steps when the processor executes the computer program:
step 100: acquiring transaction characteristic data to be subjected to anomaly detection, wherein the transaction characteristic data includes: the response time of the application in the current time period, in the same time period yesterday, and in the same time period last week; the success rate in the current time period, in the same time period yesterday, and in the same time period last week; the service return codes in the current time period, in the same time period yesterday, and in the same time period last week; and the same-period ratio data and the ring ratio data;
step 200: performing anomaly detection on the application according to the transaction characteristic data and a pre-generated vertical integration algorithm model.
From the above description, the electronic device in the embodiment of the application, based on application log data, performs multidimensional application anomaly detection using a machine learning algorithm that vertically integrates an isolation forest with logistic regression. It effectively avoids the overfitting of a supervised learning algorithm caused by the imbalance between positive and negative samples and greatly improves detection precision, making it an effective integrated algorithm with high detection precision. In addition, on the basis of the logistic regression algorithm, the invention provides a feature-importance-based root cause analysis method for multi-dimensional index anomalies: the feature importance in the logistic regression classification result parameters is taken as the weight, the weighted value of each normalized attribute is calculated, and the sorted weighted values serve as the basis for root cause comparison, which realizes simple multi-dimensional root cause analysis. The invention has three main beneficial effects:
First, by vertically integrating the isolation forest algorithm and the logistic regression algorithm, the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, which effectively avoids the overfitting caused by using a supervised learning algorithm and greatly improves detection precision compared with using either algorithm (logistic regression or isolation forest) alone.
Second, dynamic threshold detection and multi-dimensional application anomaly detection are realized by extracting the same-period ratio and ring ratio features and using the logistic regression algorithm.
Third, simple multidimensional attribute root cause analysis can be achieved by using the feature importance of the logistic regression algorithm as the attribute weight.
Embodiments of the present application further provide a computer-readable storage medium capable of implementing all steps in the application log-based application anomaly detection method in the above embodiments, where the computer-readable storage medium stores thereon a computer program, and when the computer program is executed by a processor, the computer program implements all steps of the application log-based application anomaly detection method in the above embodiments, for example, when the processor executes the computer program, the processor implements the following steps:
step 100: acquiring transaction characteristic data to be subjected to anomaly detection, wherein the transaction characteristic data includes: the response time of the application in the current time period, in the same time period yesterday, and in the same time period last week; the success rate in the current time period, in the same time period yesterday, and in the same time period last week; the service return codes in the current time period, in the same time period yesterday, and in the same time period last week; and the same-period ratio data and the ring ratio data;
step 200: performing anomaly detection on the application according to the transaction characteristic data and a pre-generated vertical integration algorithm model.
From the above description, it can be seen that the computer-readable storage medium in the embodiment of the present application, based on application log data, performs multidimensional application anomaly detection using a machine learning algorithm that vertically integrates an isolation forest with logistic regression. It effectively avoids the overfitting of a supervised learning algorithm caused by the imbalance between positive and negative samples and greatly improves detection precision, making it an effective integrated algorithm with high detection precision. In addition, on the basis of the logistic regression algorithm, the invention provides a feature-importance-based root cause analysis method for multi-dimensional index anomalies: the feature importance in the logistic regression classification result parameters is taken as the weight, the weighted value of each normalized attribute is calculated, and the sorted weighted values serve as the basis for root cause comparison, which realizes simple multi-dimensional root cause analysis. The invention has three main beneficial effects:
First, by vertically integrating the isolation forest algorithm and the logistic regression algorithm, the ratio of normal to abnormal samples input into the logistic regression algorithm is close to 1:1, which effectively avoids the overfitting caused by using a supervised learning algorithm and greatly improves detection precision compared with using either algorithm (logistic regression or isolation forest) alone.
Second, dynamic threshold detection and multi-dimensional application anomaly detection are realized by extracting the same-period ratio and ring ratio features and using the logistic regression algorithm.
Third, simple multidimensional attribute root cause analysis can be achieved by using the feature importance of the logistic regression algorithm as the attribute weight.
The embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the hardware-plus-program embodiments are substantially similar to the method embodiments, they are described briefly, and the relevant points can be found in the corresponding description of the method embodiments.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Although the present application provides method steps as in an embodiment or a flowchart, more or fewer steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an actual apparatus or client product executes, it may execute sequentially or in parallel (e.g., in the context of parallel processors or multi-threaded processing) according to the embodiments or methods shown in the figures.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and implementation of the invention are explained herein through specific embodiments, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. An application log-based application anomaly detection method is characterized by comprising the following steps:
acquiring transaction characteristic data to be subjected to anomaly detection, wherein the transaction characteristic data comprises: the response time of the application in the current time period, in the same time period yesterday, and in the same time period last week; the success rate in the current time period, in the same time period yesterday, and in the same time period last week; the service return codes in the current time period, in the same time period yesterday, and in the same time period last week; and the same-period ratio data and the ring ratio data;
and carrying out anomaly detection on the application according to the transaction characteristic data and a pre-generated vertical integration algorithm model.
2. The application log-based application anomaly detection method according to claim 1, wherein the step of generating a vertical integration algorithm model comprises:
acquiring the transaction characteristic data within a preset time;
marking the transaction characteristic data;
inputting the marked transaction characteristic data into an isolation forest algorithm model to generate training data;
inputting the training data to a logistic regression algorithm model for training to generate the vertical integration algorithm model.
3. The application log-based application anomaly detection method according to claim 2, wherein the inputting of the marked transaction characteristic data into an isolation forest algorithm model to generate training data comprises:
adjusting a membership parameter of the isolation forest algorithm model so that the ratio of normal samples to abnormal samples is 1:1.
4. The application log-based application anomaly detection method according to claim 1, further comprising: and preprocessing the transaction characteristic data.
5. An application log-based application anomaly detection apparatus, comprising:
the transaction characteristic data extraction unit is used for acquiring transaction characteristic data to be subjected to anomaly detection, and the transaction characteristic data comprises: the response time of the application in the current time period, in the same time period yesterday, and in the same time period last week; the success rate in the current time period, in the same time period yesterday, and in the same time period last week; the service return codes in the current time period, in the same time period yesterday, and in the same time period last week; and the same-period ratio data and the ring ratio data;
and the anomaly detection unit is used for carrying out anomaly detection on the application according to the transaction characteristic data and the pre-generated vertical integration algorithm model.
6. The application log-based application anomaly detection apparatus according to claim 5, further comprising: the vertical model generating unit is used for generating a vertical integration algorithm model; the vertical model generation unit includes:
the characteristic data acquisition module is used for acquiring the transaction characteristic data within preset time;
the characteristic data marking module is used for marking the transaction characteristic data;
the training data generation module is used for inputting the marked transaction characteristic data into the isolation forest algorithm model so as to generate training data;
and the vertical model generation module is used for inputting the training data into a logistic regression algorithm model for training so as to generate the vertical integration algorithm model.
7. The application log-based application anomaly detection device according to claim 6, wherein the anomaly detection unit is specifically configured to adjust a normalization parameter of the isolation forest algorithm model so that the ratio of normal samples to abnormal samples is 1:1.
8. The application log-based application anomaly detection apparatus according to claim 5, further comprising: and the characteristic data preprocessing unit is used for preprocessing the transaction characteristic data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the application log based application anomaly detection method according to any one of claims 1 to 4 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the application log based application anomaly detection method according to any one of claims 1 to 4.
CN202010236180.9A 2020-03-30 2020-03-30 Application log-based application anomaly detection method and device Pending CN111353890A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010236180.9A CN111353890A (en) 2020-03-30 2020-03-30 Application log-based application anomaly detection method and device

Publications (1)

Publication Number Publication Date
CN111353890A true CN111353890A (en) 2020-06-30

Family

ID=71197488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010236180.9A Pending CN111353890A (en) 2020-03-30 2020-03-30 Application log-based application anomaly detection method and device

Country Status (1)

Country Link
CN (1) CN111353890A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102087A (en) * 2020-09-21 2020-12-18 中国工商银行股份有限公司 Transaction abnormity detection method and device
CN112950372A (en) * 2021-03-03 2021-06-11 上海天旦网络科技发展有限公司 Method and system for automatic transaction association

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180101784A1 (en) * 2016-10-05 2018-04-12 D-Wave Systems Inc. Discrete variational auto-encoder systems and methods for machine learning using adiabatic quantum computers
CN108512827A (en) * 2018-02-09 2018-09-07 世纪龙信息网络有限责任公司 The identification of abnormal login and method for building up, the device of supervised learning model
CN108777873A (en) * 2018-06-04 2018-11-09 江南大学 The wireless sensor network abnormal deviation data examination method of forest is isolated based on weighted blend
CN109739904A (en) * 2018-12-30 2019-05-10 北京城市网邻信息技术有限公司 A kind of labeling method of time series, device, equipment and storage medium
CN109785595A (en) * 2019-02-26 2019-05-21 成都古河云科技有限公司 A kind of vehicle abnormality track real-time identification method based on machine learning
JP2019082746A (en) * 2017-10-27 2019-05-30 株式会社エヌ・ティ・ティ・データ Abnormal log detection apparatus, method and program for detecting abnormal log
CN109948728A (en) * 2019-03-28 2019-06-28 第四范式(北京)技术有限公司 The method and apparatus of the training of abnormal transaction detection model and abnormal transaction detection
CN110380888A (en) * 2019-05-29 2019-10-25 华为技术有限公司 A kind of network anomaly detection method and device
CN110414555A (en) * 2019-06-20 2019-11-05 阿里巴巴集团控股有限公司 Detect the method and device of exceptional sample
CN110598802A (en) * 2019-09-26 2019-12-20 腾讯科技(深圳)有限公司 Memory detection model training method, memory detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination