CN114253806A - Access stratum log collection, analysis and early warning system - Google Patents


Publication number
CN114253806A
Authority
CN
China
Prior art keywords
log
data
alarm
module
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111550550.7A
Other languages
Chinese (zh)
Inventor
邱廷君
田亚运
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ysten Technology Co ltd
Original Assignee
Ysten Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ysten Technology Co ltd
Priority to CN202111550550.7A
Publication of CN114253806A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3086 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves the use of self describing data formats, i.e. metadata, markup languages, human readable formats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an access stratum log collection, analysis and early warning system comprising a collection and statistics module, a storage and display module and an analysis and alarm module, which are deployed respectively at the provincial access stratum and at the center. The collection and statistics module monitors the access layer's configuration files for changes, parses their settings, aggregates statistics and returns them to the center. The storage and display module comprises a collection and storage program, among other components, and is used to store, record, search and display the returned data. The analysis and alarm module comprises a trend alarm and a state alarm. The system addresses two problems of conventional schemes, namely that transmitting raw data consumes substantial resources and that analysis and early warning cannot be performed in real time, and it also provides an early warning scheme that can be customized in real time.

Description

Access stratum log collection, analysis and early warning system
Technical Field
The invention relates to the field of computers, in particular to an access stratum log collection, analysis and early warning system.
Background
The most common log collection and analysis system scheme in the industry at present is the ELK scheme.
ELK is a combination of the open-source software Elasticsearch, Logstash and Kibana, and is an open-source log management solution. It can be used for log search, analysis and visual display. Logstash collects the logs, formats them as JSON and passes them to Elasticsearch for storage, and a browser accesses Kibana to query the log information.
The ELK scheme provides log collection and analysis, but because all collected logs are stored in Elasticsearch as raw data, it demands high storage capacity when the log volume is large, and high network bandwidth if remote transmission is required. The ELK scheme therefore consumes substantial resources.
Because conventional analysis and early warning operates on raw log data rather than on log data received as a stream, time changes in the log data cannot be confirmed in real time, and the analysis must always be delayed by a certain amount of time to ensure that the log data it uses is complete.
Therefore, existing collection and analysis schemes suffer from two problems: transmitting raw data consumes substantial resources, and real-time analysis and early warning cannot be achieved.
Disclosure of Invention
The invention aims to provide an access stratum log collection, analysis and early warning system. The invention has the advantages of relatively small occupied resources and real-time analysis and early warning.
The technical scheme of the invention is as follows: an access stratum log collection, analysis and early warning system comprises a collection and statistics module, a storage and display module and an analysis and alarm module, wherein the collection and statistics module is deployed on a provincial access stratum;
the collection and statistics module monitors the access layer's configuration files for changes and parses their settings, aggregates statistics at minute granularity, and periodically returns the statistical data to the center for storage, calculation and display;
the storage and display module comprises a collection and storage program, kafka, Elasticsearch, kibana and grafana, and is used to store, record, search and display the returned log statistics;
the analysis and alarm module comprises a trend alarm and a state alarm;
the trend alarm uses real-time streaming computation to detect and raise alarms on abnormal trend changes of service indexes;
and the state alarm uses scheduled retrieval to detect and raise alarms on abnormal absolute values of service indexes.
In the foregoing system for collecting, analyzing and warning an access stratum log, the collecting and counting module obtains a list of a Nginx log record file and a log record format by monitoring and analyzing a log configuration file of the Nginx, and generates a task of collecting, analyzing and counting the Nginx log record file; the Nginx log configuration file comprises nginx.conf and a plurality of vhost.conf; the nginx.conf can determine the log format type and the detailed format, and each vhost.conf corresponds to the log generation configuration of different services and comprises respective file names and the used log format types.
In the foregoing system for collecting, analyzing and warning an access stratum log, the monitoring and analyzing process of the collecting and counting module is as follows:
a1, monitoring a log configuration file of Nginx;
a2, judging whether the Nginx log configuration file is changed; if yes, executing the step A3, otherwise, continuing monitoring;
a3, judging the configuration type; if the change concerns the log generation configuration of a service, executing A4.1; if it concerns log_format, executing A4.2;
a4.1, updating a monitoring service list;
a4.2, updating a log analysis format;
and A5, updating a log collection and analysis task.
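Steps A3 to A5 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes that service log files are declared with the nginx `access_log` directive (the text above only speaks of "log generation configuration"), and the function names `parse_log_formats`, `parse_log_files` and `build_tasks` are hypothetical. The regular expressions handle only simple, single-statement directives.

```python
import re

# Assumption: formats are declared with log_format and files with access_log.
LOG_FORMAT_RE = re.compile(r"log_format\s+(\w+)\s+(.*?);", re.S)
ACCESS_LOG_RE = re.compile(r"access_log\s+(\S+)\s+(\w+)\s*;")


def parse_log_formats(conf_text):
    """Return {format_name: format_string} from log_format directives (A4.2)."""
    formats = {}
    for name, body in LOG_FORMAT_RE.findall(conf_text):
        # A format may be split over several quoted fragments; join them.
        fragments = re.findall(r"'([^']*)'", body)
        formats[name] = "".join(fragments)
    return formats


def parse_log_files(conf_text):
    """Return {log_file_path: format_name} from access_log directives (A4.1)."""
    return {path: fmt for path, fmt in ACCESS_LOG_RE.findall(conf_text)}


def build_tasks(main_conf, vhost_confs):
    """Combine formats and monitored files into collection tasks (A5)."""
    formats = parse_log_formats(main_conf)
    tasks = []
    for conf in vhost_confs:
        for path, fmt in parse_log_files(conf).items():
            if fmt in formats:
                tasks.append({"file": path, "format": formats[fmt]})
    return tasks
```

In this sketch a change to nginx.conf would trigger a new call to `parse_log_formats`, a change to a vhost.conf a new call to `parse_log_files`, and either change would rebuild the task list.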
In the access stratum log collection, analysis and early warning system, the collection and statistics module monitors and analyzes the log file and then aggregates the log file to generate statistical data which is transmitted back to the center; the specific process is as follows:
b1, monitoring and parsing the log file in real time, and aggregating index data by time period; the time period defaults to one minute and can be adjusted as needed;
b2, when the log timestamps move into a new statistical period, returning the statistics of the previous period and recording the current read position of the log file in the progress record file;
b3, if an error occurs while returning the statistical data to the center, executing step B4;
b4, retrying the return; if the accumulated number of retries exceeds the limit, go to step B5;
b5, suspending step B1 until the statistical data is returned successfully.
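The period switch in steps B1 and B2 can be illustrated with a small aggregator. This is a sketch under assumptions: the class name `MinuteAggregator`, the record fields (`service`, `request_time`) and the two statistics kept per service are hypothetical, and log lines are assumed to arrive in time order.

```python
from collections import defaultdict


class MinuteAggregator:
    """Aggregate parsed log records per period; emit when the period changes."""

    def __init__(self, period_seconds=60):
        self.period = period_seconds
        self.current = None  # start of the period being accumulated
        self.stats = defaultdict(lambda: {"count": 0, "total_time": 0.0})

    def add(self, timestamp, service, request_time):
        """Feed one record; return the finished period's statistics or None."""
        bucket = int(timestamp) // self.period * self.period
        finished = None
        if self.current is not None and bucket != self.current:
            # B2: the log time entered a new period, so the old one is complete.
            finished = {"period_start": self.current, "stats": dict(self.stats)}
            self.stats.clear()
        self.current = bucket
        entry = self.stats[service]
        entry["count"] += 1
        entry["total_time"] += request_time
        return finished
```

Each value returned by `add` is one statistics record to send to the center; the many raw lines that produced it never leave the collection end.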
In the foregoing system for collecting, analyzing and warning an access stratum log, the collection and storage program has the following procedures:
c1, receiving statistical data returned from each region through an http interface service;
c2, preprocessing the statistical data;
c3, writing the preprocessed statistical data into kafka for caching, from which Logstash stores it in the Elasticsearch database;
c4, judging whether kafka is abnormal or not; if yes, go to step C5;
c5, interrupting the step C1, and refusing to receive new data.
In the foregoing access stratum log collection, analysis and early warning system, the preprocessing in step C2 includes filling in default values for data that is missing because it was returned by an older version of the collection and statistics module, and discarding abnormal data.
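The preprocessing of step C2 might look like the following sketch. The field names in `DEFAULTS` and `REQUIRED`, and the rule used to decide that a record is abnormal, are assumptions chosen for illustration; the source does not list them.

```python
# Hypothetical fields that older collector versions may not send.
DEFAULTS = {"region": "unknown", "error_count": 0}
# Hypothetical fields without which a record is considered abnormal.
REQUIRED = ("service", "period_start", "count")


def preprocess(record):
    """Return a cleaned copy of the record, or None if it must be discarded."""
    if not isinstance(record, dict):
        return None
    if any(key not in record for key in REQUIRED):
        return None
    if not isinstance(record["count"], int) or record["count"] < 0:
        return None
    cleaned = dict(DEFAULTS)   # start from defaults ...
    cleaned.update(record)     # ... and let the returned values override them
    return cleaned
```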
In the foregoing access stratum log collection, analysis, and early warning system, the analysis and alarm module performs real-time analysis and calculation on the received statistical data, and alarms when a service index change trend is persistently abnormal, and the specific process is as follows:
d1, customizing, through a configuration file, the log items to be monitored for early warning and the alarm thresholds of the corresponding service indexes;
d2, receiving data from kafka as a real-time stream, and pre-filtering it to keep only the data of the services or interfaces that require alarm monitoring; aggregating the filtered data in a cache indexed by time: if the cache has no entry for the time index, the data is stored; if an entry exists, the cached data is recalculated and updated;
d3, when the time index of the received log statistics changes, treating the data of the previous time point as completely received and proceeding to the next calculation step;
d4, taking historical data over a period of time and generating the time-ordered series of values needed for outlier calculation; using the Kolmogorov-Smirnov test to judge whether the data at the last time point conforms to the normal distribution of the values in the preceding period, and thus whether it is an abnormal value; if it is abnormal, generating the corresponding abnormal alarm information; if not, discarding it;
d5, further filtering the abnormal alarm information by checking it against the threshold of the service index to decide whether it meets the abnormal condition; if yes, go to step D6; if not, discarding it;
d6, caching the information that needs to be alarmed, obtaining the abnormal values within a preset time period from the cache, and judging their continuity and trend consistency; if both hold, raising an alarm; if not, discarding;
abnormal values are treated as a genuine anomaly only if they are continuous and their trend is consistent (all rising or all falling); otherwise they are regarded as normal fluctuation.
In the foregoing access stratum log collection, analysis and early warning system, the preset time period in step D6 defaults to 10 minutes and can be adjusted as needed.
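The continuity and trend-consistency check of step D6 can be sketched as below. The representation of a cached abnormal value as a `(period_start, direction)` pair, the function name `should_alarm` and the `min_run` parameter (how many consecutive abnormal periods are required) are assumptions; the source only states that the abnormal values must be continuous and move in the same direction within the preset period.

```python
def should_alarm(anomalies, window_minutes=10, min_run=3, period_seconds=60):
    """Decide whether cached abnormal values justify an alarm.

    anomalies: time-sorted list of (period_start, direction) pairs,
    where direction is "up" or "down".
    """
    if not anomalies:
        return False
    latest = anomalies[-1][0]
    # Keep only abnormal values inside the preset time period.
    recent = [a for a in anomalies if latest - a[0] < window_minutes * period_seconds]
    if len(recent) < min_run:
        return False
    run = recent[-min_run:]
    # Continuity: the abnormal periods must be adjacent to each other.
    continuous = all(b[0] - a[0] == period_seconds for a, b in zip(run, run[1:]))
    # Consistency: all of them must rise, or all of them must fall.
    consistent = len({direction for _, direction in run}) == 1
    return continuous and consistent
```

An isolated spike or an up-down-up pattern therefore produces no alarm and is treated as normal fluctuation.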
Compared with the prior art, the invention provides an access stratum log collection, analysis and early warning system comprising a collection and statistics module, a storage and display module and an analysis and alarm module, deployed respectively at the provincial access stratum and at the center. Data parsing and statistics are completed in the log collection module; because the raw data is reduced to statistical data, the data volume shrinks, which lowers the consumption of network transmission, central storage and statistical computation resources, so that the resources occupied are relatively small;
the trend alarm in the analysis and alarm module obtains log statistics from kafka as a stream, senses in real time when the statistics move to a new time period, and immediately analyzes the statistics of the previous period and raises alarms, so that analysis and early warning are performed in real time;
therefore, the invention has the advantages of relatively small occupied resources and real-time analysis and early warning.
Furthermore, the collection and statistics module monitors the log format and log item generation configuration in real time and responds to configuration file changes as they occur, automatically updating the parsing rules and log items after a log configuration change without additional manual intervention, which reduces labor costs for large-scale deployment and change;
the collection and statistics module supports a customized log format based on the nginx error.log (the log_format of the nginx error.log is fixed; a customized log records customized information in the error message field provided by that log_format and is identified and parsed through the customized format, the current customized format being ' customization ' action: action_name '), so that statistics of behaviors under special requirements can be supported; for example, after a lua script executes a custom action it can write a custom log entry into the nginx error log file, which the collection and statistics module of the application can likewise collect, parse, aggregate and return;
the collection and statistics module parses log information in real time and then aggregates the results at minute granularity, so that a large volume of raw log entries under the same monitoring dimension is condensed, per time period, into a single effective running-state record; part of the center's processing is thus completed in advance at the collection end, greatly reducing the transmission, computation and storage resources the center needs to obtain the log information;
the collection and statistics module maintains the collection progress of each log file separately and provides retransmission and interruption-blocking mechanisms for abnormal conditions, which avoids loss or duplication of returned data and ensures its validity;
the collection and storage program collects, processes, forwards and stores the statistical data returned by the collection and statistics modules, applies compatibility processing to data returned by different versions of the collection and statistics module to ensure the consistency and validity of the data, and caches the data by writing it to kafka, which both prevents data loss and provides the data source for real-time analysis and alarming;
Logstash receives the statistical data from kafka and writes it into Elasticsearch, and kibana and grafana provide intuitive pages for data retrieval and running-state charts;
the trend alarm in the analysis and alarm module customizes the range of monitored service items and the index alarm thresholds through configuration files, performs streaming analysis on the data read from kafka, and monitors minute-level changes of service indexes in real time; it improves the accuracy of abnormal-trend judgment through several stages, namely outlier judgment based on the Kolmogorov-Smirnov test, outlier threshold judgment, continuity judgment and consistency judgment, and thereby provides alarms that monitor index trends in real time.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a flow chart of the monitoring and parsing process of the collection and statistics module of the present invention;
FIG. 3 is a flow diagram of the storage and display module of the present invention;
FIG. 4 is a flow chart of the analyze alarm module of the present invention.
Detailed Description
The invention is further illustrated by the following figures and examples, which are not to be construed as limiting the invention.
Example. An access stratum log collection, analysis and early warning system is shown in fig. 1 and comprises a collection and statistics module, a storage and display module and an analysis and alarm module, wherein the collection and statistics module is deployed on a provincial access stratum;
the collection and statistics module monitors the access layer's configuration files for changes and parses their settings, aggregates statistics at minute granularity, and periodically returns the statistical data to the center for storage, calculation and display;
the storage and display module comprises a collection and storage program, kafka, Elasticsearch, kibana and grafana, and is used to store, record, search and display the returned log statistics;
the analysis and alarm module comprises a trend alarm and a state alarm;
the trend alarm uses real-time streaming computation to detect and raise alarms on abnormal trend changes of service indexes;
and the state alarm uses scheduled retrieval to detect and raise alarms on abnormal absolute values of service indexes.
The collecting and counting module acquires a Nginx log record file list and a log record format by monitoring and analyzing a log configuration file of the Nginx and generates a Nginx log record file collecting and analyzing counting task; the Nginx log configuration file comprises nginx.conf and a plurality of vhost.conf; the nginx.conf can determine the log format type and the detailed format, and each vhost.conf corresponds to the log generation configuration of different services and comprises respective file names and the used log format types.
As shown in fig. 2, the monitoring and parsing process of the statistics collecting module is as follows:
a1, monitoring a log configuration file of Nginx;
a2, judging whether the Nginx log configuration file is changed; if yes, executing the step A3, otherwise, continuing monitoring;
a3, judging the configuration type; if the change concerns the log generation configuration of a service, executing A4.1; if it concerns log_format, executing A4.2;
a4.1, updating a monitoring service list;
a4.2, updating a log analysis format;
and A5, updating a log collection and analysis task.
As shown in fig. 3, the collection statistics module monitors and analyzes the log file, aggregates the log file to generate statistics data, and transmits the statistics data back to the center; the specific process is as follows:
b1, monitoring and analyzing the log file in real time, and performing aggregation statistics on index data according to time periods; the time period is one minute;
b2, when the log timestamps move into a new statistical period, returning the statistics of the previous period and recording the current read position of the log file in the progress record file;
b3, if an error occurs while returning the statistical data to the center, executing step B4;
b4, retrying the return; if the accumulated number of retries exceeds the limit, go to step B5;
b5, suspending step B1 until the statistical data is returned successfully.
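The retry and blocking behavior of steps B3 to B5 can be sketched as follows. The function names, the use of `OSError` to signal a failed return, and the choice to record the read position only after a successful return are assumptions made for illustration; `send` and `save_progress` are hypothetical callables supplied by the caller.

```python
import time


def send_with_retry(send, payload, max_retries=3, delay=0.0):
    """B3/B4: try to return one payload; False once retries are exhausted."""
    for attempt in range(max_retries + 1):
        try:
            send(payload)
            return True
        except OSError:
            if attempt < max_retries:
                time.sleep(delay)
    return False


def flush_pending(send, pending, save_progress, max_retries=3):
    """Send queued (payload, offset) pairs in order.

    Returns False at the first payload that cannot be sent, which tells the
    caller to keep step B1 suspended (B5) and to call this function again later.
    """
    while pending:
        payload, offset = pending[0]
        if not send_with_retry(send, payload, max_retries):
            return False
        save_progress(offset)  # persist the read position once the data is safe
        pending.pop(0)
    return True
```

Because a payload leaves the queue only after it has been sent, a restart or a long outage neither drops statistics nor sends the same period twice in this sketch.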
The collection and storage program has the following procedures:
c1, receiving statistical data returned from each region through an http interface service;
c2, preprocessing the statistical data;
c3, writing the preprocessed statistical data into kafka for caching, from which Logstash stores it in the Elasticsearch database;
c4, judging whether kafka is abnormal or not; if yes, go to step C5;
c5, interrupting the step C1, and refusing to receive new data.
The preprocessing described in step C2 includes filling in default values for data that is missing because it was returned by an older version of the collection and statistics module, and discarding abnormal data.
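Steps C3 to C5 amount to a receiver that stops accepting data once its queue is unhealthy. The sketch below illustrates that idea only: `CollectionReceiver`, the injected `publish` callable standing in for a kafka producer, the HTTP-style status codes and the `recover` method are all assumptions, since the source does not describe the interface or how intake resumes.

```python
class CollectionReceiver:
    """Accept returned statistics and forward them to a queue-like sink."""

    def __init__(self, publish):
        self.publish = publish  # hypothetical callable that writes one record to kafka
        self.accepting = True

    def handle(self, record):
        """Return an HTTP-style status code for one returned record."""
        if not self.accepting:
            return 503  # C5: refuse new data while the queue is unhealthy
        try:
            self.publish(record)  # C3: cache the record in the queue
        except Exception:
            self.accepting = False  # C4: the queue is abnormal
            return 503
        return 200

    def recover(self):
        """Re-enable intake once the queue is healthy again (assumed step)."""
        self.accepting = True
```

Refusing data at the center pairs naturally with the retry and suspension logic at the collection end: rejected statistics stay queued at the provinces instead of being lost.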
As shown in fig. 4, the analyzing and alarming module performs real-time analysis and calculation on the received statistical data, and alarms when persistent abnormality occurs in the change trend of the service index, and the specific process is as follows:
d1, customizing, through a configuration file, the log items to be monitored for early warning and the alarm thresholds of the corresponding service indexes;
d2, receiving data from kafka as a real-time stream, and pre-filtering it to keep only the data of the services or interfaces that require alarm monitoring; aggregating the filtered data in a cache indexed by time: if the cache has no entry for the time index, the data is stored; if an entry exists, the cached data is recalculated and updated;
d3, when the time index of the received log statistics changes, treating the data of the previous time point as completely received and proceeding to the next calculation step;
d4, taking historical data over a period of time and generating the time-ordered series of values needed for outlier calculation; using the Kolmogorov-Smirnov test to judge whether the data at the last time point conforms to the normal distribution of the values in the preceding period, and thus whether it is an abnormal value; if it is abnormal, generating the corresponding abnormal alarm information; if not, discarding it;
d5, further filtering the abnormal alarm information by checking it against the threshold of the service index to decide whether it meets the abnormal condition; if yes, go to step D6; if not, discarding it;
d6, caching the information that needs to be alarmed, obtaining the abnormal values within a preset time period from the cache, and judging their continuity and trend consistency; if both hold, raising an alarm; if not, discarding.
The preset time period in step D6 is 10 minutes.
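The statistic behind step D4 can be sketched with the standard library alone. The code computes the one-sample Kolmogorov-Smirnov distance between a set of values and a normal distribution fitted to the history. The source does not say how the latest time point enters the sample or which significance cut-off is used, so those decisions are left to the caller here; the function names are hypothetical.

```python
import math
import statistics


def normal_cdf(x, mean, std):
    """CDF of a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, mean, std):
    """One-sample KS distance between the sample's empirical CDF and N(mean, std)."""
    values = sorted(sample)
    n = len(values)
    d = 0.0
    for i, x in enumerate(values):
        cdf = normal_cdf(x, mean, std)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_against_history(history, candidate_window):
    """Fit a normal distribution to history and measure candidate_window against it."""
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    if std == 0:
        raise ValueError("history has no variance; the normal fit is undefined")
    return ks_statistic(candidate_window, mean, std)
```

A distance close to 0 means the candidate values are consistent with the historical distribution, while a distance close to 1 means they fall far outside it; the threshold separating the two is a tuning choice not given in the source.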

Claims (6)

1. An access stratum log collection, analysis and early warning system, characterized in that: the system comprises a collection and statistics module, a storage and display module and an analysis and alarm module, wherein the collection and statistics module is deployed on a provincial access stratum;
the collection and statistics module monitors the access layer's configuration files for changes and parses their settings, aggregates statistics at minute granularity, and periodically returns the statistical data to the center for storage, calculation and display;
the storage and display module comprises a collection and storage program, kafka, Elasticsearch, kibana and grafana, and is used to store, record, search and display the returned log statistics;
the analysis and alarm module comprises a trend alarm and a state alarm;
the trend alarm uses real-time streaming computation to detect and raise alarms on abnormal trend changes of service indexes;
and the state alarm uses scheduled retrieval to detect and raise alarms on abnormal absolute values of service indexes.
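2. The system of claim 1, wherein: the collection and statistics module obtains the list of Nginx log files and their log formats by monitoring and parsing the Nginx log configuration files, and generates the tasks for collecting, parsing and aggregating the Nginx log files; the Nginx log configuration files comprise nginx.conf and a plurality of vhost.conf; nginx.conf determines the log format types and their detailed formats, and each vhost.conf corresponds to the log generation configuration of a different service and contains the respective file names and the log format types used.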
3. The system of claim 2, wherein the monitoring and parsing process of the collecting and counting module is as follows:
a1, monitoring a log configuration file of Nginx;
a2, judging whether the Nginx log configuration file is changed; if yes, executing the step A3, otherwise, continuing monitoring;
a3, judging the configuration type; if the change concerns the log generation configuration of a service, executing A4.1; if it concerns log_format, executing A4.2;
a4.1, updating a monitoring service list;
a4.2, updating a log analysis format;
and A5, updating a log collection and analysis task.
4. The system of claim 2, wherein the collection and statistics module monitors and analyzes the log file, aggregates the log file to generate statistical data, and transmits the statistical data back to the center; the specific process is as follows:
b1, monitoring and analyzing the log file in real time, and performing aggregation statistics on index data according to time periods;
b2, when the log timestamps move into a new statistical period, returning the statistics of the previous period and recording the current read position of the log file in the progress record file;
b3, if an error occurs while returning the statistical data to the center, executing step B4;
b4, retrying the return; if the accumulated number of retries exceeds the limit, go to step B5;
b5, suspending step B1 until the statistical data is returned successfully.
5. The system of claim 1, wherein the collection and storage program comprises the following steps:
c1, receiving statistical data returned from each region through an http interface service;
c2, preprocessing the statistical data;
c3, writing the preprocessed statistical data into kafka for caching, from which Logstash stores it in the Elasticsearch database;
c4, judging whether kafka is abnormal or not; if yes, go to step C5;
c5, interrupting the step C1, and refusing to receive new data.
6. The system of claim 1, wherein the analysis and alarm module analyzes and calculates the received statistical data in real time and alarms when the service index change trend is continuously abnormal, and the specific process is as follows:
d1, customizing, through a configuration file, the log items to be monitored for early warning and the alarm thresholds of the corresponding service indexes;
d2, receiving data from kafka as a real-time stream, and pre-filtering it to keep only the data of the services or interfaces that require alarm monitoring; aggregating the filtered data in a cache indexed by time;
d3, when the time index of the received log statistics changes, treating the data of the previous time point as completely received and proceeding to the next calculation step;
d4, taking historical data over a period of time and generating the time-ordered series of values needed for outlier calculation; using the Kolmogorov-Smirnov test to judge whether the data at the last time point conforms to the normal distribution of the values in the preceding period, and thus whether it is an abnormal value; if it is abnormal, generating the corresponding abnormal alarm information; if not, discarding it;
d5, further filtering the abnormal alarm information by checking it against the threshold of the service index to decide whether it meets the abnormal condition; if yes, go to step D6; if not, discarding it;
d6, caching the information that needs to be alarmed, obtaining the abnormal values within a preset time period from the cache, and judging their continuity and trend consistency; if both hold, raising an alarm; if not, discarding.
CN202111550550.7A 2021-12-17 2021-12-17 Access stratum log collection, analysis and early warning system Pending CN114253806A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111550550.7A CN114253806A (en) 2021-12-17 2021-12-17 Access stratum log collection, analysis and early warning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111550550.7A CN114253806A (en) 2021-12-17 2021-12-17 Access stratum log collection, analysis and early warning system

Publications (1)

Publication Number Publication Date
CN114253806A true CN114253806A (en) 2022-03-29

Family

ID=80795592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111550550.7A Pending CN114253806A (en) 2021-12-17 2021-12-17 Access stratum log collection, analysis and early warning system

Country Status (1)

Country Link
CN (1) CN114253806A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826943A (en) * 2022-06-30 2022-07-29 山东捷瑞数字科技股份有限公司 NGINX log analysis method and system
CN114826943B (en) * 2022-06-30 2022-10-28 山东捷瑞数字科技股份有限公司 NGINX log analysis method and system
CN115460072A (en) * 2022-08-25 2022-12-09 浪潮云信息技术股份公司 Log processing system integrating log collection, analysis, storage and service

Similar Documents

Publication Publication Date Title
US7526670B2 (en) Method and system to monitor a diverse heterogeneous application environment
US20100153431A1 (en) Alert triggered statistics collections
CN114253806A (en) Access stratum log collection, analysis and early warning system
CN111339175B (en) Data processing method, device, electronic equipment and readable storage medium
US11323463B2 (en) Generating data structures representing relationships among entities of a high-scale network infrastructure
CN108509313B (en) Service monitoring method, platform and storage medium
US8095514B2 (en) Treemap visualizations of database time
CN102938710A (en) Monitoring system and method for large-scale servers
US20150186436A1 (en) Method and system to monitor a diverse heterogeneous application environment
US20080168044A1 (en) System and method for providing performance statistics for application components
CN103001824A (en) System and method for monitoring multiple servers
CN111241050B (en) Linkage analysis system and method for big data platform
CN112699007A (en) Method, system, network device and storage medium for monitoring machine performance
CN110765189A (en) Exception management method and system for Internet products
CN112069049A (en) Data monitoring management method and device, server and readable storage medium
CN111083008A (en) Nginx-based traffic collection and analysis method
CN117632897A (en) Dynamic capacity expansion and contraction method and device
CN116701525A (en) Early warning method and system based on real-time data analysis and electronic equipment
CN114143169A (en) Micro-service application observability system
CN117194142A (en) Integrated application performance diagnosis system and method based on link tracking
CN112667149B (en) Data heat sensing method, device, equipment and medium
CN112527887B (en) Visual operation and maintenance method and device applied to Gbase database
CN114090382A (en) Health inspection method and device for super-converged cluster
CN113852199B (en) Multi-dimensional power distribution automation inspection system
CN116743618B (en) Data acquisition and analysis method, equipment and medium of station remote equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination