CN115333966B - Topology-based Nginx log analysis method, system and equipment - Google Patents


Info

Publication number
CN115333966B
Authority
CN
China
Prior art keywords
nginx
access
instance
data
topology
Prior art date
Legal status
Active
Application number
CN202210963046.8A
Other languages
Chinese (zh)
Other versions
CN115333966A (en)
Inventor
田标
崔伟
邓捷
袁科
易景平
Current Assignee
Tianyi Digital Life Technology Co Ltd
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co Ltd
Priority to CN202210963046.8A
Publication of CN115333966A
Application granted
Publication of CN115333966B
Legal status: Active
Anticipated expiration

Classifications

    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L41/12 Discovery or management of network topologies
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of computer network management and discloses a topology-based Nginx log analysis method, system and equipment. The server where each Nginx instance is located obtains Nginx access records based on a Logstash configuration file or a pre-developed acquisition tool and sends them to a Kafka cluster for storage. The server side extracts the corresponding Nginx access record set from the Kafka cluster according to the analysis requirement, analyzes it to obtain the association relationships between users and each service system, calculates the access quality data between associated nodes from the corresponding operation quality data, and then constructs a topological graph in which the flow direction of request data gives the direction of each edge and the attributes of each edge are set according to the access quality data between the associated nodes. The invention thus solves the technical problem of how to construct, from Nginx logs, a topological graph that reflects the call relationships and changes in running quality between users and the related systems across the whole service.

Description

Topology-based Nginx log analysis method, system and equipment
Technical Field
The present invention relates to the field of computer network management technologies, and in particular, to a topology-based Nginx log analysis method, system, and device.
Background
With the development of network technology, modern service systems have become increasingly complex and involve ever more servers and terminal devices, so network topology is particularly important in large-scale network management. From the global view of a business, if the running quality data of user access and inter-system calls (such as performance, errors and access volume), together with how that data changes, can be displayed as a topological graph, it can provide quantitative data support for the timely discovery, in-depth analysis, post-event playback and verification of network security problems in distributed systems, which greatly facilitates the service assurance work for those systems.
At present, the connection relationships between servers are analyzed from logs such as inter-server connections and traffic, and the network topology state among servers is determined from the analyzed connection relationships. For example, patent application No. CN202111227745.8 provides a method, system, processing device and storage medium for generating a dynamic network topology map based on logs and graphs: it collects server traffic and ssh, ftp and web system logs, parses out the accessing server IP, accessed server IP, connection time, user MAC address, connection mode and traffic data, and uses the parsed data to create a directed graph structure so that relevant personnel can query the network topology state. However, this existing method can only handle the two-level calling relationship from a source server to a target server and cannot reflect the calling relationships and changes in running quality between users and related systems across the whole service; moreover, it must parse several kinds of log data to analyze the connections between servers, which reduces its practicality.
Nginx is widely used because it can act as a high-performance reverse proxy for back-end systems speaking several different protocols and is easy to scale out, and its logs can record, for each user access, the source IP, the time taken by the request forwarded to each back-end instance, any errors, and so on. By analyzing Nginx logs, the calling relationships between users and related systems across the whole service can be learned. However, because Nginx logs are large in volume, calls between different Nginx instances are not sufficiently identified, and platforms differ from one another, the prior art has always lacked a systematic solution for constructing, from Nginx logs, a topological graph that reflects the call relationships and changes in running quality between users and related systems across the whole service.
Disclosure of Invention
The invention provides a topology-based Nginx log analysis method, system and equipment, which solve the technical problem of how to construct, from Nginx logs, a topological graph reflecting the call relationships and changes in running quality between users and related systems across the whole service.
A first aspect of the present invention provides a topology-based Nginx log analysis method, including:
determining an Nginx log analysis requirement, wherein the Nginx log analysis requirement includes a unique identifier of a target service and information on an analysis time period;
acquiring, from a Kafka cluster according to the unique identifier of the target service, an Nginx access record set corresponding to the analysis time period; wherein the Kafka cluster stores a plurality of Nginx access records, each obtained by the server where the corresponding Nginx instance is located collecting and parsing an Nginx access log based on a preset Logstash configuration file or a pre-developed acquisition tool; each Nginx access record includes call relationship topology data from an access initiator to an access target and the corresponding operation quality data; the access initiator is a user side, the server where an Nginx instance is located, or a reverse-proxied back-end system instance, and the access target is the server where an Nginx instance is located or a reverse-proxied back-end system instance;
analyzing the call relationship topology data in the Nginx access record set to obtain the association relationships among the user sides, the servers where the Nginx instances are located and the reverse-proxied back-end system instances, and calculating preset access quality indexes between associated nodes from the operation quality data in the Nginx access record set to obtain the corresponding access quality data;
and constructing a corresponding topological graph according to the association relationships and the access quality data, wherein the flow direction of request data is used as the direction of each edge of the topological graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
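The graph-construction step can be illustrated with a minimal Python sketch. It assumes each access record has already been reduced to one hop (initiator, target, status code, request time); the names `AccessRecord`, `build_topology` and `edge_attributes`, the "5xx counts as an error" rule and the 5% red/green alarm level are all hypothetical choices for illustration, not details taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    # Hypothetical minimal view of one Nginx access record: a single hop.
    initiator: str       # user-side IP, Nginx server, or back-end instance
    target: str          # Nginx server or back-end instance
    status: int          # HTTP status code returned for this hop
    request_time: float  # seconds


@dataclass
class EdgeStats:
    total: int = 0
    errors: int = 0
    time_sum: float = 0.0

    @property
    def avg_time(self) -> float:
        return self.time_sum / self.total if self.total else 0.0


def build_topology(records):
    """Aggregate records into directed edges keyed by (initiator, target).

    The key order follows the request flow, so each edge points from the
    node that sent the request to the node that received it.
    """
    edges = defaultdict(EdgeStats)
    for r in records:
        stats = edges[(r.initiator, r.target)]
        stats.total += 1
        stats.errors += int(r.status >= 500)  # assumption: 5xx counts as an error
        stats.time_sum += r.request_time
    return dict(edges)


def edge_attributes(stats, error_ratio_alarm=0.05):
    # Hypothetical display rule: mark an edge red when its error ratio
    # exceeds the alarm level, so problems show up directly on the graph.
    ratio = stats.errors / stats.total if stats.total else 0.0
    return {
        "calls": stats.total,
        "error_ratio": ratio,
        "avg_time": stats.avg_time,
        "color": "red" if ratio > error_ratio_alarm else "green",
    }
```

Keying edges by the ordered pair `(initiator, target)` is what makes the graph directed along the request flow; any graph library could then render the resulting edge dictionary.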
In one implementation of the first aspect of the present invention, acquiring, from the Kafka cluster according to the unique identifier of the target service, the Nginx access record set corresponding to the analysis time period includes:
reading the Nginx access record set using Logstash.
In one implementation of the first aspect of the present invention, analyzing the call relationship topology data in the Nginx access record set includes:
determining every node address from the access initiator to the access target of all Nginx access records in the Nginx access record set;
and matching the access target address of each Nginx access record against the node addresses of the remaining Nginx access records in the set, and determining, from the matching result, all nodes of each complete multi-level call chain and the corresponding access relationships.
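The chain-matching step can be sketched as follows, assuming each record is reduced to an `(initiator, nginx_node, target)` address triple and that the initiator seen by a downstream Nginx is exactly the upstream Nginx address (no NAT or load balancer rewriting addresses). The function name and record shape are hypothetical.

```python
from collections import defaultdict


def build_call_chains(records):
    """Link single-hop records into complete multi-level call chains.

    records: iterable of (initiator, nginx_node, target) address triples,
             one per Nginx access record (hypothetical shape).
    A record's target is matched against the Nginx node of other records;
    a match means the request continued through that Nginx instance.
    """
    records = list(dict.fromkeys(records))  # de-duplicate, keep order
    by_nginx = defaultdict(list)
    for rec in records:
        by_nginx[rec[1]].append(rec)
    nginx_nodes = set(by_nginx)
    chains = []

    def extend(chain):
        # Continue with records logged by the current target and initiated
        # by the previous hop; skip targets already on the chain (cycles).
        nxt = [r for r in by_nginx.get(chain[-1], [])
               if r[0] == chain[-2] and r[2] not in chain]
        if not nxt:
            chains.append(tuple(chain))
            return
        for r in nxt:
            extend(chain + [r[2]])

    # Chains start at records whose initiator is not an Nginx instance,
    # i.e. at the user side.
    for rec in records:
        if rec[0] not in nginx_nodes:
            extend(list(rec))
    return chains
```

For example, records `("1.2.3.4", "nginx-a", "nginx-b")` and `("nginx-a", "nginx-b", "app-1")` join into the chain `("1.2.3.4", "nginx-a", "nginx-b", "app-1")`, which a two-level, server-to-server view would miss.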
In one implementation of the first aspect of the present invention, the method further includes:
setting corresponding chain identification information according to the access target address of each complete multi-level call chain, wherein the chain identification information includes the service name, project name and/or unique access target identifier corresponding to the multi-level call chain;
classifying the call relationship topology data, corresponding operation quality data, association relationships and access quality data among the nodes of the corresponding multi-level call chain into Nginx instance data, reverse-proxied back-end system instance data and project data;
setting up a corresponding Nginx instance table, back-end system instance table and project function table in a database according to the chain identification information, storing the Nginx instance data in the Nginx instance table, the reverse-proxied back-end system instance data in the back-end system instance table and the project data in the project function table, and adding a corresponding label to each of these tables.
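A minimal sketch of the classification and storage step, using a plain dict of lists in place of real database tables. The `ChainId` fields, the table names, the `service/project` label format and the rule "a hop ending at an Nginx node belongs to the Nginx instance table" are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainId:
    # Chain identification information (field names are illustrative).
    service: str
    project: str
    target_id: str


def store_chain(db, chain_id, chain, nginx_nodes, quality):
    """Classify one multi-level call chain into three in-memory "tables".

    db:          dict mapping table name -> list of row dicts (stands in for a database)
    chain:       node addresses in request order, user side first
    nginx_nodes: addresses known to be Nginx instances
    quality:     {(src, dst): {metric: value}} for the edges of this chain
    """
    label = f"{chain_id.service}/{chain_id.project}"
    for src, dst in zip(chain, chain[1:]):
        row = {"label": label, "src": src, "dst": dst, **quality.get((src, dst), {})}
        # Hops that land on an Nginx instance describe that instance;
        # the remaining hops describe reverse-proxied back-end instances.
        table = "nginx_instance" if dst in nginx_nodes else "backend_instance"
        db.setdefault(table, []).append(row)
    # One project-level row per chain, keyed by the access target.
    db.setdefault("project_function", []).append(
        {"label": label, "target_id": chain_id.target_id, "chain": " -> ".join(chain)}
    )
    return db
```

Carrying the label on every row is one simple way to let later steps (such as index synchronization) select rows by chain without extra joins.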
In one implementation of the first aspect of the present invention, the method further includes:
synchronizing each Nginx instance table, back-end system instance table and project function table into an Elasticsearch (ES) index according to the label.
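One way to perform the synchronization is through the Elasticsearch `_bulk` API, whose request body is newline-delimited JSON: an action line followed by a document line, with a trailing newline. The sketch below only builds that body; the one-index-per-(table, label) naming scheme and the character sanitizing rule are assumptions, and sending the body to a cluster is left out.

```python
import json
import re


def to_bulk_ndjson(tables):
    """Build an Elasticsearch _bulk request body from labelled table rows.

    tables: {table_name: [row dicts, each carrying a "label" key]}.
    Each row is indexed into an index derived from its table and label
    (hypothetical naming scheme; index names must be lowercase).
    """
    lines = []
    for table, rows in tables.items():
        for row in rows:
            index = re.sub(r"[^a-z0-9_-]+", "-", f"{table}-{row['label']}".lower())
            lines.append(json.dumps({"index": {"_index": index}}))
            lines.append(json.dumps(row, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```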
In one implementation of the first aspect of the present invention, the preset access quality indexes include the total access volume, the number of errors whose status codes fall outside the corresponding preset threshold range, the distribution of those errors across URLs, the number of log entries whose request time exceeds the corresponding project time threshold, inbound traffic, outbound traffic, and/or the number of accesses suspected of posing a security risk.
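The listed indexes can be computed in a single pass over the entries of one edge. In this sketch the entry fields are assumed to come from Nginx's `$status`, `$request_time`, `$request_length` (bytes in) and `$bytes_sent` (bytes out) variables, the "normal" status range 200 to 399 and the 1-second project time threshold are placeholder defaults, and the `suspicious` flag is assumed to be set upstream by a security check.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    url: str
    status: int
    request_time: float  # seconds
    bytes_in: int        # e.g. from $request_length
    bytes_out: int       # e.g. from $bytes_sent
    suspicious: bool = False


def quality_indexes(entries, ok_status=range(200, 400), slow_threshold=1.0):
    """Compute the preset access quality indexes for one group of entries."""
    errors = [e for e in entries if e.status not in ok_status]
    return {
        "total": len(entries),
        "errors": len(errors),
        "errors_by_url": dict(Counter(e.url for e in errors)),
        "slow": sum(1 for e in entries if e.request_time > slow_threshold),
        "bytes_in": sum(e.bytes_in for e in entries),
        "bytes_out": sum(e.bytes_out for e in entries),
        "suspicious": sum(1 for e in entries if e.suspicious),
    }
```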
A second aspect of the present invention provides a topology-based Nginx log analysis system, comprising:
a requirement determining module, configured to determine an Nginx log analysis requirement, wherein the Nginx log analysis requirement includes a unique identifier of a target service and information on an analysis time period;
a data acquisition module, configured to acquire, from a Kafka cluster according to the unique identifier of the target service, an Nginx access record set corresponding to the analysis time period; wherein the Kafka cluster stores a plurality of Nginx access records, each obtained by the server where the corresponding Nginx instance is located collecting and parsing an Nginx access log based on a preset Logstash configuration file or a pre-developed acquisition tool; each Nginx access record includes call relationship topology data from an access initiator to an access target and the corresponding operation quality data; the access initiator is a user side, the server where an Nginx instance is located, or a reverse-proxied back-end system instance, and the access target is the server where an Nginx instance is located or a reverse-proxied back-end system instance;
a data analysis module, configured to analyze the call relationship topology data in the Nginx access record set to obtain the association relationships among the user sides, the servers where the Nginx instances are located and the reverse-proxied back-end system instances, and to calculate preset access quality indexes between associated nodes from the operation quality data in the Nginx access record set to obtain the corresponding access quality data;
and a topology graph construction module, configured to construct a corresponding topological graph according to the association relationships and the access quality data, wherein the flow direction of request data is used as the direction of each edge of the topological graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
In one implementation of the second aspect of the present invention, the data acquisition module is specifically configured to:
read the Nginx access record set using Logstash.
In one implementation of the second aspect of the present invention, the data analysis module includes:
a determining unit, configured to determine every node address from the access initiator to the access target of all Nginx access records in the Nginx access record set;
and a matching unit, configured to match the access target address of each Nginx access record against the node addresses of the remaining Nginx access records in the set, and to determine, from the matching result, all nodes of each complete multi-level call chain and the corresponding access relationships.
In one implementation of the second aspect of the present invention, the system further includes:
a setting module, configured to set corresponding chain identification information according to the access target address of each complete multi-level call chain, wherein the chain identification information includes the service name, project name and/or unique access target identifier corresponding to the multi-level call chain;
a data classification module, configured to classify the call relationship topology data, corresponding operation quality data, association relationships and access quality data among all nodes of the corresponding multi-level call chain into Nginx instance data, reverse-proxied back-end system instance data and project data;
a data storage module, configured to set up a corresponding Nginx instance table, back-end system instance table and project function table in a database according to the chain identification information, to store the Nginx instance data in the Nginx instance table, the reverse-proxied back-end system instance data in the back-end system instance table and the project data in the project function table, and to add a corresponding label to each of these tables.
In one implementation of the second aspect of the present invention, the system further includes:
a data synchronization module, configured to synchronize each Nginx instance table, back-end system instance table and project function table into an ES index according to the label.
In one implementation of the second aspect of the present invention, the preset access quality indexes include the total access volume, the number of errors whose status codes fall outside the corresponding preset threshold range, the distribution of those errors across URLs, the number of log entries whose request time exceeds the corresponding project time threshold, inbound traffic, outbound traffic, and/or the number of accesses suspected of posing a security risk.
A third aspect of the present invention provides a topology-based Nginx log analysis system, comprising:
an acquisition end deployed on the server where each Nginx instance is located, which collects Nginx access logs based on a preset Logstash configuration file or a pre-developed acquisition tool, parses them into the corresponding Nginx access records, and sends the records to a Kafka cluster for storage; each Nginx access record includes call relationship topology data from an access initiator to an access target and the corresponding operation quality data, wherein the access initiator is a user side, the server where an Nginx instance is located, or a reverse-proxied back-end system instance, and the access target is the server where an Nginx instance is located or a reverse-proxied back-end system instance;
and a server side, comprising the Kafka cluster and an analysis device;
wherein the analysis device is configured to: determine an Nginx log analysis requirement, the requirement including a unique identifier of a target service and information on an analysis time period; acquire, from the Kafka cluster according to the unique identifier of the target service, the Nginx access record set corresponding to the analysis time period; analyze the call relationship topology data in the Nginx access record set to obtain the association relationships among the user sides, the servers where the Nginx instances are located and the reverse-proxied back-end system instances, and calculate preset access quality indexes between associated nodes from the operation quality data in the set to obtain the corresponding access quality data; and construct a corresponding topological graph according to the association relationships and the access quality data, wherein the flow direction of request data is used as the direction of each edge of the topological graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
In one implementation of the third aspect of the present invention, the analysis device is specifically configured to:
read the Nginx access record set using Logstash.
In one implementation of the third aspect of the present invention, the analysis device is specifically configured to:
determine every node address from the access initiator to the access target of all Nginx access records in the Nginx access record set;
and match the access target address of each Nginx access record against the node addresses of the remaining Nginx access records in the set, and determine, from the matching result, all nodes of each complete multi-level call chain and the corresponding access relationships.
In one implementation of the third aspect of the present invention, the analysis device is further specifically configured to:
set corresponding chain identification information according to the access target address of each complete multi-level call chain, wherein the chain identification information includes the service name, project name and/or unique access target identifier corresponding to the multi-level call chain;
classify the call relationship topology data, corresponding operation quality data, association relationships and access quality data among the nodes of the corresponding multi-level call chain into Nginx instance data, reverse-proxied back-end system instance data and project data;
set up a corresponding Nginx instance table, back-end system instance table and project function table in a database according to the chain identification information, store the Nginx instance data in the Nginx instance table, the reverse-proxied back-end system instance data in the back-end system instance table and the project data in the project function table, and add a corresponding label to each of these tables.
In one implementation of the third aspect of the present invention, the analysis device is further specifically configured to:
synchronize each Nginx instance table, back-end system instance table and project function table into an ES index according to the label.
In one implementation of the third aspect of the present invention, the preset access quality indexes include the total access volume, the number of errors whose status codes fall outside the corresponding preset threshold range, the distribution of those errors across URLs, the number of log entries whose request time exceeds the corresponding project time threshold, inbound traffic, outbound traffic, and/or the number of accesses suspected of posing a security risk.
In one implementation of the third aspect of the present invention, the acquisition end includes Logstash configured with the preset Logstash configuration file, and is specifically configured to:
parse the collected Nginx access log to obtain the target fields of each Nginx access record, delete records of local requests and records with abnormal access target addresses or performance data, and specify the target fields whose types are to be converted;
add the service name of the current log, the identifier of the server hosting the deployed Nginx instance and the IP address of the Nginx instance to the corresponding Nginx access record as special fields;
add the IP address of the current Nginx instance to the Nginx service instance set, and its access target address to the back-end system instance set, in an external Redis cache;
convert the user source IP to obtain the Chinese names, longitudes and latitudes of the city, province and country to which it belongs, and add the results to the corresponding Nginx access record as new fields.
In one implementation of the third aspect of the present invention, the acquisition end is further specifically configured to:
add code to the Logstash configuration file that checks the Nginx access log for malicious access, SQL injection attacks, scanning by attack or security tools, and problems with the length and content of request parameters, and add corresponding fields to the corresponding Nginx access records according to the check results.
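In Logstash these steps would typically be expressed with filters such as `json` or `grok`, `mutate` (for type conversion) and `geoip`. The Python sketch below mimics the same per-line processing so the logic is easy to follow. It assumes Nginx writes JSON log lines with `remote_addr`, `upstream_addr`, `status` and `request_time` keys; the function name is hypothetical, plain Python sets stand in for the external Redis sets, and `geo_lookup` stands in for an IP geolocation database.

```python
import ipaddress
import json


def process_line(line, service, server_id, nginx_ip,
                 nginx_set, backend_set, geo_lookup):
    """Parse and enrich one JSON-formatted Nginx access log line.

    Returns the enriched record, or None when the line should be dropped.
    nginx_set / backend_set stand in for the Redis sets; geo_lookup stands
    in for an IP geolocation database (both hypothetical here).
    """
    try:
        rec = json.loads(line)
        remote = ipaddress.ip_address(rec["remote_addr"])
        # Convert the fields whose types must change (strings -> numbers).
        rec["status"] = int(rec["status"])
        rec["request_time"] = float(rec["request_time"])
    except (KeyError, TypeError, ValueError):
        return None  # malformed line or abnormal performance data
    if remote.is_loopback:
        return None  # local request, e.g. a health check from the host itself
    upstream = rec.get("upstream_addr", "-")
    if upstream in ("", "-") or rec["request_time"] < 0:
        return None  # no usable access target, or impossible timing
    # Special fields identifying where this record was produced.
    rec.update(service_name=service, server_id=server_id, nginx_ip=nginx_ip)
    # Register the Nginx instance and the back-end target it forwarded to.
    nginx_set.add(nginx_ip)
    backend_set.add(upstream)
    # Attach location fields derived from the user source IP.
    rec.update(geo_lookup(rec["remote_addr"]) or {})
    return rec
```

Note that Nginx can log several comma-separated upstream addresses when a request is retried; this sketch keeps `upstream_addr` as a single opaque string.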
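The security check could look roughly like the following. The regular expression, the scanner user-agent substrings and the 512-character parameter limit are illustrative heuristics chosen for this sketch, not the patent's rules; a real deployment would tune them to limit false positives (the bare `--` pattern, for instance, is deliberately aggressive).

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Illustrative heuristics only; tune before relying on them.
SQLI_PATTERN = re.compile(
    r"(\bunion\b.+\bselect\b|\bor\b\s+1\s*=\s*1|\bsleep\s*\(|--|;\s*drop\b)", re.I)
SCANNER_AGENTS = ("sqlmap", "nikto", "nmap", "masscan")
MAX_PARAM_LEN = 512


def security_flags(request_uri, user_agent):
    """Return the check-result fields to add to one Nginx access record."""
    # parse_qsl percent-decodes values, so "1%20OR%201=1" becomes "1 OR 1=1".
    params = parse_qsl(urlsplit(request_uri).query, keep_blank_values=True)
    flags = {
        "sqli_suspect": any(SQLI_PATTERN.search(v) for _, v in params),
        "scanner_suspect": any(s in user_agent.lower() for s in SCANNER_AGENTS),
        "long_param": any(len(v) > MAX_PARAM_LEN for _, v in params),
    }
    flags["security_risk"] = any(flags.values())
    return flags
```

The aggregated `security_risk` flag is the kind of field that the "accesses suspected of posing a security risk" index can then count.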
A fourth aspect of the present invention provides a topology-based Nginx log analysis device, including:
a memory for storing instructions, the instructions being used to implement the topology-based Nginx log analysis method according to any of the implementations above;
and a processor for executing the instructions in the memory.
A fifth aspect of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the topology-based Nginx log analysis method according to any of the implementations above.
From the above technical scheme, the invention has the following advantages:
The server where each Nginx instance is located collects Nginx access logs based on a preset Logstash configuration file or a pre-developed acquisition tool, parses out the calling relationships and operation quality data, and sends the resulting Nginx access records to a Kafka cluster for storage. The corresponding Nginx access record set is then extracted from the Kafka cluster according to the Nginx log analysis requirement and analyzed to obtain the association relationships among user sides, the servers where Nginx instances are located and the reverse-proxied back-end system instances; preset access quality indexes between associated nodes are calculated from the corresponding operation quality data; and a topological graph is constructed from the results, with the flow direction of request data as the direction of each edge and the edge attributes set according to the access quality data between the associated nodes. Starting from massive Nginx access logs distributed across multiple servers, the invention thus provides a scheme for displaying, as a topological graph, the access relationships, project function composition and corresponding operation quality between users and related systems across the whole service. It solves the technical problem of how to construct such a graph from Nginx logs, offers efficient topology processing, and, combined with the call topology among back-end systems, can form a complete topology of how the service system is deployed.
Changes in Nginx log analysis requirements can be handled quickly by adjusting the Logstash configuration file or the relevant settings of the pre-developed acquisition tool, and service-wide access topology data can be generated and stored automatically from massive Nginx access logs without recompiling or re-releasing, so development effort is small and deployment and adjustment are convenient.
Because the edge attributes are set according to the access quality data between associated nodes, problems that are discovered can be flagged directly on the topological graph, and service engineers can grasp changes in running quality across the whole service from the graph. This provides quantitative data support for the timely discovery, in-depth analysis, post-event playback and verification of network security problems in distributed systems, and greatly facilitates their service assurance.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a topology-based Nginx log analysis method provided by an alternative embodiment of the invention;
FIG. 2 is a schematic diagram of a specific storage scheme for storing relevant data in Nginx log analysis using tables according to an alternative embodiment of the present invention;
FIG. 3 is a schematic diagram of an example of an application access topology based on the method of FIG. 1 according to an alternative embodiment of the present invention;
FIG. 4 is a block diagram of a topology-based Nginx log analysis system according to an alternative embodiment of the present invention;
FIG. 5 is a block diagram of a topology-based Nginx log analysis system according to another alternative embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an Nginx log analysis system in which the acquisition end collects the Nginx access log based on a preset Logstash configuration file and parses it into the corresponding Nginx access records, according to an alternative embodiment of the present invention.
Reference numerals:
In FIG. 4: 1 - requirement determining module; 2 - data acquisition module; 3 - data analysis module; 4 - topology graph construction module;
In FIG. 5: 10 - acquisition end; 20 - server side; 201 - Kafka cluster; 202 - analysis device.
Detailed Description
The embodiments of the invention provide a topology-based Nginx log analysis method, system and equipment to solve the technical problem of how to construct, from Nginx logs, a topological graph reflecting the call relationships and changes in running quality between users and related systems across the whole service.
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a topology-based Nginx log analysis method.
Referring to fig. 1, fig. 1 shows a flowchart of an Nginx log analysis method based on topology according to an embodiment of the present invention.
The Nginx log analysis method based on the topology provided by the embodiment of the invention comprises the steps S1-S4.
Step S1: determine the Nginx log analysis requirement, which includes the unique identifier of the target service and information about the analysis time period.
The Nginx log analysis requirement can be determined by receiving an analysis request initiated by another preset terminal, or by generating analysis requirements periodically. The length of the analysis time period is set according to actual requirements.
Step S2: acquire, from the Kafka cluster, the set of Nginx access records corresponding to the analysis time period according to the unique identifier of the target service.
The Kafka cluster stores a plurality of Nginx access records. Each Nginx access record is obtained by the server where the corresponding Nginx instance is located, which collects and parses the Nginx access log based on a preset Logstash configuration file or a pre-developed collection tool. An Nginx access record includes the call relationship topology data from the access initiator to the access target and the corresponding operation quality data. The access initiator is a user side (client), the server where an Nginx instance is located, or a reverse proxy back-end system instance; the access target is the server where an Nginx instance is located or a reverse proxy back-end system instance.
It should be noted that, to implement the method of this embodiment, the acquisition end described in the following embodiments may be deployed on the server where each Nginx instance is located to extract the Nginx access records. The details of the extraction are not repeated in this embodiment.
In one implementation, acquiring, from the Kafka cluster, the set of Nginx access records corresponding to the analysis time period according to the unique identifier of the target service includes:
reading the Nginx access record set using Logstash.
Logstash can be used to parse each Nginx access record in the set into the individual required fields.
Step S3: analyze the call relationship topology data in the Nginx access record set to obtain the association relationships among the user sides, the servers where the Nginx instances are located and the reverse proxy back-end system instances, and calculate the preset access quality indexes between associated nodes from the operation quality data in the record set to obtain the corresponding access quality data.
One Nginx access record corresponds to one actual access request from the access initiator to the access target via Nginx. In a multi-level access link, the access target of one record may also be the access initiator in other Nginx logs. The call relationship topology data of every Nginx access record in the current analysis time period is analyzed with the logstash-filter-ruby plug-in to obtain the association relationships among the user side, the server where each Nginx instance is located, and each reverse proxy back-end system instance.
Specifically, by checking addresses against the Nginx instance IPs, all nodes and access relationships of each complete multi-level call chain can be identified from all access records in the current time period. For example, if the access initiator address is the IP of an Nginx instance, an edge is drawn from that IP to the current Nginx. If the access target address is also an Nginx instance, the current Nginx forwards requests to another Nginx instance, i.e., there are multiple layers of Nginx; in that case the front-end and back-end service instance addresses of each Nginx are assembled, and at the same time it is recorded whether the back-end service is also Nginx and which layer the current Nginx is on, the Nginx facing the user side being the first layer. If both the access initiator address and the access target address are back-end system instance addresses, the back-end system at the target address is called through Nginx. If the access target address is the access initiator address in another Nginx log, the system at that target address in fact goes on to call the Nginx instance to which that log belongs.
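The layering rule above (the user-facing Nginx is layer 1, and each Nginx reached from another Nginx is one layer deeper) can be sketched as a breadth-first walk. This is a minimal sketch, not the patented implementation: the record keys `client`, `nginx` and `target` are hypothetical names for the access initiator, the Nginx instance that wrote the log and the access target, and any target that is not itself an Nginx instance is assumed to be a back-end system instance.

```python
from collections import defaultdict, deque


def nginx_levels(records):
    """Assign a layer number to each Nginx instance seen in `records`.

    Each record is a dict with the hypothetical keys
      'client' - access initiator address (e.g. $remote_addr),
      'nginx'  - IP of the Nginx instance that wrote the log line,
      'target' - access target address (e.g. $upstream_addr).
    """
    nginx_ips = {r["nginx"] for r in records}
    # Targets that are not Nginx instances are treated as back-end instances.
    backend_ips = {r["target"] for r in records} - nginx_ips
    downstream = defaultdict(set)  # Nginx -> Nginx instances it forwards to
    user_facing = set()
    for r in records:
        # Initiator is neither Nginx nor a back-end instance: a user side.
        if r["client"] not in nginx_ips and r["client"] not in backend_ips:
            user_facing.add(r["nginx"])
        # Target is another Nginx: a multi-layer Nginx proxy.
        if r["target"] in nginx_ips:
            downstream[r["nginx"]].add(r["target"])

    # Breadth-first walk starting from the user-facing layer (layer 1).
    levels = {ip: 1 for ip in user_facing}
    queue = deque(sorted(user_facing))
    while queue:
        ip = queue.popleft()
        for nxt in sorted(downstream[ip]):
            if nxt not in levels:
                levels[nxt] = levels[ip] + 1
                queue.append(nxt)
    return levels
```

In this sketch an Nginx instance that is reached only through a back-end system (not directly from a user side or another Nginx) is left without a layer; how such instances should be numbered is not specified above.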
In one implementation, analyzing the call relationship topology data in the Nginx access record set includes:
determining each node address, from the access initiator to the access target, of all Nginx access records in the Nginx access record set;
and matching the access target address of each Nginx access record against the node addresses of the remaining Nginx access records in the set, and determining all nodes and corresponding access relationships of each complete multi-level call chain according to the matching result.
In this embodiment, the association relationships among nodes are determined by address matching, which is simple and convenient.
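The address-matching step can be illustrated with a small adjacency map: every record contributes the edges initiator -> Nginx and Nginx -> target, so a target address that reappears as the initiator of another record extends the chain automatically. This is a sketch under assumptions: the record keys `client`, `nginx` and `target` are hypothetical, and chains are enumerated by a depth-first walk from nodes that nobody calls (normally user sides).

```python
from collections import defaultdict


def call_chains(records):
    """Link Nginx access records into complete multi-level call chains.

    Records use the hypothetical keys 'client', 'nginx' and 'target'.
    Each record contributes two edges, client -> nginx and nginx -> target;
    a target address that is the initiator of another record is therefore
    matched to that record and extends the chain.
    """
    succ = defaultdict(set)
    has_pred = set()
    for r in records:
        for src, dst in ((r["client"], r["nginx"]), (r["nginx"], r["target"])):
            succ[src].add(dst)
            has_pred.add(dst)

    roots = sorted(set(succ) - has_pred)  # nodes nobody calls, e.g. users
    chains = []

    def walk(path):
        nexts = sorted(succ[path[-1]] - set(path))  # skip cycles
        if not nexts:
            chains.append(tuple(path))  # reached the end of a chain
        for nxt in nexts:
            walk(path + [nxt])

    for root in roots:
        walk([root])
    return chains
```

Because edges from different requests are merged into one graph, the result describes the call topology rather than individual request traces; a graph consisting only of a cycle has no root and yields no chain.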
In one implementation, the preset access quality indexes include the total access volume, the number of errors whose status code falls outside the corresponding preset range, the distribution of those errors across URLs, the number of logs whose request time exceeds the project's time threshold, inbound traffic, outbound traffic, and/or the number of accesses suspected of posing a security risk.
As a specific embodiment, the preset access quality indexes may be calculated by the logstash-filter-ruby plug-in.
As a specific embodiment, a request that satisfies at least 3 of the following suspected-risk conditions may be regarded as an access suspected of posing a security risk. The suspected-risk conditions may include:
the user always accesses a specific URL; the user's source IP is unchanged, or the range across which it changes is within a certain threshold; the user's number of operations per second/minute is outside the corresponding threshold range; the mean and standard deviation of the time differences between the user's adjacent operations are outside the corresponding threshold ranges; and the request parameters and their patterns match preset characteristics.
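The per-edge aggregation of these indexes can be sketched in Python for readability (the embodiment itself uses logstash-filter-ruby). Assumptions: the record keys mirror common Nginx log variables (`status`, `uri`, `request_time`, `request_length`, `bytes_sent`) plus an optional boolean `suspicious`, the edge key is the (initiator, Nginx) pair, and the "normal" status range 200-399 is a hypothetical default.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class EdgeQuality:
    total: int = 0
    errors: int = 0
    slow: int = 0
    bytes_in: int = 0
    bytes_out: int = 0
    suspicious: int = 0
    errors_by_url: Counter = field(default_factory=Counter)


def edge_quality(records, slow_threshold_s, ok_status=range(200, 400)):
    """Aggregate access quality indexes per (initiator, Nginx) edge.

    Hypothetical record keys: client, nginx, status, uri, request_time,
    request_length, bytes_sent and an optional boolean 'suspicious'.
    """
    stats = defaultdict(EdgeQuality)
    for r in records:
        q = stats[(r["client"], r["nginx"])]
        q.total += 1
        if r["status"] not in ok_status:    # status outside the preset range
            q.errors += 1
            q.errors_by_url[r["uri"]] += 1  # error distribution over URLs
        if r["request_time"] > slow_threshold_s:
            q.slow += 1                     # slow-performance log
        q.bytes_in += r["request_length"]   # inbound traffic
        q.bytes_out += r["bytes_sent"]      # outbound traffic
        q.suspicious += int(r.get("suspicious", False))
    return dict(stats)
```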
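The "at least 3 of 5 conditions" rule can be expressed as a small counter over one user's requests. This is a sketch under stated assumptions: all threshold values in `RiskThresholds` are hypothetical placeholders, the IP "change range" is taken as the numeric distance between the lowest and highest source IP, and `params_match` stands in for whatever preset parameter-feature check a project uses.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from statistics import mean, pstdev


@dataclass(frozen=True)
class RiskThresholds:
    # All values are hypothetical placeholders, to be tuned per project.
    max_ip_spread: int = 256            # max numeric distance between source IPs
    ops_per_min: tuple = (0.1, 120.0)   # normal operations-per-minute range
    gap_mean_s: tuple = (0.5, 3600.0)   # normal mean gap between operations
    gap_std_s: tuple = (0.05, 3600.0)   # normal std-dev of those gaps


def count_risk_conditions(urls, ips, timestamps, params_match, th=RiskThresholds()):
    """Count how many of the five suspected-risk conditions one user meets."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    span_min = max((ts[-1] - ts[0]) / 60, 1 / 60)  # floor at 1 s to avoid /0
    rate = len(ts) / span_min
    ip_ints = [int(ip_address(ip)) for ip in ips]

    def inside(value, bounds):
        return bounds[0] <= value <= bounds[1]

    conditions = [
        len(set(urls)) == 1,                             # always the same URL
        max(ip_ints) - min(ip_ints) <= th.max_ip_spread,  # IP (nearly) fixed
        not inside(rate, th.ops_per_min),                 # abnormal frequency
        bool(gaps) and not (inside(mean(gaps), th.gap_mean_s)
                            and inside(pstdev(gaps), th.gap_std_s)),  # robotic rhythm
        bool(params_match),                               # preset parameter pattern
    ]
    return sum(conditions)


def is_suspected_risk(*args, **kwargs):
    return count_risk_conditions(*args, **kwargs) >= 3
```

A script hitting one URL every 100 ms from a fixed IP meets four conditions, while an irregular human session that browses several URLs typically meets only the fixed-IP condition.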
Step S4: construct a corresponding topology graph according to the association relationships and the access quality data, where the flow direction of the request data is used as the direction of each edge of the topology graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
The UI may be developed with Grafana or a JavaScript interactive graph tool, and the topology graph is displayed through that UI.
Further, access quality data whose preset access quality index exceeds the alarm threshold can be marked on the topology graph.
The marking may use different colors, animations, voice prompts and/or line thicknesses according to the alarm level, so as to highlight the position of the corresponding data on the topology graph.
By viewing the topology graph, relevant personnel can quickly and intuitively identify the services with running quality problems and alarms as well as their scope of impact, which improves troubleshooting efficiency, helps resolve problems at the root, and allows measures to be taken before a problem becomes serious.
Because the data on the topology graph is only a static analysis result for a period of time, an assessment of the overall operational health of the service can be produced by analyzing all alarms within the currently selected period. For example, if projects in one machine room that have no direct call relationship with each other all show slow performance, and their error counts exceed a certain threshold and reach a certain proportion, a check of the machine room infrastructure can be suggested; the processing priority of the related projects can also be suggested according to the number of alarms, and so on.
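A directed topology with quality attributes on each edge can be sketched as Graphviz DOT text. DOT is chosen here only for illustration (the embodiment mentions Grafana or a JavaScript graph tool), and the index names `total`, `slow`, `error` and `security` are assumptions modelled on the edge label format described for FIG. 3.

```python
def to_dot(edge_stats, indexes=("total", "slow", "error", "security")):
    """Render a directed access topology as Graphviz DOT text.

    edge_stats maps (source, target) -> {index name: value}; the edge
    direction follows the flow of the request data. Missing indexes are 0.
    """
    lines = ["digraph access_topology {", "  rankdir=LR;"]
    for (src, dst), q in sorted(edge_stats.items()):
        label = "/".join(f"{name} {q.get(name, 0)}" for name in indexes)
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)
```

For example, `to_dot({("user", "10.0.0.1"): {"total": 268, "slow": 12, "error": 5, "security": 13}})` yields an edge labelled `total 268/slow 12/error 5/security 13`.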
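Marking by alarm level can be reduced to a mapping from an index value to a level and from the level to a visual style. In this sketch the thresholds (1% and 5% error rate), the colours and the line widths are all hypothetical, and the output is expressed as Graphviz edge attributes (`color`, `penwidth`).

```python
# Hypothetical alarm levels -> (Graphviz colour, line width).
ALARM_STYLES = {0: ("black", 1.0), 1: ("orange", 2.0), 2: ("red", 3.5)}


def alarm_level(value, warn, critical):
    """Map an index value to an alarm level: 0 normal, 1 warning, 2 critical."""
    if value >= critical:
        return 2
    if value >= warn:
        return 1
    return 0


def edge_style(error_rate, warn=0.01, critical=0.05):
    """Return DOT edge attributes that emphasise alarming edges."""
    color, width = ALARM_STYLES[alarm_level(error_rate, warn, critical)]
    return f'color="{color}", penwidth={width}'
```

With 5 errors in 268 requests (about 1.9%), `edge_style(5 / 268)` returns the warning style.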
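The machine-room example can be sketched as a rule over the alarm list. This is one possible reading, not a method stated in the text: a room is flagged when at least a hypothetical `min_ratio` of its projects raise slow/error alarms and no two alarming projects call each other directly; project priority is simply ordered by alarm count. All parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rooms_to_check(alarms, projects_by_room, calls, min_ratio=0.5):
    """Suggest machine rooms whose infrastructure should be checked.

    alarms: list of (room, project) pairs, one per slow/error alarm.
    projects_by_room: {room: set of projects deployed there}.
    calls: set of (caller, callee) project pairs with a direct call relation.
    """
    flagged = []
    for room, projects in sorted(projects_by_room.items()):
        alarming = {p for r, p in alarms if r == room}
        if not projects or len(alarming) < 2 or len(alarming) / len(projects) < min_ratio:
            continue
        # Alarms in projects that do not call each other point to shared infrastructure.
        unrelated = all((a, b) not in calls and (b, a) not in calls
                        for a, b in combinations(sorted(alarming), 2))
        if unrelated:
            flagged.append(room)
    return flagged


def project_priority(alarms):
    """Order projects by alarm count, most alarms first."""
    return [p for p, _ in Counter(p for _, p in alarms).most_common()]
```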
On the UI, mouse selection, hovering and right-click operations on topology elements can also be handled to support viewing more data, switching to the topology of other services, and so on. For example, each node on the topology graph can be provided with further trend analysis charts such as the following, to help the user understand changes in running quality from more dimensions:
(1) Querying from ES the logs of the currently selected node within the selected time period, and plotting each log as a point in a scatter diagram that shows the request duration and whether an error occurred;
(2) Providing a histogram showing the number of requests within different performance intervals;
(3) Providing a line graph showing the trend of slow performance access (i.e., the number of logs for which the request time exceeds a threshold specified by the project) over time;
(4) Providing a table showing the access IPs found by the analysis to pose a security risk, together with the related URLs, type of security problem, time of occurrence, access parameters and frequency; and, for each project, data such as its URLs, related descriptions, time put online, alarm thresholds, and alarms that actually occurred.
If the actual topology graph contains many nodes and layers, the UI that displays it can let the user, taking a service as the center, select the maximum number of node layers to show upstream and downstream of that service node, thereby controlling the scale of the displayed topology graph. This scheme greatly reduces development workload, supports adjusting the Logstash configuration on the fly, and helps respond quickly to changes in Nginx log analysis requirements.
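Charts (2) and (3) reduce to two small aggregations: counting requests per performance interval and counting slow logs per time bucket. In this sketch the interval bounds (0.1 s, 0.5 s, 1 s, 3 s) and the 60-second bucket size are hypothetical defaults, and logs are assumed to arrive as `(unix_timestamp, request_time)` pairs.

```python
from bisect import bisect_right
from collections import Counter


def latency_histogram(request_times, edges=(0.1, 0.5, 1.0, 3.0)):
    """Count requests per performance interval (seconds); bounds are hypothetical."""
    counts = [0] * (len(edges) + 1)
    for t in request_times:
        # Index of the first bound greater than t selects the interval.
        counts[bisect_right(edges, t)] += 1
    lows = (0.0,) + tuple(edges)
    labels = [f"{lo}-{hi}s" for lo, hi in zip(lows, edges)] + [f">={edges[-1]}s"]
    return dict(zip(labels, counts))


def slow_trend(logs, threshold_s, bucket_s=60):
    """Number of slow logs per time bucket, for a line chart.

    logs: iterable of (unix_timestamp, request_time) pairs.
    """
    trend = Counter()
    for ts, request_time in logs:
        if request_time > threshold_s:
            trend[int(ts // bucket_s) * bucket_s] += 1
    return sorted(trend.items())
```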
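Limiting the displayed graph to a chosen number of layers around a service amounts to two bounded breadth-first searches, one along outgoing edges and one along incoming edges. A sketch, assuming the topology is given as a collection of `(source, target)` pairs; `max_out` and `max_in` are hypothetical names for the user's downstream and upstream layer limits.

```python
from collections import defaultdict, deque


def neighborhood(edges, center, max_out, max_in):
    """Keep nodes within max_out hops downstream and max_in hops upstream
    of `center`, and return the edges among the kept nodes."""
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
        pred[b].add(a)

    def reach(adj, limit):
        depth = {center: 0}
        queue = deque([center])
        while queue:
            node = queue.popleft()
            if depth[node] == limit:
                continue  # do not expand beyond the chosen layer count
            for nxt in adj[node]:
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        return set(depth)

    keep = reach(succ, max_out) | reach(pred, max_in)
    return {(a, b) for a, b in edges if a in keep and b in keep}
```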
In one implementation, the method further comprises:
setting corresponding chain identification information according to the access target address of each complete multi-stage call chain, wherein the chain identification information comprises a service name, a project name and/or an access target unique identifier which correspond to the multi-stage call chain;
classifying call relation topology data, corresponding operation quality data, association relation and access quality data among nodes in a corresponding multi-level call chain into Nginx instance data, reverse proxy back-end system instance data and project data;
Setting a corresponding Nginx instance table, a back-end system instance table and a project function table in a database according to the chain identification information, storing the Nginx instance data in the corresponding Nginx instance table, storing the reverse proxy back-end system instance data in the corresponding back-end system instance table, storing the project data in the corresponding project function table, and adding corresponding labels (tags) to each of the Nginx instance table, the back-end system instance table and the project function table.
In this embodiment, the Nginx instance table supports storing topology information when a service uses multiple levels of Nginx as reverse proxies. The Nginx instance data stored in it may include the service to which the Nginx instance belongs, its IP, and the identifier of the server where it is located. If the service uses a multi-level Nginx proxy, the Nginx instance data may further include the layer of the current Nginx, whether it is the first layer that directly accepts user access, the front-end service instance set, the back-end service instance set, and whether the back-end service of the current Nginx instance is Nginx or something else. The front-end service instance set is the set of service instances that access the current Nginx instance, and the back-end service instance set is the set of service instances that the current Nginx instance accesses.
In this embodiment, the back-end system instance table supports storing the reverse-proxied back-end system instance data. This data may include topology information of service instances, such as the back-end Web and gRPC services reverse-proxied by Nginx, including the service name, project deployment information, the external communication protocol, the instance addresses of the projects it calls, the instance addresses of the projects that call the current project, and the corresponding operation quality data and access quality data. The back-end system instance table may also store the thresholds of the preset access quality indexes.
In this embodiment, the project function table supports storing project data. The project data may include the name and access address (e.g., the URL in the Nginx access log) of a project function, the service to which the function belongs, the project information it calls, and the corresponding operation quality data and access quality data. The project function table may also store the thresholds of the corresponding access quality indexes.
In the above embodiment, when the analysis results are stored, the current topology record can be saved to the database through an external service interface together with the Nginx instance table, the back-end system instance table and the project function table, with the current system time set as the operation time. When storing data, if it is detected that a service deployment has added or removed Nginx instances or the back-end system instances they proxy, or that a deployment address has changed, a history record of the corresponding topology and the change time can be generated automatically in the database, allowing the user to inspect exactly how the service topology changed. When each URL of a back-end system is stored, it is added only if it does not yet exist in the table, and the time of addition is recorded, so that the functions of a project and the time each function went online can be tracked.
Besides the Nginx instance table, the back-end system instance table and the project function table, a service domain name table, a user region access history table and a node operation quality history table may be set up to store the corresponding data.
For example, the service domain name table may store the services involved in all topology graphs and their domain names, including how many layers of Nginx proxy each service uses;
the user region access history table may store a city's access volume to a service within a given time, and how much of that volume involves errors, slow performance, traffic and suspected security problems;
the node operation quality history table may store the operation quality index values for each edge of the generated topology graph, including the access address of the start node (an Nginx instance or a back-end system instance) and of the target instance it accesses (the end node, likewise an Nginx instance or a back-end system instance), the start time of each analysis interval, and the main operation quality indexes of the access path, such as access volume, 4XX and 5XX error counts, the distribution of errors across functions, the number of logs whose request time exceeds the project-specified threshold (slow performance), inbound and outbound traffic, and the number of suspected security requests.
It should be noted that the specific content of each table can be adjusted according to actual requirements. Illustratively, FIG. 2 is a schematic diagram of a specific scheme for storing the relevant data of Nginx log analysis in tables according to an alternative embodiment of the present invention, and the content of each table can be set with reference to the storage scheme shown in FIG. 2.
Alternatively, a graph database may be used to store the analysis results; the structure of the corresponding topology graph is conceptually and logically consistent with the table-based storage described above. The main nodes include the user source city, the Nginx instances and the back-end system instances; functions can be split out into separate nodes, each back-end instance can call other systems through Nginx, and the attributes of each node can be chosen from the table fields above according to business needs.
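The three tables can be sketched with SQLite for illustration. The column layout below is hypothetical (FIG. 2 describes the actual storage scheme), and `record_url` shows one way to add a project URL only when it is new while keeping its first-seen time, using SQLite's `INSERT OR IGNORE` against the `(project, url)` primary key.

```python
import sqlite3
import time

# Hypothetical column layout; FIG. 2 describes the actual storage scheme.
SCHEMA = """
CREATE TABLE IF NOT EXISTS nginx_instance (
    service TEXT, ip TEXT, server_id TEXT,
    level INTEGER, user_facing INTEGER,    -- multi-level proxy info
    frontends TEXT, backends TEXT,         -- JSON-encoded instance sets
    backend_is_nginx INTEGER, tag TEXT,
    PRIMARY KEY (service, ip)
);
CREATE TABLE IF NOT EXISTS backend_instance (
    service TEXT, project TEXT, protocol TEXT, address TEXT,
    callees TEXT, callers TEXT, thresholds TEXT, tag TEXT,
    PRIMARY KEY (service, address)
);
CREATE TABLE IF NOT EXISTS project_function (
    service TEXT, project TEXT, url TEXT, name TEXT,
    added_at REAL, thresholds TEXT, tag TEXT,
    PRIMARY KEY (project, url)
);
"""


def record_url(conn, service, project, url, tag, now=None):
    """Insert a project URL only if it is new, keeping its first-seen time.

    Returns True when the URL was added, False when it already existed.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO project_function "
        "(service, project, url, added_at, tag) VALUES (?, ?, ?, ?, ?)",
        (service, project, url, time.time() if now is None else now, tag),
    )
    return cur.rowcount == 1
```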
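Generating a topology history record when a deployment changes can be sketched as a set difference between two snapshots. The snapshot shape `{service: set of instance addresses}` and the output fields are assumptions for illustration; in practice the records would be written to the database along with the change time.

```python
def diff_topology(previous, current, now):
    """Compare two snapshots {service: set of instance addresses} and
    return one history record per service whose deployment changed."""
    history = []
    for service in sorted(previous.keys() | current.keys()):
        old = previous.get(service, set())
        new = current.get(service, set())
        if old != new:
            history.append({
                "service": service,
                "added": sorted(new - old),
                "removed": sorted(old - new),
                "changed_at": now,
            })
    return history
```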
In one implementation, the method further comprises:
synchronizing the Nginx instance table, the back-end system instance table and the project function table into ES indexes according to their labels.
Both Nginx and the Nginx-based Kubernetes (K8S) Ingress are widely used to proxy back-end services, and the content and format of the access logs they output are highly consistent. In the above embodiment, the topology of user regions, Nginx and back-end systems, including the callers and callees of each system, is analyzed and combined with indexes such as access volume, error rate and performance for each part of the topology to build a topology graph. This lets technicians quickly get an overview of the overall operation quality of a service, locate the problematic parts and obtain the related data; the topology graph can also be combined with the call topology among back-end systems to form a complete deployment topology of the service system.
FIG. 3 shows an example of an application access topology obtained with the method of the above embodiment, according to an alternative embodiment of the present invention.
As shown in FIG. 3, when the access quality data between associated nodes is used as the edge attribute, it is presented in a format such as "total 268/slow 12/error 5/security 13", meaning that within the selected time period there are 268 access logs between the two nodes, of which 12 requests exceed the performance threshold, 5 have errors, and 13 are suspected of posing a security risk. Other indexes (e.g., traffic) can be added to this format as the business requires, and an index with no problem can be set to 0.
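Reading such an edge label back into numbers (for example, to compute an error rate or to compare against alarm thresholds) can be sketched as follows. The index names, including `security` for the suspected-security-risk count, are assumptions; indexes missing from the label default to 0 in line with the convention that an index with no problem is shown as 0.

```python
def parse_edge_label(label, indexes=("total", "slow", "error", "security")):
    """Parse an edge label such as 'total 268/slow 12/error 5/security 13'.

    Indexes missing from the label default to 0.
    """
    values = dict.fromkeys(indexes, 0)
    for part in label.split("/"):
        name, number = part.strip().rsplit(" ", 1)
        values[name.lower()] = int(number)
    return values
```

For the FIG. 3 example this gives an error rate of 5/268, roughly 1.9%.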
The invention also provides a topology-based Nginx log analysis system.
Referring to fig. 4, fig. 4 shows a structural connection block diagram of an Nginx log analysis system based on topology according to an embodiment of the present invention.
The embodiment of the invention provides a topology-based Nginx log analysis system, which comprises the following components:
the requirement determining module 1 is configured to determine the Nginx log analysis requirement, which includes the unique identifier of the target service and information about the analysis time period;
The data acquisition module 2 is used for acquiring an Nginx access record set corresponding to the analysis time period from the Kafka cluster according to the unique identifier of the target service; the Kafka cluster is provided with a plurality of Nginx access records, each Nginx access record is obtained by collecting and analyzing an Nginx access log by a server where a corresponding Nginx instance is located based on a preset Logstash configuration file or a pre-developed collecting tool, the Nginx access records comprise call relation topology data and corresponding operation quality data from an access initiator to an access target, the access initiator is a user side, the server where the Nginx instance is located or a reverse proxy back-end system instance, and the access target is the server where the Nginx instance is located or the reverse proxy back-end system instance;
the data analysis module 3 is configured to analyze the call relationship topology data in the Nginx access record set to obtain the association relationships among the user sides, the servers where the Nginx instances are located and the reverse proxy back-end system instances, and to calculate the preset access quality indexes between associated nodes from the operation quality data in the record set to obtain the corresponding access quality data;
The topology diagram construction module 4 is configured to construct the corresponding topology diagram according to the association relationships and the access quality data, where the flow direction of the request data is taken as the direction of each edge of the topology diagram, and the attributes of each edge are set according to the access quality data between the associated nodes.
In one possible implementation, the data acquisition module 2 is specifically configured to:
read the Nginx access record set using Logstash.
In one possible implementation, the data analysis module 3 comprises:
a determining unit, configured to determine each node address from an access initiator to an access target of all the Nginx access records in the Nginx access record set;
and the matching unit is used for matching the access target address of each Nginx access record with the node addresses of the rest Nginx access records in the Nginx access record set, and determining all nodes of each complete multi-level call chain and corresponding access relations according to the obtained matching result.
In one implementation, the system further comprises:
the setting module is used for setting corresponding chain identification information according to the access target address of each complete multi-stage call chain, wherein the chain identification information comprises a service name, a project name and/or an access target unique identifier which correspond to the multi-stage call chain;
The data classification module is used for classifying call relation topology data, corresponding operation quality data, association relation and access quality data among all nodes in the corresponding multi-level call chain into Nginx instance data, reverse proxy back-end system instance data and project data;
the data storage module is used for setting a corresponding Nginx instance table, a back-end system instance table and a project function table in a database according to the chain identification information, storing the Nginx instance data in the corresponding Nginx instance table, storing the reverse proxy back-end system instance data in the corresponding back-end system instance table, storing the project data in the corresponding project function table, and adding corresponding labels to each of the Nginx instance table, the back-end system instance table and the project function table.
In one implementation, the system further comprises:
the data synchronization module is configured to synchronize the Nginx instance table, the back-end system instance table and the project function table into ES indexes according to their labels.
In one implementation, the preset access quality indexes include the total access volume, the number of errors whose status code falls outside the corresponding preset range, the distribution of those errors across URLs, the number of logs whose request time exceeds the project's time threshold, inbound traffic, outbound traffic, and/or the number of accesses suspected of posing a security risk.
The invention further provides another topology-based Nginx log analysis system.
Referring to FIG. 5, FIG. 5 shows a structural connection block diagram of a topology-based Nginx log analysis system according to another alternative embodiment of the present invention.
The embodiment of the invention provides a topology-based Nginx log analysis system, which comprises the following components:
the acquisition end 10 is deployed on the server where each Nginx instance is located; it collects Nginx access logs based on a preset Logstash configuration file or a pre-developed collection tool, parses them into the corresponding Nginx access records, and sends these records to the Kafka cluster 201 for storage. An Nginx access record includes the call relationship topology data from the access initiator to the access target and the corresponding operation quality data, where the access initiator is a user side, the server where an Nginx instance is located, or a reverse proxy back-end system instance, and the access target is the server where an Nginx instance is located or a reverse proxy back-end system instance;
the server 20 comprises the Kafka cluster 201 and an analysis device 202;
the analysis device 202 is configured to: determine the Nginx log analysis requirement, which includes the unique identifier of the target service and information about the analysis time period; acquire, from the Kafka cluster 201, the Nginx access record set corresponding to the analysis time period according to the unique identifier of the target service; analyze the call relationship topology data in the Nginx access record set to obtain the association relationships among the user sides, the servers where the Nginx instances are located and the reverse proxy back-end system instances, and calculate the preset access quality indexes between associated nodes from the operation quality data in the record set to obtain the corresponding access quality data; and construct a corresponding topology graph according to the association relationships and the access quality data, where the flow direction of the request data is used as the direction of each edge of the topology graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
In one possible implementation, the analysis device 202 is specifically configured to:
read the Nginx access record set using Logstash.
In one possible implementation, the analysis device 202 is specifically configured to:
determining each node address from an access initiator to an access target of all the Nginx access records in the Nginx access record set;
and matching the access target address of each Nginx access record with the node addresses of the rest Nginx access records in the Nginx access record set, and determining all nodes and corresponding access relations of each complete multi-level call chain according to the obtained matching result.
In one possible implementation, the analysis device 202 is further specifically configured to:
setting corresponding chain identification information according to the access target address of each complete multi-stage call chain, wherein the chain identification information comprises a service name, a project name and/or an access target unique identifier which correspond to the multi-stage call chain;
classifying call relation topology data, corresponding operation quality data, association relation and access quality data among nodes in a corresponding multi-level call chain into Nginx instance data, reverse proxy back-end system instance data and project data;
Setting a corresponding Nginx instance table, a back-end system instance table and a project function table in a database according to the chain identification information, storing the Nginx instance data in the corresponding Nginx instance table, storing the reverse proxy back-end system instance data in the corresponding back-end system instance table, storing the project data in the corresponding project function table, and adding corresponding labels to each of the Nginx instance table, the back-end system instance table and the project function table.
In one possible implementation, the analysis device 202 is further specifically configured to:
synchronize the Nginx instance table, the back-end system instance table and the project function table into ES indexes according to their labels.
In one implementation, the preset access quality indexes include the total access volume, the number of errors whose status code falls outside the corresponding preset range, the distribution of those errors across URLs, the number of logs whose request time exceeds the project's time threshold, inbound traffic, outbound traffic, and/or the number of accesses suspected of posing a security risk.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the Nginx log analysis system when the acquisition end 10 collects the Nginx access log based on a preset Logstash configuration file and parses it into the corresponding Nginx access records, according to an alternative embodiment of the present invention.
As shown in FIG. 6, Logstash configured with the preset Logstash configuration file and an external Redis cache are deployed in each machine room (i.e., on the server where the Nginx instance is located); together they form the acquisition end 10 of that server. The server side 20 is provided with a DB library that stores the Nginx instance table, the back-end system instance table and the project function table, and an ES library into which these tables are synchronized as ES indexes.
In one possible implementation, the acquisition end 10 is specifically configured to perform the following through Logstash:
parsing the collected Nginx access log to obtain the target fields of each Nginx access record, deleting records of local requests and records with abnormal access target addresses or abnormal performance data, and converting the designated target fields to the required types;
adding the service name of the current log, the identifier of the server where the deployed Nginx instance is located and the IP address of the Nginx instance as special fields to the corresponding Nginx access record;
adding the IP address of the current Nginx instance and the access target address to the Nginx service instance set and the back-end system instance set, respectively, in the external Redis cache;
Converting the user source IP to obtain the Chinese names, the longitudes and the latitudes of the cities, the provinces and the countries to which the user source IP belongs, and adding the obtained result as a new field into the corresponding Nginx access record.
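The parsing and enrichment steps above are performed by Logstash in this embodiment; the Python sketch below only mirrors their logic for one log line. Assumptions: the `log_format` and regular expression are hypothetical, a loopback initiator marks a local request, an empty or `-` upstream marks an abnormal target, a line that fails to parse counts as abnormal data, and two in-memory sets stand in for the external Redis sets. The IP-to-region step is omitted here.

```python
import re
from ipaddress import ip_address

# Hypothetical log_format:
# '$remote_addr [$time_local] "$request" $status $request_length $bytes_sent '
# '$request_time "$upstream_addr"'
LINE_RE = re.compile(
    r'(?P<client>\S+) \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<uri>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<request_length>\d+) (?P<bytes_sent>\d+) '
    r'(?P<request_time>\d+(?:\.\d+)?) "(?P<target>[^"]*)"'
)


def parse_line(line, service, server_id, nginx_ip, nginx_set, backend_set):
    """Parse one access-log line into an Nginx access record, or None if dropped.

    nginx_set / backend_set stand in for the external Redis sets.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None                     # malformed or abnormal data
    rec = m.groupdict()
    try:
        if ip_address(rec["client"]).is_loopback:
            return None                 # local request
    except ValueError:
        return None                     # initiator is not a valid IP
    if rec["target"] in ("", "-"):
        return None                     # no usable access target
    for key in ("status", "request_length", "bytes_sent"):
        rec[key] = int(rec[key])        # type conversion
    rec["request_time"] = float(rec["request_time"])
    # Special fields identifying where the log came from.
    rec.update(service=service, server_id=server_id, nginx_ip=nginx_ip)
    nginx_set.add(nginx_ip)
    backend_set.add(rec["target"])
    return rec
```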
In this embodiment, the logstack configured with the preset logstack configuration file is deployed in the server where each ngnx instance is located to collect and parse the ngnx access log. The functions including the acquisition end 10 described in this embodiment may be implemented by editing a configuration file of logstack until the processed result is output to the Kafka cluster 201. By adjusting the Logstar configuration file, the change of the analysis requirement of the Nginx logs can be responded quickly, and service global access topology data can be automatically generated and stored from massive Nginx access logs without recompilation and release, so that the method has the advantages of small development workload and convenience in deployment and adjustment.
The user source IP is converted to obtain the Chinese names, the longitudes and the latitudes of the cities, the provinces and the countries to which the user source IP belongs, and the data obtained by the conversion can reflect the access quality of users in different areas, so that the conditions of whether an operator network is fully opened, whether the bandwidth is enough, whether CDN needs to be started or not and the like can be further checked, a data basis is provided for improving the access quality, and preparation is made for the development of services in advance, so that user complaints caused by poor deployment of a machine room are avoided.
Because the stock Logstash geoip filter cannot output Chinese region names and its output fields cannot be controlled, one feasible approach is secondary development of logstash-filter-geoip to overcome this limitation, so that Logstash at the acquisition end 10 in this embodiment can perform the user source IP conversion.
Specifically, the file geoip.rb under the Logstash directory may be modified; the main modifications may include:
building and enabling an LRU cache that maps user IPs to regions and reusing its entries to reduce the time spent searching the IP library, since users of Internet applications tend to be regionally concentrated and to access repeatedly;
specifying the required region field names according to the data analysis requirements, which may include country, province (region name), city and/or latitude and longitude;
modifying the filter method so that, in the corresponding code block, the GeoLite2-City.mmdb library (which supports IPv6 address resolution) is used to resolve the user IP into a city object, from which the required Chinese values of the country and region (province), the latitude and longitude, the postal code and similar fields are obtained.
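The LRU-cached IP-to-region lookup described in these modifications can be sketched as follows. The sketch assumes a hypothetical in-memory dictionary in place of GeoLite2-City.mmdb. Its entries only mimic the shape of a city record (names keyed by locale, including `zh-CN`, plus latitude and longitude), and the coordinates are illustrative. With the real database one would more likely call the `geoip2` package's `Reader.city()`. Python's `functools.lru_cache` plays the role of the LRU cache, and the `geo_*` output field names are hypothetical.

```python
from functools import lru_cache
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for GeoLite2-City.mmdb; values are illustrative only.
FAKE_GEO_DB: Dict[str, dict] = {
    "203.0.113.7": {
        "country": {"names": {"en": "China", "zh-CN": "中国"}},
        "subdivision": {"names": {"en": "Guangdong", "zh-CN": "广东省"}},
        "city": {"names": {"en": "Guangzhou", "zh-CN": "广州"}},
        "location": {"latitude": 23.13, "longitude": 113.26},
    },
}

Region = Tuple[Optional[str], Optional[str], Optional[str], float, float]


def search_ip_library(ip: str) -> Optional[dict]:
    """The 'slow' lookup that the LRU cache is meant to avoid repeating."""
    return FAKE_GEO_DB.get(ip)


@lru_cache(maxsize=100_000)
def ip_to_region(ip: str) -> Optional[Region]:
    """Return Chinese country/province/city names plus coordinates, cached per IP."""
    rec = search_ip_library(ip)
    if rec is None:
        return None  # unknown IPs are cached too, so they are not searched again

    def zh(part: str) -> Optional[str]:
        return rec[part]["names"].get("zh-CN")

    loc = rec["location"]
    return (zh("country"), zh("subdivision"), zh("city"),
            loc["latitude"], loc["longitude"])


def add_geo_fields(record: dict) -> dict:
    """Add the region fields to an access record when the IP is resolvable."""
    region = ip_to_region(record["remote_addr"])
    if region is not None:
        country, province, city, lat, lon = region
        record.update(geo_country=country, geo_province=province,
                      geo_city=city, geo_lat=lat, geo_lon=lon)
    return record
```

Because `ip_to_region` returns an immutable tuple, callers cannot corrupt cached entries. `ip_to_region.cache_info()` reports hits and misses, which is one way to check whether regional aggregation actually pays off.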
In one possible implementation, the acquisition end 10 is further specifically configured to:
add code to the Logstash configuration file to check the Nginx access log for malicious access, SQL injection attacks, scanning by attack tools or security tools, and problems with the length and content of request parameters, and add corresponding fields to the corresponding Nginx access record according to the check results.
As a specific implementation, to carry out the operations of the acquisition end 10 in the foregoing embodiment, Logstash in the acquisition end 10 specifically performs the following:
parsing each log line with the Logstash csv plugin according to the Nginx log_format configuration, to obtain the values of fields including the user source, the back-end system instance address of the reverse proxy, the request and its performance, inbound and outbound traffic, the status code, the requested URL, the request parameters, the forwarding service address and the request length;
calling drop {} to delete records of local requests, back-end system instance address records of the reverse proxy, and records with abnormal performance data; using the mutate plugin to specify the fields whose type must be converted (including performance, traffic, and date and time) and to delete unnecessary fields;
adding the service name of the current log, the deployment machine room and the IP of the Nginx instance to the log record as special fields;
using the logstash-filter-ruby plugin to add the IP of the current Nginx instance and the back-end system instance address of the reverse proxy to the Nginx service instance set and the back-end system instance set, respectively, in an external Redis cache;
enabling the geoip plugin, specifying the local storage path of the IP library GeoLite2-City.mmdb, converting the user source IP through the secondarily developed logstash-filter-geoip, and adding the Chinese names of the city, province and country to which the IP belongs, together with its latitude and longitude, to the output record as new fields;
and using the logstash-filter-ruby plugin to add code to the Logstash configuration file that checks the request URL and request parameters of the log for malicious access.
The items checked and the checking code can be adjusted at any time according to actual needs. If a problem is found, two fields are added to the current log record to describe the type of the problem and its details, respectively, and a tag is added to distinguish the record from other logs that carry no security risk.
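The parsing and filtering steps of this Logstash pipeline (csv parsing by log_format, drop rules, and type conversion) can be approximated in plain Python as below. The `|`-separated field order is a hypothetical log_format. Treating a missing `upstream_addr` or an unparsable timing value as a record to drop is an assumption about what the drop rules cover, not something the source spells out.

```python
import csv
from datetime import datetime
from typing import Dict, Iterable, Iterator

# Hypothetical field order; it must mirror the log_format directive in nginx.conf.
FIELDS = ["remote_addr", "upstream_addr", "request_time", "upstream_response_time",
          "request_length", "bytes_sent", "status", "uri", "args", "time_iso8601"]
FLOAT_FIELDS = ("request_time", "upstream_response_time")
INT_FIELDS = ("request_length", "bytes_sent", "status")
LOCAL_ADDRS = {"127.0.0.1", "::1"}


def parse_access_lines(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Parse, filter and type-convert access-log lines (csv + drop + mutate)."""
    # QUOTE_NONE: request arguments may contain quote characters.
    for row in csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE):
        if len(row) != len(FIELDS):
            continue  # malformed line
        rec: Dict[str, object] = dict(zip(FIELDS, row))
        if rec["remote_addr"] in LOCAL_ADDRS:
            continue  # local request, e.g. a health check
        if rec["upstream_addr"] in ("", "-"):
            continue  # assumption: records without a back-end target are dropped
        try:
            for name in FLOAT_FIELDS:
                rec[name] = float(rec[name])
            for name in INT_FIELDS:
                rec[name] = int(rec[name])
            rec["time_iso8601"] = datetime.fromisoformat(rec["time_iso8601"])
        except ValueError:
            # "-" or multi-upstream timing lists count as abnormal performance
            # data in this simplified sketch and are dropped.
            continue
        if rec["request_time"] < 0:
            continue
        yield rec
```

Dropping unconvertible records keeps downstream arithmetic simple. A production pipeline might instead keep them and flag the field, so that upstream failures still show up in error counts.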
The code added to the Logstash configuration file checks the Nginx access log for malicious access, SQL injection attacks, scanning by attack tools or security tools, and problems with the length and content of request parameters. The specific checks may include:
(1) Checking for SQL injection attacks: check whether fragments of SQL statements, SQL functions, or the names or version numbers of databases and tables built into the database service appear in logs whose response status code is 200 or 5XX; if so, add fields to the current record stating that an SQL injection attack occurred, together with the corresponding back-end system response information;
(2) Checking for malicious access behavior: check whether the request URL and its parameters contain at least one file, in the form of a file name or an HTTP relative or absolute path; if so, further check whether the directory names, file names and file extensions they contain appear in a list of sensitive paths and files, and record the check result, request path and parameters. Also check whether the request URL and its parameters carry content that helps execute code; based on the result, determine whether the access attempts to use the runtime platform to perform reflection, to load arbitrary classes or code, or to load and execute code runnable by a shell;
(3) Checking for scanning by attack tools or security tools: determine whether the business system is being scanned by an abnormal user's tool by checking whether the user agent field of the current log contains the name of one of a set of suspected problem tools;
(4) Checking the length and content of request parameters: check whether the length of a request parameter exceeds the corresponding preset threshold; if it does, the request may be illegal or abnormal. If the content contains hexadecimal characters, the Nginx configuration for SSL, character encoding and decoding, or similar may be faulty, and the specific problem can be identified from which hexadecimal characters are detected;
(5) The Logstash output plugin writes the resulting data of each item to its own topic in the Kafka cluster 201. Based on the tag indicating whether a log carries a security risk, it extracts the user IP, requested URL, parameters, occurrence time and response status code of each suspected security log and writes them to the database that stores this information, so that alarms can be raised promptly.
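A condensed Python sketch of checks (1) to (4) is shown below. It also adds the type field, description field and tag described earlier when a problem is found. Every pattern, tool name, threshold and field name here is a hypothetical example chosen for illustration, not the rule set of the embodiment; in practice these lists would be maintained and tuned as described above. By default, Nginx writes non-printable bytes in its access log as `\xHH` escapes, which is what the hexadecimal check looks for.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import unquote_plus

# Illustrative rule sets only (hypothetical).
SQLI_RE = re.compile(r"union\s+select|information_schema|@@version|sleep\s*\(|"
                     r"benchmark\s*\(|\bor\s+1\s*=\s*1\b", re.IGNORECASE)
SENSITIVE_PATHS = ("/etc/passwd", "/.git/", "/.env", "/web-inf/", "phpmyadmin")
CODE_EXEC_RE = re.compile(r"java\.lang\.runtime|class\.forname|\$\{jndi:|"
                          r"/bin/(?:ba)?sh|eval\s*\(", re.IGNORECASE)
SCANNER_AGENTS = ("sqlmap", "nikto", "nmap", "masscan", "acunetix", "dirbuster")
MAX_ARGS_LEN = 2048
HEX_ESCAPE_RE = re.compile(r"\\x[0-9a-fA-F]{2}")


def check_security(rec: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of rec, adding issue type/description fields and a tag if needed."""
    findings: List[Tuple[str, str]] = []
    raw_args = str(rec.get("args", ""))
    target = unquote_plus(f"{rec.get('uri', '')}?{raw_args}")  # decode %xx and '+'
    lowered = target.lower()
    status = int(rec.get("status", 0))

    # (1) SQL fragments, only for 200 or 5XX responses
    if (status == 200 or 500 <= status <= 599) and SQLI_RE.search(target):
        findings.append(("sql_injection", f"SQL fragment in request, back end answered {status}"))
    # (2) sensitive paths/files and code-execution payloads
    if any(p in lowered for p in SENSITIVE_PATHS):
        findings.append(("sensitive_path", "request touches a sensitive path or file"))
    if CODE_EXEC_RE.search(target):
        findings.append(("code_execution", "request carries code-execution content"))
    # (3) known scanning tools in the user agent
    agent = str(rec.get("http_user_agent", "")).lower()
    if any(tool in agent for tool in SCANNER_AGENTS):
        findings.append(("scanner", "user agent matches a known scanning tool"))
    # (4) parameter length and hexadecimal content
    if len(raw_args) > MAX_ARGS_LEN:
        findings.append(("oversized_args", f"args length {len(raw_args)} > {MAX_ARGS_LEN}"))
    if HEX_ESCAPE_RE.search(raw_args):
        findings.append(("hex_content", "args contain \\xHH escapes; check SSL/charset config"))

    out = dict(rec)
    if findings:
        out["security_issue_type"] = ",".join(t for t, _ in findings)
        out["security_issue_desc"] = "; ".join(d for _, d in findings)
        out["tags"] = list(rec.get("tags", [])) + ["security_risk"]
    return out
```

Substring and regex matching of this kind is cheap enough to run on every record, but it is a coarse heuristic. It will produce false positives and can be evaded, so its output is best treated as a signal for alarms rather than a verdict.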
After the above processing by the acquisition end 10, each Nginx access record contains the call relationship from the operation initiator (Nginx, a back-end system instance, or the region where the user is located) to the access target, together with its running quality information. Once the access logs of all Nginx instances have been transferred to the Kafka cluster 201, the data effectively covers the topology of calls from each user region to the Nginx of every service and onward to the related back-end systems, together with values of indicators such as access volume, performance and success rate for every edge of the topology within a given time period. The server 20 then uses Logstash to extract topology components and related running quality information according to the topology update requirements and stores them for real-time topology display, alarms on problems that need attention, and topology playback.
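On the server side, turning enriched records into directed topology edges with per-edge quality indicators can be sketched as follows. This Python stand-in for the Logstash extraction assumes records already carry hypothetical fields (`geo_province`, `nginx_ip`, `upstream_addr`, `status`, `uri`, timings, byte counts, and an optional `security_risk` tag). Edge direction follows the request flow. The 1-second slow threshold and the 500 error cutoff are placeholder defaults, not values from the source.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple


@dataclass
class EdgeStats:
    total: int = 0
    errors: int = 0
    slow: int = 0
    bytes_in: int = 0
    bytes_out: int = 0
    suspicious: int = 0
    error_urls: Counter = field(default_factory=Counter)  # error distribution over URLs


Edge = Tuple[str, str]  # (source node, target node), in request-flow direction


def build_topology(records: Iterable[dict], slow_threshold: float = 1.0,
                   error_status: int = 500) -> Dict[Edge, EdgeStats]:
    edges: Dict[Edge, EdgeStats] = defaultdict(EdgeStats)
    for r in records:
        user = "region:" + (r.get("geo_province") or "unknown")
        nginx = "nginx:" + r["nginx_ip"]
        backend = "backend:" + r["upstream_addr"]
        # user -> Nginx is timed by request_time; Nginx -> back end by upstream time
        hops = (((user, nginx), "request_time"),
                ((nginx, backend), "upstream_response_time"))
        for edge, time_key in hops:
            s = edges[edge]
            s.total += 1
            s.bytes_in += r["request_length"]
            s.bytes_out += r["bytes_sent"]
            if r["status"] >= error_status:
                s.errors += 1
                s.error_urls[r["uri"]] += 1
            if r.get(time_key, 0.0) > slow_threshold:
                s.slow += 1
            if "security_risk" in r.get("tags", ()):
                s.suspicious += 1
    return dict(edges)
```

Keying edges by (source, target) pairs means a multi-level call chain emerges naturally when one edge's target node is another edge's source node.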
In another implementation, the acquisition end 10 is specifically configured to collect and parse the Nginx access log with a pre-developed collection tool. For example, the collection tool may be developed in a language such as Go and deployed to the server where each Nginx instance is located; it then periodically computes topology composition and running quality data from the access log of each Nginx instance and sends them to the server 20 for processing.
Collecting and parsing the Nginx access log with a pre-developed collection tool has the following advantages: it escapes the constraints of Logstash and can meet more requirements; the tool can expose an HTTP service to support control operations on Nginx at runtime according to real-time instructions from the server 20, such as limiting excessive access according to user configuration, checking log content for more types of security risk and more configuration errors, and modifying the Nginx configuration and reloading it to take effect; and since Logstash is no longer needed, deployment is simpler. The changes in this scheme lie mainly in the Nginx log processing, including:
reading the Nginx configuration file on the current server to obtain the log_format of the Nginx access log and the back-end instance addresses of the reverse proxy (which may themselves be Nginx instance addresses); obtaining the service domain name and the machine room to which it belongs from the tool's configuration file; reading the IP of the Nginx instance; and submitting this information to the server 20, which, when it detects a deployment change in any service, stores historical information such as the change content and time of the corresponding service for later verification;
reading Nginx access log file records in batches, parsing each field according to log_format, and deleting unnecessary log records using the same logic as before;
using the IP library and resolution tools provided by GeoIP2, together with a locally built LRU cache, to obtain the city, province, country, and latitude and longitude of the user source from the address of the access initiator, and adding the result fields to the corresponding log record.
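The first step of the collection-tool variant, reading the Nginx configuration to recover the log_format field order and the reverse-proxy back-end addresses, might look like the sketch below. The source suggests Go for the tool; Python is used here only for illustration. The regular expressions are a simplifying assumption. They handle a single configuration text with single-quoted log_format strings and ignore `include` files, variables in `proxy_pass`, nested blocks inside `upstream`, and similar real-world cases.

```python
import re
from typing import Dict, List, Tuple

# log_format name [escape=...] 'string' ['string' ...];
LOG_FORMAT_RE = re.compile(r"log_format\s+(\S+)\s+(?:escape=\S+\s+)?((?:'[^']*'\s*)+);")
VAR_RE = re.compile(r"\$(\w+)")
UPSTREAM_RE = re.compile(r"\bupstream\s+(\S+)\s*\{([^}]*)\}")
UPSTREAM_SERVER_RE = re.compile(r"^\s*server\s+([^\s;]+)", re.MULTILINE)
PROXY_PASS_RE = re.compile(r"proxy_pass\s+https?://([^/;\s]+)")


def parse_nginx_conf(text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return ({log_format name: [variable names in order]}, [back-end addresses])."""
    formats = {name: VAR_RE.findall(body) for name, body in LOG_FORMAT_RE.findall(text)}

    # upstream blocks: name -> list of "server" addresses inside the block
    upstreams = {name: UPSTREAM_SERVER_RE.findall(body)
                 for name, body in UPSTREAM_RE.findall(text)}
    backends: List[str] = [addr for servers in upstreams.values() for addr in servers]

    # proxy_pass targets that are literal addresses rather than upstream names
    for target in PROXY_PASS_RE.findall(text):
        if target not in upstreams:
            backends.append(target)
    return formats, list(dict.fromkeys(backends))  # de-duplicate, keep order
```

The recovered variable list gives the field order needed to split each access-log line, and the address list seeds the back-end instance set before any traffic is observed.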
The invention also provides topology-based Nginx log analysis equipment, which comprises:
a memory for storing instructions; the instructions are configured to implement the topology-based Nginx log analysis method according to any one of the embodiments above;
and the processor is used for executing the instructions in the memory.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the topology-based Nginx log analysis method as described in any of the embodiments above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, and specific beneficial effects of the above-described system, apparatus and module may refer to corresponding beneficial effects in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
If the integrated modules are implemented as software functional modules and sold or used as a stand-alone product, they may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. The topology-based Nginx log analysis method is characterized by comprising the following steps of:
determining an Nginx log analysis requirement, wherein the Nginx log analysis requirement comprises a unique identification of a target service and information of an analysis time period;
acquiring an Nginx access record set corresponding to the analysis time period from a Kafka cluster according to the unique identifier of the target service; the Kafka cluster stores a plurality of Nginx access records, each of which is obtained by the server where the corresponding Nginx instance is located collecting and parsing an Nginx access log based on a preset Logstash configuration file or a pre-developed collection tool; the Nginx access records comprise call relationship topology data from an access initiator to an access target and the corresponding operation quality data, wherein the access initiator is a user side, the server where an Nginx instance is located, or a reverse proxy back-end system instance, and the access target is the server where an Nginx instance is located or a reverse proxy back-end system instance;
analyzing the call relationship topology data in the Nginx access record set to obtain association relationships among the user ends, the servers where the Nginx instances are located and the reverse proxy back-end system instances, and calculating preset access quality indexes among the association nodes according to the operation quality data in the Nginx access record set to obtain corresponding access quality data;
and constructing a corresponding topological graph according to the association relationships and the access quality data, wherein the flow direction of request data is used as the direction of each edge of the topological graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
2. The topology-based Nginx log analysis method of claim 1, wherein acquiring the Nginx access record set corresponding to the analysis time period from the Kafka cluster according to the unique identifier of the target service comprises:
reading the Nginx access record set by using Logstash.
3. The topology-based Nginx log analysis method of claim 1, wherein analyzing the call relationship topology data in the Nginx access record set comprises:
determining each node address from the access initiator to the access target for all Nginx access records in the Nginx access record set;
matching the access target address of each Nginx access record with the node addresses of the remaining Nginx access records in the Nginx access record set, and determining all nodes and the corresponding access relationships of each complete multi-level call chain according to the matching result.
4. The topology-based Nginx log analysis method of claim 3, further comprising:
setting corresponding chain identification information according to the access target address of each complete multi-level call chain, wherein the chain identification information comprises a service name, a project name and/or an access target unique identifier corresponding to the multi-level call chain;
classifying call relation topology data, corresponding operation quality data, association relation and access quality data among nodes in a corresponding multi-level call chain into Nginx instance data, reverse proxy back-end system instance data and project data;
setting a corresponding Nginx instance table, a back-end system instance table and a project function table in a database according to the chain identification information, storing the Nginx instance data in the corresponding Nginx instance table, storing the reverse proxy back-end system instance data in the corresponding back-end system instance table, storing the project data in the corresponding project function table, and adding corresponding labels to each of the Nginx instance table, the back-end system instance table and the project function table.
5. The topology-based Nginx log analysis method of claim 4, further comprising:
synchronizing each Nginx instance table, back-end system instance table and project function table into an ES index according to the labels.
6. The topology-based Nginx log analysis method of claim 1, wherein the preset access quality indicators include a total access volume, an error count of requests whose status code exceeds a corresponding preset threshold range, the distribution of the error count over URLs, the number of logs whose request time exceeds a corresponding project time threshold, inbound traffic, outbound traffic, and/or the access volume suspected of carrying a security risk.
7. A topology-based Nginx log analysis system, comprising:
the requirement determining module is used for determining an Nginx log analysis requirement, wherein the Nginx log analysis requirement comprises a unique identifier of a target service and information of an analysis time period;
the data acquisition module is used for acquiring an Nginx access record set corresponding to the analysis time period from the Kafka cluster according to the unique identifier of the target service; the Kafka cluster stores a plurality of Nginx access records, each of which is obtained by the server where the corresponding Nginx instance is located collecting and parsing an Nginx access log based on a preset Logstash configuration file or a pre-developed collection tool; the Nginx access records comprise call relationship topology data from an access initiator to an access target and the corresponding operation quality data, wherein the access initiator is a user side, the server where an Nginx instance is located, or a reverse proxy back-end system instance, and the access target is the server where an Nginx instance is located or a reverse proxy back-end system instance;
The data analysis module is used for analyzing the call relationship topology data in the Nginx access record set to obtain the association relationship among the user side, the server where the Nginx instance is located and the reverse proxy back-end system instance, and calculating the preset access quality index among the association nodes according to the operation quality data in the Nginx access record set to obtain corresponding access quality data;
the topology graph construction module is used for constructing a corresponding topological graph according to the association relationships and the access quality data, wherein the flow direction of request data is used as the direction of each edge of the topological graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
8. A topology-based Nginx log analysis system, comprising:
an acquisition end deployed on the server where each Nginx instance is located, wherein the acquisition end collects an Nginx access log based on a preset Logstash configuration file or a pre-developed collection tool, parses the Nginx access log to obtain corresponding Nginx access records, and sends the obtained Nginx access records to a Kafka cluster for storage; the Nginx access records comprise call relationship topology data from an access initiator to an access target and the corresponding operation quality data, wherein the access initiator is a user side, the server where an Nginx instance is located, or a reverse proxy back-end system instance, and the access target is the server where an Nginx instance is located or a reverse proxy back-end system instance;
and a server, comprising the Kafka cluster and an analysis device;
the analysis device is used for determining an Nginx log analysis requirement, wherein the Nginx log analysis requirement comprises a unique identifier of a target service and information of an analysis time period; acquiring an Nginx access record set corresponding to the analysis time period from the Kafka cluster according to the unique identifier of the target service; analyzing the call relationship topology data in the Nginx access record set to obtain association relationships among the user sides, the servers where the Nginx instances are located and the reverse proxy back-end system instances, and calculating preset access quality indicators between the associated nodes according to the operation quality data in the Nginx access record set to obtain corresponding access quality data; and constructing a corresponding topological graph according to the association relationships and the access quality data, wherein the flow direction of request data is used as the direction of each edge of the topological graph, and the attributes of each edge are set according to the access quality data between the associated nodes.
9. The topology-based Nginx log analysis system of claim 8, wherein the acquisition end comprises Logstash configured with the preset Logstash configuration file, and the acquisition end is specifically configured to perform:
parsing the collected Nginx access log to obtain the target fields of an Nginx access record, deleting records of local requests, access target address records and records with abnormal performance data, and specifying the target fields whose type is to be converted;
adding the service name of the current log, the identifier of the server where the deployed Nginx instance is located and the IP address of the Nginx instance as special fields to the corresponding Nginx access record;
respectively adding the IP address and the access target address of the current Nginx instance into an Nginx service instance set and a back-end system instance set of an external Redis cache;
converting the user source IP to obtain the Chinese names, the longitudes and the latitudes of the cities, the provinces and the countries to which the user source IP belongs, and adding the obtained result as a new field into the corresponding Nginx access record.
10. The topology-based Nginx log analysis system of claim 9, wherein the acquisition end is further configured to:
add code to the Logstash configuration file to check the Nginx access log for malicious access, SQL injection attacks, scanning by attack tools or security tools, and problems with the length and content of request parameters, and add corresponding fields to the corresponding Nginx access records according to the check results.
11. A topology-based Nginx log analysis device, comprising:
a memory for storing instructions, wherein the instructions are for implementing the topology-based Nginx log analysis method according to any one of claims 1-6;
and the processor is used for executing the instructions in the memory.
12. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the topology based Nginx log analysis method according to any of the claims 1-6.
CN202210963046.8A 2022-08-11 2022-08-11 Topology-based Nginx log analysis method, system and equipment Active CN115333966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210963046.8A CN115333966B (en) 2022-08-11 2022-08-11 Topology-based Nginx log analysis method, system and equipment

Publications (2)

Publication Number Publication Date
CN115333966A CN115333966A (en) 2022-11-11
CN115333966B true CN115333966B (en) 2023-05-12

Family

ID=83923892

Country Status (1)

Country Link
CN (1) CN115333966B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994172B (en) * 2022-12-09 2024-05-14 华青融天(北京)软件股份有限公司 Method, device, equipment and medium for determining service access relation
CN116915463B (en) * 2023-07-17 2024-03-08 北京优特捷信息技术有限公司 Call chain data security analysis method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017107018A1 (en) * 2015-12-21 2017-06-29 华为技术有限公司 Method, device, and system for discovering the relationship of applied topology
CN111796935A (en) * 2020-06-29 2020-10-20 中国工商银行股份有限公司 Consumption instance distribution method and system for calling log information
CN112783720A (en) * 2021-01-05 2021-05-11 广州品唯软件有限公司 Topological structure diagram generation method and device, computer equipment and display system
CN114297231A (en) * 2021-12-29 2022-04-08 上海梦鱼信息科技有限公司 Method for intelligently collecting logs and data and quickly forming relational topology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
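Research on website topology acquisition technology in Web log mining; Niu Xiaochen (牛晓晨); Computer Knowledge and Technology (01); full text *
Research on network topology discovery technology based on mobile agents; Deng Yong (邓勇), Wang Ruchuan (王汝传), Huang Haiping (黄海平), Xu Xichun (徐喜春); Computer Science (10); full text *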

Also Published As

Publication number Publication date
CN115333966A (en) 2022-11-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant