CN114491558A

CN114491558A - Intrusion studying and judging method and system based on Web log

Info

Publication number: CN114491558A
Application number: CN202210099336.2A
Authority: CN
Inventors: 陶然
Original assignee: Guangdong Yunzhi Anxin Technology Co ltd
Current assignee: Guangdong Yunzhi Anxin Technology Co ltd
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2022-05-13

Abstract

The invention discloses an intrusion studying and judging method and system based on web logs. The method comprises the following steps: cutting the original log into fields according to default separators and default quote characters; automatically identifying the field formats, quickly matching them, parsing the fields and storing them in a log library; reading the stored logs in batches and performing matching-rule detection on them; updating the logs and writing them back to the library; aggregating the logs to generate web attack events and judging whether the intruder's attack succeeded; and generating and submitting a protection and handling report according to the web attack event, the judgment result and the manually confirmed scope of the intrusion's impact. By cutting, uniformly parsing, rule-matching and aggregating web logs, the invention studies and judges intrusions by attacker IP. It combines automatic detection with manual confirmation, offers high efficiency, accurate results and fine-grained log parsing, makes secondary analysis convenient, and greatly improves the efficiency of web log analysis.

Description

Intrusion studying and judging method and system based on Web log

Technical Field

The invention relates to the technical field of internet, in particular to an intrusion studying and judging method and system based on Web logs.

Background

As one of the most important applications on the internet, the Web provides a convenient mechanism for publishing and acquiring documents, and has gradually become a gathering place for all kinds of information resources. The richness and diversity of this information attracts hackers, resulting in increasingly frequent attacks on Web applications, especially on Web servers. These attacks are not only varied but also highly damaging: they sometimes leak the information of a company's users and can even paralyze servers.

When a user accesses a Web service through a client, the Web server generates a Web access log entry for each request. The Web access log records raw information such as the requests received by the Web server and runtime errors. User behavior, such as whether an attack occurred, the most frequently accessed pages, the geographic distribution of visitors and the distribution of access devices, can be analyzed and traced through the Web log. Security analysis of the Web log helps locate attackers, reconstruct attack paths, and find and repair security vulnerabilities in the website.

Web services are built on different Web servers; common ones include Nginx, Apache, Tomcat and IIS. The logs of different servers carry similar information, namely records of user access behavior such as access time, client IP, request method, request URL, protocol version, user agent, cookies, referrer address, server address, response code, request byte count and response byte count. However, the log formats of different servers are not uniform. In the prior art, web log analysis mainly relies on parsing each log format manually and then analyzing the logs one by one, so the workload of analysts is huge, analysis efficiency is low, and intrusion protection and handling are delayed.

In view of the above, there is an urgent need for a method and system for studying and judging intrusion based on Web logs to solve the problem of low analysis efficiency caused by non-uniform Web log type formats.

Disclosure of Invention

The invention aims to solve the technical problem that the web log analysis efficiency is low due to the fact that the web log types provided by the existing different types of web servers are not uniform in format.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

an intrusion studying and judging method based on Web logs comprises the following steps:

cutting the original log into fields according to the default separators and the default quote characters;

automatically identifying the field formats, quickly matching them, and parsing, uniformly processing and storing the fields in a web log library;

reading the stored logs in batches and performing matching-rule detection on them; the matching-rule detection comprises: judging whether the visitor IP of a web log is malicious according to threat intelligence rules, and adding a malicious label to any web log whose visitor IP is malicious; judging the reputation of the visitor IP of the web log according to the IP reputation base; judging the access device of the visitor IP of the web log according to the keyword rules; judging the attack type of the web log according to the security detection rules; and detecting whether a login behavior exists in the web log according to the login detection rules;

updating the log after completing the matching rule detection, and writing the log back to the web log library;

aggregating the web logs with malicious labels according to the attacker IP and account information of the web logs to generate a web attack event, and judging whether the attacker successfully attacks or not according to the attack, login and download behaviors of the IP;

and generating and submitting a protection and handling report according to the generated web attack event, the judgment result and the manually confirmed scope of the intrusion's impact.

In the foregoing solution, preferably, the fast matching of the field format further includes:

for the first log processed in each batch, providing recommended fields according to the identified field formats, and sending the recommended fields to an administrator for manual confirmation.

In the above solution, preferably, the IP information in the IP reputation base includes an IP source and an IP threat type.

In the above scheme, preferably, the IP source is a domestic home user, a domestic IDC, an overseas IDC or an overseas home user; the IP threat type indicates that the IP has a history of launching attacks, and the attack type includes spam or web attacks.

In the above scheme, preferably, the type of the user's access device is crawler access, mobile terminal device access or PC device access.

In the above scheme, preferably, the attack type of the log is SQL injection attack, XSS, PHP injection, or JAVA vulnerability injection.

In the foregoing solution, preferably, before cutting the original log into fields according to the default separators and the default quote characters, the method further includes:

setting the default separators and the default quote characters; the default separators are commas and spaces, and the default quote characters are double quotation marks, brackets and single quotation marks.

A Web log based intrusion studying and judging system comprising:

the parsing module is used for cutting the original log into fields according to the default separators and the default quote characters, automatically identifying the field formats, quickly matching them, and parsing, uniformly processing and storing the fields;

the log matching rule engine is used for carrying out rule matching judgment on the logs which are analyzed into the database according to the rules;

the attack aggregation engine is used for aggregating the web logs that have undergone rule matching judgment according to the attacker IP of the web logs, generating web attack events and judging whether the attack succeeded;

and the storage module is used for storing the log and the IP reputation base.

In the foregoing solution, preferably, the log matching rule engine includes:

a threat intelligence rule judging module, used for judging whether the visitor IP of a stored web log is malicious according to the threat intelligence rules;

the IP reputation judging module is used for judging the reputation of the visitor IP of the web log according to the IP reputation base;

the keyword rule judging module is used for judging the access equipment of the user through the user agent field of the log according to the keyword rule;

the security detection rule module is used for judging the attack type of the web log according to the request details of the web log;

and the login detection rule module is used for detecting whether the web log has login behaviors.

Compared with the prior art, the intrusion studying and judging method based on Web logs provided by the invention cuts the logs using separators and quote characters and automatically identifies the formats of the resulting fields, so that logs are parsed uniformly. It performs threat intelligence rule judgment, IP reputation judgment, keyword rule judgment, security detection rule judgment and login rule judgment on the parsed and stored logs to realize matching-rule detection. It then aggregates the logs that have undergone matching-rule detection according to the attacker's IP and account information to generate web attack events, thereby studying and judging intrusions by attacker IP. The method combines automatic detection with manual confirmation, offers high efficiency, accurate results and fine-grained log parsing, makes secondary analysis convenient, and greatly improves the efficiency of Web log analysis.

Drawings

FIG. 1 is a schematic structural diagram of an intrusion studying and judging system based on Web logs according to the present invention;

FIG. 2 is a schematic diagram illustrating a method for determining intrusion based on Web logs according to the present invention;

FIG. 3 is a flowchart of an intrusion studying and judging method based on Web logs according to the present invention.

Detailed Description

The invention provides an intrusion studying and judging method based on Web logs, which realizes the intrusion studying and judging of an attacker IP by cutting, uniformly analyzing, detecting rule matching and aggregating the Web logs and effectively solves the problems of low efficiency and large workload of manual studying and judging of the Web logs. The invention is described in detail below with reference to the drawings and the detailed description.

As shown in fig. 1, the intrusion studying and judging system based on Web logs provided by the invention comprises a parsing module 1, a log matching rule engine 2, an attack aggregation engine 3 and a storage module 4. The parsing module 1 is configured to cut the original log into fields according to the default separators and the default quote characters, automatically identify the field formats, quickly match them, and parse, uniformly process and store the fields. The log matching rule engine 2 is used for performing rule matching judgment on the parsed and stored logs according to the rules. The attack aggregation engine 3 is used for aggregating the web logs that have undergone rule matching judgment according to the attacker IP of the web logs, generating web attack events and judging whether the attack succeeded. The storage module 4 is used for storing the logs and the IP reputation base.

The log matching rule engine 2 comprises a threat intelligence rule judging module 21, an IP reputation judging module 22, a keyword rule judging module 23, a security detection rule module 24 and a login detection rule module 25. The threat intelligence rule judging module 21 is used for judging whether the visitor IP of a stored web log is malicious according to the threat intelligence rules. The IP reputation judging module 22 is used for judging the reputation of the visitor IP of the web log according to the IP reputation base. The keyword rule judging module 23 is used for determining the user's access device from the user agent field of the log according to the keyword rules. The security detection rule module 24 is used for determining the attack type of the web log according to the request details of the web log. The login detection rule module 25 is used for detecting whether the web log contains a login behavior.

As shown in fig. 2 and fig. 3, the present invention provides an intrusion studying and judging method based on a Web log, which includes the following steps:

S1, cutting the original log into fields according to the default separators and the default quote characters;

First, an administrator sets the default separators and the default quote characters. The default separators are commas and spaces, and the default quote characters are double quotation marks, brackets and single quotation marks.

Then, the parsing module 1 parses the original log in a CSV-like manner according to the default separators and the default quote characters, and cuts the original log into fields.
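The CSV-like cutting in S1 can be illustrated with a minimal sketch. It is not the patented implementation: the function name cut_log, the choice to treat both square brackets and parentheses as paired quote characters, and the sample Nginx-style line are assumptions made for illustration.

```python
# Assumed defaults: separators are comma and space; quote characters are
# double quotes, single quotes, and (as an assumption) both bracket styles.
SEPARATORS = {",", " "}
QUOTE_PAIRS = {'"': '"', "'": "'", "[": "]", "(": ")"}


def cut_log(line, separators=SEPARATORS, quote_pairs=QUOTE_PAIRS):
    """Cut one raw log line into fields, keeping quoted text together."""
    fields = []
    buf = []            # characters of the field being built
    closing = None      # closing quote character we are waiting for, if any
    quoted = False      # True if the current field was opened by a quote
    for ch in line.strip():
        if closing is not None:
            if ch == closing:
                closing = None          # end of the quoted section
            else:
                buf.append(ch)          # separators inside quotes are kept
        elif ch in quote_pairs and not buf:
            closing = quote_pairs[ch]   # a quote only opens at a field start
            quoted = True
        elif ch in separators:
            if buf or quoted:           # empty unquoted fields are skipped
                fields.append("".join(buf))
            buf, quoted = [], False
        else:
            buf.append(ch)
    if buf or quoted:
        fields.append("".join(buf))
    return fields


sample = ('192.0.2.10 - - [27/Jan/2022:10:00:00 +0800] '
          '"GET /index.php?id=1 HTTP/1.1" 200 512 "-" '
          '"Mozilla/5.0 (X11; Linux x86_64)"')
fields = cut_log(sample)
```

Because a quote character is only honored at the start of a field, an apostrophe inside a URL parameter does not swallow the rest of the line; a production parser would likely need further rules for escaped quotes.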

S2, automatically identifying the field formats, quickly matching them, and parsing, uniformly processing and storing the fields in a web log library;

The parsing module 1 automatically recognizes field formats such as IP, time, request content and response code.

For the first log processed in each batch, recommended fields are given according to the identified field formats and sent to an administrator for manual confirmation. The recommended fields are shown in Table 1.

TABLE 1

Field type	Field value	Reference attribute
IP	Valid IP	Client IP, server IP
Positive integer	[100, 600)	Response code
Positive integer	Less than 100 or greater than 600	Number of response/request bytes
Time	Time	Request time
Decimal	Decimal	Request duration
Character string	URL	Request URL, response address
Character string	Mozilla/5.0*	User agent
Character string	GET, PUT, etc.	Request content

The parsing module 1 sends the parsed and uniformly processed web logs to the web log library in the storage module 4 for storage.
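The field recommendation of Table 1 can be sketched as a chain of format checks. The function recommend_field, the accepted time formats and the list of HTTP methods are hypothetical; the patent only specifies the mapping from field type and value to a recommended attribute.

```python
import ipaddress
import re
from datetime import datetime

# Assumed time layouts (Nginx/Apache style and an ISO-like style).
TIME_FORMATS = ("%d/%b/%Y:%H:%M:%S %z", "%Y-%m-%d %H:%M:%S")
HTTP_METHODS = {"GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "PATCH"}


def _is_ip(value):
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def _is_time(value):
    for fmt in TIME_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def recommend_field(value):
    """Return the attribute recommended for one cut field, or None."""
    if _is_ip(value):
        return "client IP / server IP"
    if re.fullmatch(r"[0-9]+", value):
        # Integers in [100, 600) look like response codes, others like sizes.
        if 100 <= int(value) < 600:
            return "response code"
        return "number of response/request bytes"
    if re.fullmatch(r"[0-9]+\.[0-9]+", value):
        return "request duration"
    if _is_time(value):
        return "request time"
    if value.startswith("Mozilla/5.0"):
        return "user agent"
    if value.split(" ", 1)[0] in HTTP_METHODS:
        return "request content"
    if value.startswith(("/", "http://", "https://")):
        return "request URL / response address"
    return None
```

A byte count such as 512 also falls inside [100, 600) and would be recommended as a response code, which is one reason the recommendation for the first log of a batch is sent to an administrator for confirmation rather than applied blindly.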

S3, reading the stored logs in batches and performing matching-rule detection on them;

After the Web logs are stored, the log matching rule engine 2 reads them in batches and performs matching-rule detection on them.

Specifically, the matching-rule detection comprises threat intelligence rule judgment, IP reputation judgment, keyword rule judgment, security rule detection and login rule detection.

The threat intelligence rule judging module 21 judges whether the visitor IP of the web log is malicious or not according to the threat intelligence rule; and after judging that the IP of the visitor of the web log is malicious, adding a malicious tag in the web log.

The IP reputation judging module 22 judges the reputation of the visitor IP of the web log according to the IP reputation base.

The IP reputation base is stored in the storage module 4. The IP information in the IP reputation base includes an IP source and an IP threat type. The IP source is a domestic home user, a domestic IDC, an overseas IDC or an overseas home user; the IP threat type indicates that the IP has a history of launching attacks, such as spam or web attacks.
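The threat intelligence and IP reputation judgments can be sketched as two lookups on the visitor IP. The in-memory structures, field names (client_ip, malicious, ip_reputation) and the documentation-range IP addresses below are hypothetical stand-ins for the threat intelligence rules and the IP reputation base held in the storage module.

```python
# Hypothetical threat-intelligence list and IP reputation base.
THREAT_INTEL_IPS = {"203.0.113.7"}
IP_REPUTATION = {
    "203.0.113.7": {"source": "overseas IDC", "threat_type": "web attack"},
    "198.51.100.20": {"source": "domestic home user", "threat_type": "spam"},
}


def apply_ip_rules(log, threat_intel=THREAT_INTEL_IPS, reputation=IP_REPUTATION):
    """Return a copy of the log with a malicious label and reputation info."""
    tagged = dict(log)
    ip = tagged["client_ip"]
    # Threat intelligence rule: label the log if the visitor IP is listed.
    tagged["malicious"] = ip in threat_intel
    # IP reputation rule: attach source and threat type when known.
    tagged["ip_reputation"] = reputation.get(ip)
    return tagged


flagged = apply_ip_rules({"client_ip": "203.0.113.7"})
unknown = apply_ip_rules({"client_ip": "192.0.2.1"})
```

Returning a copy keeps the original record intact until the write-back step, which is a design choice of this sketch rather than something the patent prescribes.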

The keyword rule judging module 23 determines the access device of the visitor IP of the web log from the user agent (userAgent) field of the log according to the keyword rules. Different devices have different user agents: for example, Chrome and Firefox send different user agents, Android phones and iPhones differ from each other, and some malicious software (such as crawlers) also has its own distinctive user agent. Whether a request is malicious can therefore be preliminarily judged from the access device, which assists manual studying and judging. The types of user access device comprise crawler access, mobile terminal device access and PC device access.
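A keyword rule of this kind can be sketched as substring matching on the user agent. The keyword lists are illustrative assumptions, not the rules used by the invention.

```python
# Hypothetical keyword lists; real rule sets would be far larger.
CRAWLER_KEYWORDS = ("bot", "spider", "crawler", "curl", "python-requests")
MOBILE_KEYWORDS = ("android", "iphone", "ipad", "mobile")


def classify_device(user_agent):
    """Classify the access device from the user agent by keyword."""
    ua = user_agent.lower()
    # Crawlers are checked first because many of them imitate mobile
    # browsers and would otherwise match the mobile keywords.
    if any(keyword in ua for keyword in CRAWLER_KEYWORDS):
        return "crawler access"
    if any(keyword in ua for keyword in MOBILE_KEYWORDS):
        return "mobile terminal device access"
    return "PC device access"
```

For example, a user agent containing "Googlebot" is classified as crawler access, while a desktop Chrome user agent falls through to PC device access.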

The security detection rule module 24 inspects the request details of the web log according to the security detection rules and determines the attack type of the web log. The request details include the request URL, the request parameters and the like, from which malicious attack behavior can be identified. The attack type of the log is SQL injection, XSS, PHP injection or Java vulnerability injection.
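One common way to realize such security detection rules is pattern matching on the decoded request URL. The sketch below assumes that approach; the regular expressions are simplified, hypothetical signatures and would miss many real attacks.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical, deliberately small signature set per attack type.
SECURITY_RULES = {
    "SQL injection": re.compile(
        r"\bunion\b.+\bselect\b|\bor\b\s+1\s*=\s*1|sleep\s*\(", re.I),
    "XSS": re.compile(r"<script|onerror\s*=|javascript:", re.I),
    "PHP injection": re.compile(r"<\?php|\beval\s*\(|php://input", re.I),
    "Java vulnerability injection": re.compile(
        r"\$\{jndi:|java\.lang\.Runtime", re.I),
}


def detect_attack_type(request_url):
    """Return the first matching attack type for a request URL, or None."""
    # Decode percent-encoding and '+' so encoded payloads are visible.
    decoded = unquote_plus(request_url)
    for attack_type, pattern in SECURITY_RULES.items():
        if pattern.search(decoded):
            return attack_type
    return None
```

The function reports only the first matching type; a rule engine that records every match would return a list instead.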

The login detection rule module 25 detects whether the web log contains a login behavior according to the login detection rules. If a log records a login behavior, and the visitor IP of that log has previously carried out attacks and obtained partial privileges, then the host may have been compromised and the attacker may have logged into the system. Login behavior can also be used later to identify brute-force attacks.

The individual matching-rule detections can be executed in parallel or in a set order.
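The patent does not state how a login is recognized or how its outcome is determined. The sketch below assumes one plausible login detection rule: a POST request to a path containing a login keyword, with the response code deciding success or failure. All names and thresholds are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical path keywords that mark a login endpoint.
LOGIN_PATH_KEYWORDS = ("login", "signin", "logon")


def detect_login(method, url, status):
    """Return 'success', 'failure', or None if the log is not a login."""
    path = urlsplit(url).path.lower()
    is_login = method.upper() == "POST" and any(
        keyword in path for keyword in LOGIN_PATH_KEYWORDS)
    if not is_login:
        return None
    # Assumption: 401/403 mean a rejected login, 2xx/3xx an accepted one.
    if status in (401, 403):
        return "failure"
    if 200 <= status < 400:
        return "success"
    return None
```

Many applications answer a failed login with a 200 page, so a real rule would probably also need application-specific signals such as the redirect target or response size.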

S4, updating the log after completing the matching rule detection, and writing back the log to the web log library;
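Steps S3 and S4 together form a read, detect and write-back loop over the web log library. The sketch below assumes the library is a SQLite table; the table layout (web_log with client_ip, tags and checked columns) and the function names are hypothetical.

```python
import sqlite3


def make_demo_library():
    """Create an in-memory stand-in for the web log library."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE web_log ("
        "id INTEGER PRIMARY KEY, client_ip TEXT, "
        "tags TEXT DEFAULT '', checked INTEGER DEFAULT 0)")
    conn.executemany(
        "INSERT INTO web_log (client_ip) VALUES (?)",
        [("203.0.113.7",), ("192.0.2.1",), ("203.0.113.7",)])
    conn.commit()
    return conn


def process_in_batches(conn, detect, batch_size=2):
    """Read unchecked logs in batches, run detection, write results back."""
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT id, client_ip FROM web_log "
            "WHERE checked = 0 ORDER BY id LIMIT ?", (batch_size,)).fetchall()
        if not rows:
            return processed
        # S3: run the matching-rule detection on every log of the batch.
        updates = [(detect(ip), log_id) for log_id, ip in rows]
        # S4: update the logs and write them back to the library.
        conn.executemany(
            "UPDATE web_log SET tags = ?, checked = 1 WHERE id = ?", updates)
        conn.commit()
        processed += len(rows)
```

Marking each row as checked lets the loop resume after an interruption without re-reading logs that were already written back.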

S5, aggregating the web logs with malicious labels according to the attacker IP and account information of the web logs to generate a web attack event, and judging whether the intruder's attack succeeded according to the attack, login and download behaviors of the IP;

The aggregation engine 3 aggregates the logs according to the attacker IP and account information of the web logs to generate a web attack event, and judges whether the intruder's attack succeeded according to the subsequent behaviors of the IP, such as attack, login and download behaviors.

Aggregation by IP, time and behavior means analyzing the logs of the same IP within the same time period. If the same visitor IP shows attack behavior, login behavior and download behavior, it can be judged that the attack by that IP succeeded, the host was compromised and files were stolen.

The judgment also depends on the attack type. If the same visitor IP shows a large number of login behaviors that first fail and then succeed, the attack type is brute-force cracking: the attacker has guessed the password and the host is compromised.

If the same visitor IP shows a large number of access behaviors that first succeed and then fail, the attack type is a DDoS attack: the attack has succeeded and normal service is affected.
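The aggregation and success judgment can be sketched as grouping labeled logs by IP and applying the behavior rules described above. The record fields (ip, time, malicious, attack_type, login, download, account), the brute-force threshold and the verdict strings are assumptions; the DDoS rule is omitted to keep the sketch short.

```python
from collections import defaultdict

BRUTE_FORCE_MIN_FAILURES = 5  # hypothetical threshold for "a large number"


def judge(records):
    """Judge one IP's time-ordered records within the same time period."""
    logins = [r["login"] for r in records if r.get("login")]
    # Many failed logins followed by a success suggests brute force.
    if "success" in logins and logins.index("success") >= BRUTE_FORCE_MIN_FAILURES:
        return "brute force succeeded: host compromised"
    attacked = any(r.get("attack_type") for r in records)
    downloaded = any(r.get("download") for r in records)
    # Attack, login and download together indicate a successful intrusion.
    if attacked and "success" in logins and downloaded:
        return "attack succeeded: host compromised and files stolen"
    return "no successful attack confirmed"


def aggregate(logs):
    """Group logs carrying the malicious label by IP into attack events."""
    by_ip = defaultdict(list)
    for log in logs:
        if log.get("malicious"):
            by_ip[log["ip"]].append(log)
    events = {}
    for ip, records in by_ip.items():
        records.sort(key=lambda r: r["time"])
        events[ip] = {
            "accounts": sorted({r["account"] for r in records if r.get("account")}),
            "log_count": len(records),
            "verdict": judge(records),
        }
    return events
```

The sketch keys events by IP only and records the accounts seen as an attribute, since attack requests usually carry no account; keying by the pair of IP and account would split one intrusion across several events.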

S6, generating and submitting a protection and handling report according to the generated web attack event, the judgment result and the manually confirmed scope of the intrusion's impact.

The aggregation engine 3 automatically generates the judgment result of whether the visitor IP's intrusion succeeded. Based on the web attack event and the judgment result generated by the aggregation engine 3, the administrator manually confirms the scope of the intrusion, carries out protection and handling, and submits a protection and handling report.

Compared with the prior art, the intrusion studying and judging method and system based on the Web logs have the following advantages:

1. The logs are cut using separators and quote characters, and the formats of the resulting fields are automatically identified, so that logs of different formats are parsed and processed uniformly and log processing efficiency is greatly improved; the processed logs are parsed at a fine granularity, which facilitates secondary analysis and use of the web logs by an administrator;

2. By combining automatic detection with manual confirmation, intrusions are studied and judged automatically through matching-rule detection and log aggregation, while the scope of the intrusion's impact is confirmed manually, which further confirms the judgment result and fixes the evidence, so that the accuracy of intrusion studying and judging is greatly improved;

3. Because the judgment is computed over multiple log records, combining the IP reputation base with the subsequent attack, login and download behaviors of the same visitor IP, the method achieves higher accuracy.

The present invention is not limited to the above preferred embodiments. Any structural change made under the teaching of the present invention that yields a technical solution identical or similar to that of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An intrusion studying and judging method based on Web logs is characterized by comprising the following steps:

aggregating web logs with malicious labels according to attacker IP and account information of the web logs to generate a web attack event, and judging whether the attack of the intruder is successful or not according to the attack, login and download behaviors of the same IP in the same time period;

2. The method for judging intrusion based on Web log according to claim 1, wherein the fast matching of the field format further comprises:

and aiming at the first log processed in the same batch, providing a recommended field according to the identified field format, and sending the recommended field to an administrator for manual confirmation.

3. The method of claim 1, wherein the IP information in the IP reputation base comprises an IP source and an IP threat type.

4. The method for studying and judging intrusion based on Web logs according to claim 3, wherein the IP source is a domestic home user, a domestic IDC, an overseas IDC or an overseas home user; the IP threat type indicates that the IP has a history of launching attacks, and the attack type comprises spam or web attacks.

5. The method for studying and judging intrusion based on Web logs according to claim 1, wherein the type of the user's access device is crawler access, mobile terminal device access or PC device access.

6. The Web log-based intrusion study method according to claim 1, wherein the attack type of the log is SQL injection attack, XSS, PHP injection, or JAVA vulnerability injection.

7. The method for studying and judging intrusion based on Web logs according to claim 1, wherein, before cutting the original log into fields according to the default separators and the default quote characters, the method further comprises:

8. An intrusion study and judgment system based on Web logs is characterized by comprising:

and the storage module is used for storing the log and the IP reputation base.

9. The Web log based intrusion study system of claim 8, wherein the log matching rules engine comprises:

a threat intelligence rule judging module, used for judging the stored logs according to the threat intelligence rules;

and the login detection rule module is used for detecting whether the web log has a login behavior.