CN108616498A - Web access anomaly detection method and device - Google Patents

Publication number
CN108616498A
Authority
CN
China
Prior art keywords
url
access log
multiple access
abnormality detection
exception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810158886.0A
Other languages
Chinese (zh)
Inventor
党向磊
张鸿
徐太忠
惠榛
王金松
陈阳
汪立东
赵路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN201810158886.0A
Publication of CN108616498A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a web access anomaly detection method and device. The method includes: training an anomaly detection model according to multiple access logs, where the multiple access logs include normal access logs and abnormal access logs; receiving a hypertext transfer protocol (http) request sent by a user equipment; identifying, through the anomaly detection model, whether the http request is an abnormal request; and, if the http request is an abnormal request, intercepting the http request. The embodiments of the present invention can be applied to the fields of web security and machine learning. By performing machine learning on a large number of normal samples and abnormal samples, they can be used for access anomaly detection and interception in the web security field, and can solve the technical problems that the traditional waf firewall method of intercepting intrusion access has a high maintenance cost, poor flexibility, and no protection against unknown anomalies.

Description

Web access anomaly detection method and device
Technical field
The present invention relates to the technical field of network security, and more particularly to a web (World Wide Web) access anomaly detection method and device.
Background art
With the continuous development of Internet technology, the traditional network boundary is fading away. Enterprises in industry, especially large Internet companies, reach tens of millions of daily active users on average, so the logs of each application system amount to hundreds of gigabytes per day, or even reach the terabyte scale. At present, the proportion of malicious access, represented by the gray and black markets, remains high. For large Internet companies, especially in industries such as finance and telecommunications, malicious attacks occur in every period of every day, and the attack methods keep evolving, causing network security problems.
For malicious attacks, the traditional solution is to perform intrusion detection through a waf (Web Application Firewall) and, based on preset rules, to intercept or pass http (HyperText Transfer Protocol) requests, so as to intercept intrusion access. Fig. 1 is a schematic diagram of intrusion detection by a waf firewall. The waf firewall receives an http request, first parses the header and body information of the http request, then checks the user-defined configuration, determines according to the rules in the user-defined configuration whether the http request needs to be blocked, intercepts the http requests that need to be blocked, and passes those that do not. Further, the rules in the user-defined configuration are mainly used to match parts of the http request such as the uri, parameters, ua, cookie, and referer; if a match succeeds, the http request is intercepted, otherwise it is passed.
However, traditional web intrusion detection technology is not flexible enough and has no ability to detect unknown anomalies. Specifically, the waf firewall intercepts intrusion access based on preset rules. On the one hand, hard rules are very easily bypassed by flexible attackers, and a rule set built on past attacks has difficulty coping with 0day attacks. On the other hand, attack and defense escalate together, and building and maintaining the defender's rules has a high threshold and a high cost. More importantly, traditional defense technology is limited to defending against known threats: because it does not know what an unknown threat is, it cannot detect unknown threats, let alone block them effectively.
Summary of the invention
The technical problem to be solved by the present invention is to provide a web access anomaly detection method and device, so as to solve the problem that existing web intrusion detection technology is not flexible enough and has no ability to detect unknown anomalies.
To solve the above technical problem, the present invention adopts the following technical solutions:
The present invention provides a web access anomaly detection method, including: training an anomaly detection model according to multiple access logs, where the multiple access logs include normal access logs and abnormal access logs; receiving a hypertext transfer protocol (http) request sent by a user equipment; identifying, through the anomaly detection model, whether the http request is an abnormal request; and, if the http request is an abnormal request, intercepting the http request.
Training the anomaly detection model according to multiple access logs includes: obtaining multiple access logs and performing data cleaning on the multiple access logs; after the data cleaning, extracting the feature data of each uniform resource locator (URL) in the multiple access logs; generating a corresponding data model object for each URL according to the data cleaning result of each access log and the feature data of each URL; and processing each data model object through the decision tree of spark, and training the anomaly detection model using the processed data model objects.
Performing data cleaning on the multiple access logs includes: filtering out the static files in each access log; deduplicating the URLs that are repeated in the multiple access logs; making the letter case of the URLs in the multiple access logs consistent; decoding the encoded URLs in the multiple access logs; adding a label to each access log, the label types including normal sample and abnormal sample; and, according to normal URLs and abnormal URLs prepared in advance, balancing the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
Extracting the feature data of each URL includes: extracting the parameter features of each URL according to preset parameter types; extracting the danger level features of each URL according to preset abnormal keywords; and extracting the length features, quantity features, and type features of each URL according to preset feature characters.
Processing each data model object and training the anomaly detection model using the processed data model objects includes: numbering the label in the data model object, where the label in the data model object is the label of the access log to which the URL corresponding to the data model object belongs; converting the feature data in the data model object into a single-column feature vector; standardizing the single-column feature vector to obtain a standardized feature vector; and training the anomaly detection model using the number of the label and the standardized feature vector.
The present invention provides a web access anomaly detection device, including: a training module, configured to train an anomaly detection model according to multiple access logs, where the multiple access logs include normal access logs and abnormal access logs; a receiving module, configured to receive a hypertext transfer protocol (http) request sent by a user equipment; an identification module, configured to identify, through the anomaly detection model, whether the http request is an abnormal request; and an interception module, configured to intercept the http request when the identification module determines that the http request is an abnormal request.
The training module includes: a processing unit, configured to obtain multiple access logs and perform data cleaning on the multiple access logs; an extraction unit, configured to extract, after the data cleaning, the feature data of each uniform resource locator (URL) in the multiple access logs; a generation unit, configured to generate a corresponding data model object for each URL according to the data cleaning result of each access log and the feature data of each URL; and a training unit, configured to process each data model object through the decision tree of spark and to train the anomaly detection model using the processed data model objects.
The processing unit is further configured to: filter out the static files in each access log; deduplicate the URLs that are repeated in the multiple access logs; make the letter case of the URLs in the multiple access logs consistent; decode the encoded URLs in the multiple access logs; add a label to each access log, the label types including normal sample and abnormal sample; and, according to normal URLs and abnormal URLs prepared in advance, balance the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
The extraction unit is further configured to: extract the parameter features of each URL according to preset parameter types; extract the danger level features of each URL according to preset abnormal keywords; and extract the length features, quantity features, and type features of each URL according to preset feature characters.
The training unit is further configured to: number the label in the data model object, where the label in the data model object is the label of the access log to which the URL corresponding to the data model object belongs; convert the feature data in the data model object into a single-column feature vector; standardize the single-column feature vector to obtain a standardized feature vector; and train the anomaly detection model using the number of the label and the standardized feature vector.
The beneficial effects of the present invention are as follows:
The embodiments of the present invention can be applied to the fields of web security and machine learning. By performing machine learning on a large number of normal samples and abnormal samples, they can be used for access anomaly detection and interception in the web security field, and can solve the technical problems that the traditional waf firewall method of intercepting intrusion access has a high maintenance cost, poor flexibility, and no protection against unknown anomalies.
Description of the drawings
Fig. 1 is a schematic diagram of intrusion detection by an existing waf firewall;
Fig. 2 is a flow chart of the web access anomaly detection method according to the first embodiment of the present invention;
Fig. 3 is a flow chart of training the anomaly detection model according to the second embodiment of the present invention;
Fig. 4 is a flow chart of the steps of data cleaning according to the third embodiment of the present invention;
Fig. 5 is a flow chart of the steps of extracting feature data according to the fourth embodiment of the present invention;
Fig. 6 is a flow chart of the steps of training the anomaly detection model according to the fifth embodiment of the present invention;
Fig. 7 is a schematic diagram of training the anomaly detection model according to the fifth embodiment of the present invention;
Fig. 8 is a structural diagram of the web access anomaly detection device according to the seventh embodiment of the present invention;
Fig. 9 is a structural diagram of the training module according to the seventh embodiment of the present invention.
Detailed description of the embodiments
Machine learning methods can learn and train automatically on massive data, and are widely used in image processing, speech processing, natural language processing, and other areas. The present invention uses a machine learning method to train an anomaly detection model and identifies intrusion behavior through the anomaly detection model. The anomaly detection model of the present invention differs from unsupervised web intrusion detection: the present invention does not model only normal access logs in order to identify normal traffic, but models normal access logs and abnormal access logs together and identifies abnormal traffic directly, so that it stays flexible in confrontation and can identify continually emerging unknown threats.
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Embodiment one
This embodiment provides a web access anomaly detection method. Fig. 2 is a flow chart of the web access anomaly detection method according to the first embodiment of the present invention.
Step S210: train an anomaly detection model according to multiple access logs.
In this embodiment, the multiple access logs include normal access logs and abnormal access logs.
The anomaly detection model is used to detect whether an http request is abnormal. If the http request is abnormal, it is determined to be intrusion behavior and needs to be intercepted.
An access log is obtained at every preset time interval, and the anomaly detection model is trained on the multiple access logs obtained, so as to ensure the recognition accuracy of the anomaly detection model.
In this embodiment, an access log may be an Nginx log, and each access log serves as a sample for training the anomaly detection model. A normal access log is a positive sample, and an abnormal access log is a negative sample.
Step S220: receive an http request sent by a user equipment.
Step S230: identify, through the anomaly detection model, whether the http request is an abnormal request. If so, execute step S240; if not, execute step S250.
Step S240: if the http request is an abnormal request, intercept the http request.
Step S250: if the http request is a normal request, pass the http request.
The embodiments of the present invention can be applied to the fields of web security and machine learning, can be used for access anomaly detection and interception in the web security field, and can solve the technical problems that the traditional waf firewall method of intercepting intrusion access has a high maintenance cost, poor flexibility, and no protection against unknown anomalies.
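The decision flow of steps S230 to S250 can be sketched in a few lines of Python. This is only an illustration: `handle_request` and `toy_model` are hypothetical names, and `toy_model` is a trivial stand-in for a trained anomaly detection model, not the model described in this document.

```python
from typing import Callable


def handle_request(request_uri: str, is_abnormal: Callable[[str], bool]) -> str:
    """Steps S230 to S250: classify the request, then intercept or pass it."""
    if is_abnormal(request_uri):
        return "intercept"  # step S240
    return "pass"  # step S250


def toy_model(request_uri: str) -> bool:
    """Hypothetical stand-in for a trained model: flags requests containing '<script'."""
    return "<script" in request_uri.lower()


if __name__ == "__main__":
    print(handle_request("/login.asp?username=admin", toy_model))
    print(handle_request("/search?q=<script>alert(1)</script>", toy_model))
```

Passing the model in as a callable keeps the interception logic independent of how the model is trained.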
Embodiment two
This embodiment further describes the training process of the anomaly detection model.
Fig. 3 is a flow chart of the steps of training the anomaly detection model according to the second embodiment of the present invention.
Step S310: obtain multiple access logs and perform data cleaning on the multiple access logs.
The data cleaning includes at least one of the following: filtering out static files, URL deduplication, converting uppercase letters to lowercase, URL decoding, labeling positive and negative samples, and balancing the number of URLs in the positive and negative samples.
In this embodiment, an access log is obtained at every preset time interval, or multiple access logs input by the user are obtained, and data cleaning is performed on the multiple access logs obtained.
In this embodiment, the multiple access logs obtained include both normal access logs and abnormal access logs. Further, whether each of the access logs obtained is an abnormal access log may have been determined in advance by analysis. For example: the access logs of an application system over a period of time are obtained, and each access log is analyzed to determine whether it is abnormal; after it is determined whether each access log is normal, the multiple access logs, which contain both normal logs and abnormal logs, are used to train the anomaly detection model.
Step S320: after the data cleaning, extract the feature data of each URL (Uniform Resource Locator) in the multiple access logs.
The feature data includes at least one of the following: length features, quantity features, type features, danger level features, and parameter features.
Step S330: generate a corresponding data model object for each URL according to the data cleaning result of each access log and the feature data of each URL.
A data model object includes at least: the URL, the feature data of the URL, and the label of the access log to which the URL belongs.
The label of an access log reflects whether the log file is a normal access log or an abnormal access log.
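A data model object of this kind might be represented as a small record type. The sketch below is an assumption about its shape, not the structure used by the inventors: the class name `DataModel` and the column names `request_uri` and `label` appear elsewhere in this document, while the `features` dictionary and the label strings "normal" and "abnormal" are illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataModel:
    """One record per URL: the URL, its feature data, and its log's label."""
    request_uri: str
    label: str  # assumed values: "normal" or "abnormal"
    features: Dict[str, float] = field(default_factory=dict)


if __name__ == "__main__":
    record = DataModel(
        request_uri="/login.asp?username=admin&password=123",
        label="normal",
        features={"uri_length": 10, "parameter_number": 2},
    )
    print(record)
```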
Step S340: process each data model object through the decision tree of spark, and train the anomaly detection model using the processed data model objects.
The anomaly detection model is preliminarily constructed with the spark decision tree. The preliminarily constructed anomaly detection model has relatively low accuracy; it is subsequently trained with the processed data model objects, which gradually improves its recognition accuracy. Further, the anomaly detection model is a decision tree.
In this embodiment, the unknown-anomaly detection technology based on a decision tree performs data cleaning and feature extraction on a large number of samples and tunes combinations of the machine learning parameters, so that an optimal anomaly detection model can be generated. The anomaly detection model can learn a large number of features of intrusion behavior. Whether the attack type is known or unknown, this embodiment has strong protection capability, requires no rule maintenance, and is considerably flexible.
Embodiment three
This embodiment further describes the steps of data cleaning.
Fig. 4 is a flow chart of the steps of data cleaning according to the third embodiment of the present invention.
Step S410: filter out the static files in each access log.
For Nginx logs, static files can be identified by contentType and then filtered out of the logs.
The types of static files include: music (contentType includes audio/*), video (contentType includes video/*), pictures (contentType includes image/*), js (contentType includes application/javascript), css (contentType includes text/css), and so on.
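The contentType check in step S410 can be sketched as follows. The log entry shape (a dictionary with `content_type` and `request_uri` keys) is an assumption made for this example; real Nginx log fields depend on the configured log format.

```python
# contentType prefixes and values that mark a static file, taken from the list above.
STATIC_PREFIXES = ("audio/", "video/", "image/")
STATIC_TYPES = ("application/javascript", "text/css")


def is_static(content_type: str) -> bool:
    """Return True when the contentType denotes music, video, pictures, js, or css."""
    ct = content_type.strip().lower()
    return ct.startswith(STATIC_PREFIXES) or any(t in ct for t in STATIC_TYPES)


def filter_static(entries):
    """Keep only the log entries that are not static files."""
    return [e for e in entries if not is_static(e["content_type"])]


if __name__ == "__main__":
    entries = [
        {"request_uri": "/logo.png", "content_type": "image/png"},
        {"request_uri": "/login.asp?username=admin", "content_type": "text/html"},
        {"request_uri": "/site.css", "content_type": "text/css; charset=utf-8"},
    ]
    print(filter_static(entries))
```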
Step S420: deduplicate the URLs that are repeated in the multiple access logs.
In other words, the URLs in each access log are queried, all URLs in the multiple access logs are compared, identical URLs are identified, and only one copy is kept. Further, for a URL repeated within the same access log, only one copy is kept and the repeated copies are deleted; for a URL repeated across different access logs, only the copy in one access log (selected at random) is kept, and the copies in the other access logs are deleted.
URL deduplication ensures that no URL is repeated in the samples, which prevents the anomaly detection model from failing or becoming inaccurate because of missing deduplication.
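A minimal sketch of step S420 is given below. It simplifies one detail: when a URL is repeated across logs, the copy in the first log encountered is kept, rather than the copy in a randomly selected log. The function name and the input shape (a mapping from log name to URL list) are assumptions.

```python
def dedupe_urls(logs):
    """Keep each URL once across all logs (first occurrence wins in this sketch)."""
    seen = set()
    result = {}
    for name, urls in logs.items():
        kept = []
        for url in urls:
            if url not in seen:
                seen.add(url)
                kept.append(url)
        result[name] = kept
    return result


if __name__ == "__main__":
    logs = {"log1": ["/a", "/b", "/a"], "log2": ["/b", "/c"]}
    print(dedupe_urls(logs))
```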
Step S430: make the letter case of the URLs in the multiple access logs consistent.
In this embodiment, the uppercase letters in a URL can be converted to lowercase letters, mainly to make the data easier to process.
Specifically, the toLowerCase method of the Java String class may be used for the case conversion.
Step S440: decode the encoded URLs in the multiple access logs.
In nginx logs, many URLs are URL-encoded, so the encoded URLs need to be decoded.
Specifically, a URL can be decoded with java.net.URLDecoder.decode(info, "utf-8").
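Steps S430 and S440 can be sketched together. The text names Java calls; this example swaps in Python's standard library instead, using `str.lower` and `urllib.parse.unquote_plus`, which like `URLDecoder.decode` also turns `+` into a space. The function name `normalize_url` is hypothetical.

```python
from urllib.parse import unquote_plus


def normalize_url(url: str) -> str:
    """Lowercase the URL (step S430), then URL-decode it as UTF-8 (step S440)."""
    return unquote_plus(url.lower(), encoding="utf-8")


if __name__ == "__main__":
    print(normalize_url("/Login.asp?Name=a%20b"))
    print(normalize_url("/search?q=%3Cscript%3E"))
```

Because decoding runs after lowercasing in this order, letters that only appear after decoding (for example `%41`) keep their case; lowercasing again after decoding is one way to avoid that.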
Step S450: add a label to each access log; the label types include normal sample and abnormal sample.
Because the multiple access logs obtained include both normal access logs and abnormal access logs, the access logs need to be marked, so that a label distinguishes whether an access log is a normal access log or an abnormal access log.
Step S460: according to normal URLs and abnormal URLs prepared in advance, balance the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
In this embodiment, normal access logs are called positive samples (normal samples) and abnormal access logs are called negative samples (abnormal samples). To prevent an imbalance between the URL counts of the positive and negative samples from affecting the recognition accuracy of the anomaly detection model, a certain number of normal URLs and a certain number of abnormal URLs need to be prepared in advance, so that the number of URLs (normal URLs) in the positive samples equals the number of URLs (abnormal URLs) in the negative samples.
The number of URLs in all positive samples and the number of URLs in all negative samples are determined and compared. If they differ, the sample type with fewer URLs is determined, and URLs of that type are randomly selected from the URLs prepared in advance and added to the samples. For example: 10 million abnormal URLs and 10 million normal URLs are prepared in advance; the positive samples include 10 million URLs and the negative samples include 15 million URLs; the positive samples have 5 million fewer URLs than the negative samples, so 5 million normal URLs are randomly selected from the 10 million normal URLs prepared in advance and evenly distributed among the positive samples.
Specifically, the random selection from the URLs prepared in advance can be done with the sample method of spark.
It should be noted that step S460 may be executed at any time after URL deduplication and before feature data extraction.
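The balancing in step S460 can be sketched with Python's `random.sample` in place of the spark sample method. This is a simplified illustration: positive and negative samples are treated as flat URL lists, so the even distribution across individual logs is omitted, and all names are hypothetical.

```python
import random


def balance(positive, negative, spare_normal, spare_abnormal, seed=0):
    """Top up the smaller side with randomly chosen URLs prepared in advance.

    Raises ValueError if the prepared pool is smaller than the gap.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    positive, negative = list(positive), list(negative)
    gap = len(negative) - len(positive)
    if gap > 0:
        positive += rng.sample(spare_normal, gap)
    elif gap < 0:
        negative += rng.sample(spare_abnormal, -gap)
    return positive, negative


if __name__ == "__main__":
    pos, neg = balance(["/ok"], ["/x", "/y", "/z"], ["/n1", "/n2", "/n3"], ["/m1"])
    print(len(pos), len(neg))
```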
Embodiment four
This embodiment further describes the process of extracting feature data.
Before the steps of extracting feature data are described, the composition of a URL is explained first.
A complete URL is used as an example below:
http://www.test.com/login.asp?username=admin&password=123
Before feature data is extracted, the URL needs to be parsed and split.
In an actual access log, the domain name is one separate field and request_uri is another separate field.
In this example, the domain name is http://www.test.com, and the content of request_uri is /login.asp?username=admin&password=123.
request_uri can be split by the character "?". In this example, /login.asp is the uri, and username=admin&password=123 is the parameter part.
The uri can be split by the character "/"; if the uri is split into multiple segments, each segment is a path.
The parameter part can be split by the character "&". Each segment obtained has the format name=value, where name is the parameter name and value is the parameter value. In this example, the parameter part is split by "&" into two segments: username=admin and password=123.
Each segment obtained by splitting on "&" is then split by the character "=", which yields the individual parameter names and parameter values. In this example, the parameter names are username and password, and the parameter values are admin and 123.
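The splitting described above can be sketched as a single function. The name `split_request_uri` and the returned tuple layout are assumptions made for this example.

```python
def split_request_uri(request_uri: str):
    """Split request_uri into the uri, its paths, and (name, value) pairs."""
    uri, _, parameter = request_uri.partition("?")
    paths = [p for p in uri.split("/") if p]  # drop the empty piece before the first "/"
    pairs = []
    for segment in parameter.split("&"):
        if segment:
            name, _, value = segment.partition("=")
            pairs.append((name, value))
    return uri, paths, pairs


if __name__ == "__main__":
    print(split_request_uri("/login.asp?username=admin&password=123"))
```

Using `partition` rather than `split` means only the first "?" or "=" acts as a separator, so a value that itself contains "=" stays intact.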
Fig. 5 is a flow chart of the steps of extracting feature data according to the fourth embodiment of the present invention.
Step S510: extract the parameter features of each URL according to preset parameter types.
The parameter types include: ip (Internet Protocol) address, http, and https.
The parameter features include: value_contain_ip, value_contain_http, and value_contain_https.
value_contain_ip: whether the parameter part (value) of the URL contains an ip address, or the number of ip addresses contained in the parameter part of the URL;
value_contain_http: the number of occurrences of http in the parameter part of the URL;
value_contain_https: the number of occurrences of https in the parameter part of the URL.
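The three parameter features can be sketched as simple counts over the parameter values. Two choices here are assumptions, since the text does not specify them: an ip address is matched as four dot-separated groups of one to three digits (octet ranges are not checked), and an occurrence of https is not also counted as an occurrence of http.

```python
import re

# Dotted-quad pattern; a deliberately loose approximation of an IPv4 address.
IP_PATTERN = re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def parameter_features(values):
    """Count ip addresses, http, and https occurrences in the parameter values."""
    text = "&".join(values).lower()
    https_count = text.count("https")
    return {
        "value_contain_ip": len(IP_PATTERN.findall(text)),
        "value_contain_http": text.count("http") - https_count,
        "value_contain_https": https_count,
    }


if __name__ == "__main__":
    print(parameter_features(["http://10.0.0.1/a", "https://example.com"]))
```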
Step S520: extract the danger level features of each URL according to preset abnormal keywords.
The types of abnormal keywords include: low-risk keywords, medium-risk keywords, and high-risk keywords.
The danger level features include the danger level of preset parts of the URL.
For example, the danger level features include the danger levels of the uri, name, and value in the URL:
uri_risk_level: the danger level of the uri;
name_risk_level: the danger level of the name;
value_risk_level: the danger level of the value.
Abnormal keywords may be stored in an abnormal keyword library, such as the one shown in Table 1; however, those skilled in the art will understand that the low-risk, medium-risk, and high-risk keywords among the abnormal keywords are not limited to the content shown in Table 1.
Table 1
Low-risk, medium-risk, and high-risk keywords reflect the danger level of each part of the URL (uri, name, and value).
Different weights can be assigned to abnormal keywords of different levels: the weight of a high-risk keyword is 5, the weight of a medium-risk keyword is 2, and the weight of a low-risk keyword is 1. The weight when no abnormal keyword is present is 0. If a URL contains abnormal keywords of multiple levels, the highest weight is taken.
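The weighting rule (5, 2, 1, or 0, taking the highest match) can be sketched as follows. The keyword lists are hypothetical placeholders chosen only to make the example run; they are not the contents of Table 1.

```python
# Weight -> keywords. These keywords are illustrative placeholders, not Table 1.
KEYWORDS = {
    5: ("union select", "<script"),  # assumed high-risk examples
    2: ("../",),                     # assumed medium-risk example
    1: ("admin",),                   # assumed low-risk example
}


def risk_level(part: str) -> int:
    """Return the highest weight among the keywords found in part, or 0 if none."""
    text = part.lower()
    hits = [weight for weight, words in KEYWORDS.items() if any(k in text for k in words)]
    return max(hits, default=0)


if __name__ == "__main__":
    print(risk_level("/login.asp"))
    print(risk_level("1 UNION SELECT name FROM admin"))
```

The same function can be applied separately to the uri, each name, and each value to obtain uri_risk_level, name_risk_level, and value_risk_level.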
Step S530: extract the length features, quantity features, and type features of each URL according to preset feature characters.
The feature characters include but are not limited to: "?", "/", "&", and "=".
Length feature extraction:
The length features include but are not limited to: uri_length, parameter_length, uri_maxlen, name_maxlen, and value_maxlen.
uri_length: the total length (unit: characters) of the first part (uri) after splitting by "?";
parameter_length: the total length of the second part (parameter) after splitting by "?";
uri_maxlen: the maximum length of a path after the uri part is split by "/";
name_maxlen: the maximum length of a name after the parameter part is split by "&" and "=";
value_maxlen: the maximum length of a value after the parameter part is split by "&" and "=".
Quantity feature extraction:
The quantity features include but are not limited to: uri_number and parameter_number.
uri_number: the number of paths after the uri part is split by "/";
parameter_number: the number of parameters after the parameter part is split by "&".
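The length and quantity features can be computed directly from the split pieces. The sketch below uses the feature names listed above; the function name is hypothetical, and treating empty path segments as absent is an assumption.

```python
def length_and_count_features(request_uri: str) -> dict:
    """Compute the length and quantity features of one request_uri."""
    uri, _, parameter = request_uri.partition("?")
    paths = [p for p in uri.split("/") if p]
    pairs = [seg.partition("=") for seg in parameter.split("&") if seg]
    names = [name for name, _, _ in pairs]
    values = [value for _, _, value in pairs]
    return {
        "uri_length": len(uri),
        "parameter_length": len(parameter),
        "uri_maxlen": max(map(len, paths), default=0),
        "name_maxlen": max(map(len, names), default=0),
        "value_maxlen": max(map(len, values), default=0),
        "uri_number": len(paths),
        "parameter_number": len(pairs),
    }


if __name__ == "__main__":
    print(length_and_count_features("/login.asp?username=admin&password=123"))
```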
Type feature extraction:
A string can be classified into 9 types according to whether it is empty, whether it contains suspicious characters, and whether it consists purely of digits, purely of letters, or of a mixture, as shown in Table 2:
Type                                   Value   Example
Empty value                            0       ""
Pure digits                            1       1234556
Pure letters                           2       Abc
Normal characters                      3       -
Digits + letters                       4       123abc
Digits + normal characters             5       123-111
Letters + normal characters            6       abc_abc
Digits + letters + normal characters   7       123-abc_
Other characters                       8       < a href
Table 2
According to Table 2, extracting a type feature essentially scores the uri (or a name or value) with the value corresponding to its type.
The type features include but are not limited to: uri_type, name_type, and value_type.
uri_type: the type of the first part after splitting by "?";
name_type: the type of a name after the parameter part is split by "&" and "=";
value_type: the type of a value after the parameter part is split by "&" and "=".
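A type classifier following the nine type names in Table 2 might look like the sketch below. The set of "normal characters" is not defined in the text; `NORMAL_CHARS` is an assumption (hyphen, underscore, dot, and slash), and any character outside digits, ASCII letters, and that set is treated as an "other character".

```python
import string

NORMAL_CHARS = set("-_./")  # assumption: the text does not enumerate normal characters

# (has digits, has letters, has normal characters) -> type value
TYPE_TABLE = {
    (True, False, False): 1,
    (False, True, False): 2,
    (False, False, True): 3,
    (True, True, False): 4,
    (True, False, True): 5,
    (False, True, True): 6,
    (True, True, True): 7,
}


def type_value(text: str) -> int:
    """Map a uri, name, or value to its type value (0 to 8)."""
    if text == "":
        return 0
    has_digit = any(c in string.digits for c in text)
    has_letter = any(c in string.ascii_letters for c in text)
    has_normal = any(c in NORMAL_CHARS for c in text)
    allowed = set(string.digits) | set(string.ascii_letters) | NORMAL_CHARS
    if any(c not in allowed for c in text):
        return 8  # contains at least one other (suspicious) character
    return TYPE_TABLE[(has_digit, has_letter, has_normal)]


if __name__ == "__main__":
    for sample in ["", "1234556", "Abc", "-", "123abc", "123-111", "abc_abc", "< a href"]:
        print(repr(sample), type_value(sample))
```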
It should be noted that the order of the steps in this embodiment is not fixed, and the steps may be executed in a different order.
Based on the above feature data, the URL, the feature data of the URL, and the label of the access log to which the URL belongs can be packaged to form the data model object DataModel corresponding to the URL.
For example, the data model object DataModel of a URL is shown in Table 3:
Embodiment five
The present embodiment will be further described through the process for using data model objects to train abnormality detection model.Figure 6 be the step flow chart of training abnormality detection model according to a fifth embodiment of the present invention.Fig. 7 is implemented according to the present invention the 5th The schematic diagram of the training abnormality detection model of example.
Following operation is executed by the decision tree (abnormality detection model) of spark:
Label in data model objects is numbered step S610;Label in the data model objects is institute State the label of the access log belonging to the corresponding URL of data model objects.
The label in data model objects is numbered by StringIndexer classes, using label fields as defeated Enter, inputs transformed field indexedLabel.Can by it is label converting be 1 and 0,1 is expressed as abnormal access daily record, 0 table Show normal access log.
Step S620: convert the feature data in the data model objects into a single-column feature vector.
To facilitate training of the abnormality detection model, some of the columns in the data model objects can be converted into a feature vector and given a uniform name.
Because a data model object contains feature data columns, a label column (label) and a request URI column (request_uri), the feature data columns can be converted by a column transformation into a single-column feature vector containing multiple features. The column transformation can be performed with Spark's VectorAssembler class, which takes the feature data columns as input and outputs a single vector column, featureVector. Specifically, the feature data columns to be converted are first collected into an array, and the VectorAssembler class then converts all of them into one column, featureVector.
Step S630: standardize the single-column feature vector to obtain a standardized feature vector.
The purpose of standardizing the single-column feature vector is to speed up machine learning, and to prevent features with excessively large values from carrying too much weight in the abnormality detection model and reducing the other features to secondary indicators.
Spark's built-in StandardScaler may be used to standardize the single-column feature vector.
When the single-column feature vector is standardized, the data are transformed to have a mean of 0 and a variance of 1.
Two parameters can be set in this process:
1: withStd=true scales the variance to 1;
2: withMean=true shifts the mean to 0.
In this embodiment, both parameters may be set to true. Taking the generated DataModel as an example, the input is the single-column feature vector featureVector and the output is the standardized feature vector scaledFeatures.
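Steps S620 and S630 can be illustrated with a small plain-Python sketch. It does not call VectorAssembler or StandardScaler; the functions assemble and standardize are hypothetical stand-ins, the feature column names are supplied by the caller, and the use of the sample standard deviation is an assumption of this sketch.

```python
from statistics import mean, stdev


def assemble(rows, feature_columns):
    """Collect the named feature columns of each row into one vector."""
    return [[float(row[name]) for name in feature_columns] for row in rows]


def standardize(vectors, with_mean=True, with_std=True):
    """Shift each feature to mean 0 and scale it to unit variance."""
    columns = list(zip(*vectors))
    means = [mean(col) for col in columns]
    # A constant or single-sample column has no spread; leave it unscaled.
    stds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    scaled = []
    for vector in vectors:
        out = []
        for value, mu, sigma in zip(vector, means, stds):
            if with_mean:
                value -= mu
            if with_std and sigma > 0:
                value /= sigma
            out.append(value)
        scaled.append(out)
    return scaled
```

With both flags left at True, a feature whose raw values are in the thousands and a feature whose raw values are single digits end up on the same scale, which is the effect the text attributes to standardization.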
Step S640: train the abnormality detection model using the label numbers and the standardized feature vector.
In this embodiment, machine learning is performed using the Spark decision tree (the abnormality detection model): the label column indexedLabel and the standardized feature vector scaledFeatures are input into the decision tree, and the optimal abnormality detection model after training is output.
The parameter sets of the decision tree are defined by the ParamGridBuilder class. From the multiple parameter sets, the optimal one is selected, that is, the parameter set that yields the abnormality detection model with the highest recognition accuracy, and that model is taken as the output.
The parameters used by the decision tree include: maxBins, maxDepth, impurity and numTrees.
maxBins: the maximum number of bins used when splitting features; in this embodiment, the candidate values are 25, 28, 31 and 33;
maxDepth: the maximum depth of the decision tree; in this embodiment, the candidate values are 4, 6, 8 and 10;
impurity: the impurity measure used to calculate information gain; the only options are "entropy" and "gini";
numTrees: the number of decision trees to build; in this embodiment, the candidate values are 10, 15 and 20.
A parameter set is one combination of candidate values of maxBins, maxDepth, impurity and numTrees.
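The parameter sets described above are the Cartesian product of the candidate values. The following plain-Python sketch enumerates them; it is a stand-in for what ParamGridBuilder produces rather than a use of that class, and the names PARAM_VALUES and build_param_sets are hypothetical.

```python
from itertools import product

# Candidate values taken from this embodiment.
PARAM_VALUES = {
    "maxBins": [25, 28, 31, 33],
    "maxDepth": [4, 6, 8, 10],
    "impurity": ["entropy", "gini"],
    "numTrees": [10, 15, 20],
}


def build_param_sets(param_values):
    """Return every combination of candidate values as a list of dicts."""
    names = list(param_values)
    return [
        dict(zip(names, combination))
        for combination in product(*param_values.values())
    ]
```

With the candidate values listed in this embodiment, the grid contains 4 x 4 x 2 x 3 = 96 parameter sets, each of which is trained and evaluated.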
In addition, the binary classification can be evaluated with BinaryClassificationEvaluator in order to judge how well the abnormality detection model performs.
As shown in Fig. 7, the process of training the abnormality detection model in this embodiment is a ten-fold cross-validation flow, which takes paramsMap, estimator and evaluator as input and trains the optimal abnormality detection model. In this embodiment, Spark supports training the abnormality detection model with the CrossValidator tool. The input parameters include paramsMap, estimator and evaluator.
paramsMap: combines the candidate values of the parameters maxBins, maxDepth, impurity and numTrees used by the decision tree to obtain multiple parameter sets;
estimator: trains on the label column indexedLabel and the standardized feature vector scaledFeatures to determine the training result corresponding to each parameter set;
evaluator: evaluates the training results, then determines and outputs the best-performing abnormality detection model.
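The ten-fold cross-validation flow can be sketched in plain Python as below. This is a generic illustration of the technique, not the Spark CrossValidator API: the functions k_folds and select_best are hypothetical, and the caller supplies fit (playing the role of the estimator) and score (playing the role of the evaluator).

```python
from statistics import mean


def k_folds(n_samples, k=10):
    """Yield (train_indices, test_indices); fold i holds out every k-th sample."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def select_best(param_sets, fit, score, samples, k=10):
    """Return the parameter set with the highest mean score over k folds."""
    best_params, best_score = None, float("-inf")
    for params in param_sets:
        fold_scores = []
        for train_idx, test_idx in k_folds(len(samples), k):
            model = fit(params, [samples[i] for i in train_idx])
            fold_scores.append(score(model, [samples[i] for i in test_idx]))
        average = mean(fold_scores)
        if average > best_score:
            best_params, best_score = params, average
    return best_params, best_score
```

Each sample is used for testing exactly once and for training in the other nine folds, so the averaged score reflects performance on data the model was not fitted to.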
Embodiment six
This embodiment provides a web access exception detection device.
Fig. 8 is a structural diagram of the web access exception detection device according to the sixth embodiment of the present invention.
The web access exception detection device includes:
Training module 810, for training an abnormality detection model according to multiple access logs, wherein the multiple access logs include normal access logs and abnormal access logs.
Receiving module 820, for receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment.
Identification module 830, for identifying, by means of the abnormality detection model, whether the HTTP request is an abnormal request.
Blocking module 840, for intercepting the HTTP request when the identification module determines that the HTTP request is an abnormal request.
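The cooperation of the receiving, identification and blocking modules can be pictured with the following minimal Python sketch. The function handle_request and the return strings are hypothetical, and toy_model is only a placeholder for the trained abnormality detection model; it is not the detection logic of the invention.

```python
def handle_request(url, is_abnormal):
    """Identify a received request and intercept it if it is abnormal."""
    if is_abnormal(url):
        return "intercepted"
    return "forwarded"


def toy_model(url):
    """Placeholder classifier standing in for the trained model."""
    return "<script" in url.lower()
```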
Further, as shown in Fig. 9, the training module 810 includes:
Processing unit 811, for obtaining multiple access logs and performing data cleansing on the multiple access logs.
Extraction unit 812, for extracting, after the data cleansing, the feature data of each uniform resource locator (URL) in the multiple access logs.
Generation unit 813, for generating a corresponding data model object for each URL according to the data cleansing result of each access log and the feature data of each URL.
Training unit 814, for processing each data model object by means of the Spark decision tree and training the abnormality detection model using the processed data model objects.
The processing unit 811 is further used for: filtering out the static files in each access log; de-duplicating the URLs repeated in the multiple access logs; normalizing the letter case of the URLs in the multiple access logs; decoding the encoded URLs in the multiple access logs; adding a label to each access log, the types of label including normal sample and abnormal sample; and, according to pre-prepared normal URLs and abnormal URLs, balancing the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
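Part of the data cleansing performed by the processing unit can be sketched in plain Python as follows. The function clean_urls and the list of static file extensions are assumptions made for illustration. The sketch decodes and lower-cases each URL before de-duplicating, so that URLs differing only in encoding or letter case count as repeats; labelling and sample balancing are not shown.

```python
from urllib.parse import unquote, urlsplit

# Assumed list of static resource extensions; the text does not enumerate them.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def clean_urls(urls):
    """Decode, lower-case, drop static files and de-duplicate, keeping order."""
    seen = set()
    cleaned = []
    for url in urls:
        normalized = unquote(url).lower()
        if urlsplit(normalized).path.endswith(STATIC_EXTENSIONS):
            continue  # static file, not useful for training
        if normalized in seen:
            continue  # repeated URL
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned
```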
The extraction unit 812 is further used for: extracting the parameter features of each URL according to preset parameter types; extracting the danger level feature of each URL according to preset abnormal keywords; and extracting the length features, quantity features and type features of each URL according to preset feature characters.
The training unit 814 is further used for: numbering the labels in the data model objects, the label in a data model object being the label of the access log to which the URL corresponding to that data model object belongs; converting the feature data in the data model objects into a single-column feature vector; standardizing the single-column feature vector to obtain a standardized feature vector; and training the abnormality detection model using the label numbers and the standardized feature vector.
The functions of the device described in this embodiment have already been described in the method embodiments shown in Fig. 2 to Fig. 7; for details not covered in the description of this embodiment, refer to the relevant descriptions in the preceding embodiments, which are not repeated here.
The present invention performs feature extraction on a large number of normal samples and abnormal samples, capturing more than ten of the most critical features of normal access and intrusion access. Decision tree classification learning is performed on these features, so that the generated detection model captures a large number of intrusion behavior features.
The present invention can also validate the parameters of the decision tree and select the optimal parameter combination, generating an optimal abnormality detection model that provides better protection against intrusion behavior.
The abnormality detection model generated by the present invention is highly flexible, eliminates the rule maintenance step, reduces maintenance cost, and can protect against unknown anomalies.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will recognize that various improvements, additions and substitutions are also possible; therefore, the scope of the present invention should not be limited to the above embodiments.

Claims (10)

1. A World Wide Web (web) access exception detection method, characterized by comprising:
Training an abnormality detection model according to multiple access logs, wherein the multiple access logs include normal access logs and abnormal access logs;
Receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment;
Identifying, by means of the abnormality detection model, whether the HTTP request is an abnormal request;
If the HTTP request is an abnormal request, intercepting the HTTP request.
2. The method according to claim 1, characterized in that training the abnormality detection model according to multiple access logs comprises:
Obtaining multiple access logs, and performing data cleansing on the multiple access logs;
After the data cleansing, extracting the feature data of each uniform resource locator (URL) in the multiple access logs;
Generating a corresponding data model object for each URL according to the data cleansing result of each access log and the feature data of each URL;
Processing each data model object by means of the Spark decision tree, and training the abnormality detection model using the processed data model objects.
3. The method according to claim 2, characterized in that performing data cleansing on the multiple access logs comprises:
Filtering out the static files in each access log;
De-duplicating the URLs repeated in the multiple access logs;
Normalizing the letter case of the URLs in the multiple access logs;
Decoding the encoded URLs in the multiple access logs;
Adding a label to each access log, the types of label including normal sample and abnormal sample;
According to pre-prepared normal URLs and abnormal URLs, balancing the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
4. The method according to claim 3, characterized in that extracting the feature data of each URL comprises:
Extracting the parameter features of each URL according to preset parameter types;
Extracting the danger level feature of each URL according to preset abnormal keywords;
Extracting the length features, quantity features and type features of each URL according to preset feature characters.
5. The method according to claim 3, characterized in that processing each data model object and training the abnormality detection model using the processed data model objects comprises:
Numbering the labels in the data model objects, the label in a data model object being the label of the access log to which the URL corresponding to that data model object belongs;
Converting the feature data in the data model objects into a single-column feature vector;
Standardizing the single-column feature vector to obtain a standardized feature vector;
Training the abnormality detection model using the label numbers and the standardized feature vector.
6. A web access exception detection device, characterized by comprising:
A training module, for training an abnormality detection model according to multiple access logs, wherein the multiple access logs include normal access logs and abnormal access logs;
A receiving module, for receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment;
An identification module, for identifying, by means of the abnormality detection model, whether the HTTP request is an abnormal request;
A blocking module, for intercepting the HTTP request when the identification module determines that the HTTP request is an abnormal request.
7. The device according to claim 6, characterized in that the training module comprises:
A processing unit, for obtaining multiple access logs and performing data cleansing on the multiple access logs;
An extraction unit, for extracting, after the data cleansing, the feature data of each uniform resource locator (URL) in the multiple access logs;
A generation unit, for generating a corresponding data model object for each URL according to the data cleansing result of each access log and the feature data of each URL;
A training unit, for processing each data model object by means of the Spark decision tree and training the abnormality detection model using the processed data model objects.
8. The device according to claim 7, characterized in that the processing unit is further used for:
Filtering out the static files in each access log;
De-duplicating the URLs repeated in the multiple access logs;
Normalizing the letter case of the URLs in the multiple access logs;
Decoding the encoded URLs in the multiple access logs;
Adding a label to each access log, the types of label including normal sample and abnormal sample;
According to pre-prepared normal URLs and abnormal URLs, balancing the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
9. The device according to claim 8, characterized in that the extraction unit is further used for:
Extracting the parameter features of each URL according to preset parameter types;
Extracting the danger level feature of each URL according to preset abnormal keywords;
Extracting the length features, quantity features and type features of each URL according to preset feature characters.
10. The device according to claim 8, characterized in that the training unit is further used for:
Numbering the labels in the data model objects, the label in a data model object being the label of the access log to which the URL corresponding to that data model object belongs;
Converting the feature data in the data model objects into a single-column feature vector;
Standardizing the single-column feature vector to obtain a standardized feature vector;
Training the abnormality detection model using the label numbers and the standardized feature vector.
CN201810158886.0A 2018-02-24 2018-02-24 A kind of web access exceptions detection method and device Pending CN108616498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810158886.0A CN108616498A (en) 2018-02-24 2018-02-24 A kind of web access exceptions detection method and device

Publications (1)

Publication Number Publication Date
CN108616498A true CN108616498A (en) 2018-10-02

Family

ID=63658394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810158886.0A Pending CN108616498A (en) 2018-02-24 2018-02-24 A kind of web access exceptions detection method and device

Country Status (1)

Country Link
CN (1) CN108616498A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8668649B2 (en) * 2010-02-04 2014-03-11 Siemens Medical Solutions Usa, Inc. System for cardiac status determination
CN105656886A (en) * 2015-12-29 2016-06-08 北京邮电大学 Method and device for detecting website attack behaviors based on machine learning
CN106209845A (en) * 2016-07-12 2016-12-07 国家计算机网络与信息安全管理中心 A kind of malicious HTTP based on Bayesian Learning Theory request decision method
CN106452955A (en) * 2016-09-29 2017-02-22 北京赛博兴安科技有限公司 Abnormal network connection detection method and system
CN107346388A (en) * 2017-07-03 2017-11-14 四川无声信息技术有限公司 Web attack detection methods and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181002
