CN108616498A

CN108616498A - Web access anomaly detection method and device

Info

Publication number: CN108616498A
Application number: CN201810158886.0A
Authority: CN
Inventors: 党向磊; 张鸿; 徐太忠; 惠榛; 王金松; 陈阳; 汪立东; 赵路
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2018-02-24
Filing date: 2018-02-24
Publication date: 2018-10-02

Abstract

The invention discloses a web access anomaly detection method and device. The method includes: training an anomaly detection model on multiple access logs, where the access logs include both normal access logs and abnormal access logs; receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment; identifying through the anomaly detection model whether the HTTP request is an abnormal request; and intercepting the HTTP request if it is an abnormal request. Embodiments of the invention apply to the fields of web security and machine learning. By learning from a large number of normal and abnormal samples, they can detect and intercept abnormal access in the web security field, and they address the technical problems of the traditional WAF approach to intercepting intrusive access: high maintenance cost, poor flexibility, and no protection against unknown anomalies.

Description

A web access anomaly detection method and device

Technical field

The present invention relates to the technical field of network security, and in particular to a web (World Wide Web) access anomaly detection method and device.

Background

With the continuous development of Internet technology, the traditional network boundary is disappearing. Enterprises, especially large Internet companies, have daily active users at the ten-million level, so the logs of each application system can reach hundreds of gigabytes per day, or even the terabyte scale. At present, the share of malicious access, typified by the gray and black markets, remains high. For large Internet companies, and particularly in industries such as finance and telecommunications, malicious attacks occur at every hour of every day, and attack techniques keep evolving, which causes network security problems.

For malicious attacks, the traditional solution is intrusion detection by a WAF (Web Application Firewall), which intercepts or passes HTTP (HyperText Transfer Protocol) requests based on preset rules in order to block intrusive access. Fig. 1 shows how WAF intrusion detection works. The WAF receives an HTTP request, first parses the header and body of the request, and then checks the user-defined configuration. Based on the rules in that configuration, it decides whether the request needs to be blocked; it intercepts the requests that need to be blocked and passes those that do not. The rules in the user-defined configuration mainly match parts of the HTTP request such as the uri, parameters, ua, cookie, and referer. If a rule matches, the HTTP request is intercepted; otherwise it is passed.

However, traditional web intrusion detection is inflexible and cannot detect unknown anomalies. Specifically, a WAF intercepts intrusive access based on preset rules. On the one hand, rigid rules are easily bypassed by adaptive attackers, and rule sets built from past attacks struggle to cope with 0day attacks. On the other hand, as attack and defense escalate together, building and maintaining rules on the defending side has a high barrier and a high cost. More importantly, traditional techniques can only defend against known threats: because an unknown threat is by definition not known in advance, it cannot be detected, let alone effectively blocked.

Summary of the invention

The technical problem to be solved by the present invention is to provide a web access anomaly detection method and device that address the inflexibility of existing web intrusion detection techniques and their inability to detect unknown anomalies.

To solve the above technical problem, the present invention adopts the following technical solutions:

The present invention provides a World Wide Web (web) access anomaly detection method, including: training an anomaly detection model on multiple access logs, where the access logs include normal access logs and abnormal access logs; receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment; identifying through the anomaly detection model whether the HTTP request is an abnormal request; and, if the HTTP request is an abnormal request, intercepting the HTTP request.

Training the anomaly detection model on multiple access logs includes: obtaining multiple access logs and performing data cleaning on them; after data cleaning, extracting the feature data of each Uniform Resource Locator (URL) in the access logs; generating a data model object for each URL according to the data cleaning result of each access log and the feature data of each URL; and processing each data model object with the Spark decision tree and training the anomaly detection model on the processed data model objects.

Performing data cleaning on the multiple access logs includes: filtering out the static files in each access log; deduplicating URLs that are repeated in the access logs; normalizing the letter case of the URLs in the access logs; decoding the encoded URLs in the access logs; adding a label to each access log, the label types being normal sample and abnormal sample; and, using normal URLs and abnormal URLs prepared in advance, balancing the number of URLs in the access logs labeled as normal samples against the number of URLs in the access logs labeled as abnormal samples.

Extracting the feature data of each URL includes: extracting the parameter features of each URL according to preset parameter types; extracting the risk-level features of each URL according to preset abnormal keywords; and extracting the length features, count features, and type features of each URL according to preset feature characters.

Processing each data model object and training the anomaly detection model on the processed data model objects includes: numbering the label in the data model object, where the label is that of the access log to which the data model object's URL belongs; converting the feature data in the data model object into a single-column feature vector; standardizing the single-column feature vector to obtain a standardized feature vector; and training the anomaly detection model using the label number and the standardized feature vector.

The present invention provides a web access anomaly detection device, including: a training module for training an anomaly detection model on multiple access logs, where the access logs include normal access logs and abnormal access logs; a receiving module for receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment; an identification module for identifying through the anomaly detection model whether the HTTP request is an abnormal request; and an interception module for intercepting the HTTP request when the identification module determines that it is an abnormal request.

The training module includes: a processing unit for obtaining multiple access logs and performing data cleaning on them; an extraction unit for extracting, after data cleaning, the feature data of each Uniform Resource Locator (URL) in the access logs; a generation unit for generating a data model object for each URL according to the data cleaning result of each access log and the feature data of each URL; and a training unit for processing each data model object with the Spark decision tree and training the anomaly detection model on the processed data model objects.

The processing unit is further used for: filtering out the static files in each access log; deduplicating URLs that are repeated in the access logs; normalizing the letter case of the URLs in the access logs; decoding the encoded URLs in the access logs; adding a label to each access log, the label types being normal sample and abnormal sample; and, using normal URLs and abnormal URLs prepared in advance, balancing the number of URLs in the access logs labeled as normal samples against the number of URLs in the access logs labeled as abnormal samples.

The extraction unit is further used for: extracting the parameter features of each URL according to preset parameter types; extracting the risk-level features of each URL according to preset abnormal keywords; and extracting the length features, count features, and type features of each URL according to preset feature characters.

The training unit is further used for: numbering the label in the data model object, where the label is that of the access log to which the data model object's URL belongs; converting the feature data in the data model object into a single-column feature vector; standardizing the single-column feature vector to obtain a standardized feature vector; and training the anomaly detection model using the label number and the standardized feature vector.

The beneficial effects of the present invention are as follows:

Embodiments of the present invention apply to the fields of web security and machine learning. By learning from a large number of normal and abnormal samples, they can detect and intercept abnormal access in the web security field, and they address the technical problems of the traditional WAF approach to intercepting intrusive access: high maintenance cost, poor flexibility, and no protection against unknown anomalies.

Brief description of the drawings

Fig. 1 is a schematic diagram of intrusion detection by an existing WAF;

Fig. 2 is a flow chart of the web access anomaly detection method according to the first embodiment of the present invention;

Fig. 3 is a flow chart of training the anomaly detection model according to the second embodiment of the present invention;

Fig. 4 is a flow chart of the data cleaning steps according to the third embodiment of the present invention;

Fig. 5 is a flow chart of the feature data extraction steps according to the fourth embodiment of the present invention;

Fig. 6 is a flow chart of the steps for training the anomaly detection model according to the fifth embodiment of the present invention;

Fig. 7 is a schematic diagram of training the anomaly detection model according to the fifth embodiment of the present invention;

Fig. 8 is a structural diagram of the web access anomaly detection device according to the seventh embodiment of the present invention;

Fig. 9 is a structural diagram of the training module according to the seventh embodiment of the present invention.

Detailed description

Machine learning methods can learn and train automatically from massive data and are widely used in fields such as image, speech, and natural language processing. The present invention uses machine learning to train an anomaly detection model and identifies intrusion behavior through that model. The anomaly detection model of the present invention differs from unsupervised web intrusion detection: rather than modeling only normal access logs to identify normal traffic, the present invention models normal and abnormal access logs together and identifies abnormal traffic directly. This makes it flexible in adversarial settings and able to identify unknown threats as they continue to emerge.

The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it.

Embodiment one

This embodiment provides a web access anomaly detection method. Fig. 2 is a flow chart of the web access anomaly detection method according to the first embodiment of the present invention.

Step S210, train an anomaly detection model on multiple access logs.

In this embodiment, the multiple access logs include normal access logs and abnormal access logs.

The anomaly detection model is used to detect whether an HTTP request is abnormal. If the HTTP request is abnormal, it is determined to be intrusion behavior and needs to be intercepted.

An access log is obtained every preset time period, and the anomaly detection model is trained on the access logs obtained so far, so as to maintain the recognition accuracy of the model.

In this embodiment, the access logs may be Nginx logs, and each access log serves as one sample for training the anomaly detection model. Normal access logs are positive samples and abnormal access logs are negative samples.

Step S220, receive the HTTP request sent by user equipment.

Step S230, identify through the anomaly detection model whether the HTTP request is an abnormal request. If yes, perform step S240; if no, perform step S250.

Step S240, if the HTTP request is an abnormal request, intercept the HTTP request.

Step S250, if the HTTP request is a normal request, pass the HTTP request.
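The decision flow of steps S220 to S250 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `RequestGate`, `Decision`, and the use of a `Predicate` to stand in for the trained model are hypothetical names that do not appear in the source.

```java
import java.util.function.Predicate;

// Minimal sketch of steps S220-S250. RequestGate and Decision are
// hypothetical names; the trained model is represented by any Predicate
// that returns true when it considers a request abnormal.
class RequestGate {
    enum Decision { INTERCEPT, PASS }

    private final Predicate<String> isAbnormal;

    RequestGate(Predicate<String> isAbnormal) {
        this.isAbnormal = isAbnormal;
    }

    // S230: ask the model; S240: intercept if abnormal; S250: pass otherwise.
    Decision handle(String requestUri) {
        return isAbnormal.test(requestUri) ? Decision.INTERCEPT : Decision.PASS;
    }
}
```

Keeping the model behind a single predicate means the interception logic does not change when the model is retrained on newer access logs.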

Embodiments of the present invention apply to the fields of web security and machine learning. They can detect and intercept abnormal access in the web security field, and they address the technical problems of the traditional WAF approach to intercepting intrusive access: high maintenance cost, poor flexibility, and no protection against unknown anomalies.

Embodiment two

This embodiment further describes the training process of the anomaly detection model.

Fig. 3 is a flow chart of the steps for training the anomaly detection model according to the second embodiment of the present invention.

Step S310, obtain multiple access logs and perform data cleaning on them.

Data cleaning includes at least one of the following: filtering out static files, URL deduplication, converting uppercase letters to lowercase, URL decoding, labeling positive and negative samples, and balancing the number of URLs in the positive and negative samples.

In this embodiment, an access log is obtained every preset time period, or multiple access logs supplied by the user are obtained, and data cleaning is performed on the obtained access logs.

In this embodiment, the obtained access logs include both normal access logs and abnormal access logs. Further, whether each obtained access log is an abnormal access log can be determined by analysis in advance. For example: the access logs of an application system over a period of time are obtained and each access log is analyzed to determine whether it is abnormal; once this has been determined for every access log, the resulting set, which contains both normal and abnormal logs, is used to train the anomaly detection model.

Step S320, after data cleaning, extract the feature data of each URL (Uniform Resource Locator) in the multiple access logs.

Feature data includes at least one of the following: length features, count features, type features, risk-level features, and parameter features.

Step S330, generate a data model object for each URL according to the data cleaning result of each access log and the feature data of each URL.

A data model object includes at least: the identifier of the URL, the feature data of the URL, and the label of the access log to which the URL belongs.

The label of an access log indicates whether that log is a normal access log or an abnormal access log.

Step S340, process each data model object with the Spark decision tree, and train the anomaly detection model on the processed data model objects.

The anomaly detection model is initially constructed with the Spark decision tree. The initially constructed model has relatively low accuracy; it is subsequently trained on the processed data model objects, which progressively improves its recognition accuracy. Further, the anomaly detection model is a decision tree.

In this embodiment, the decision-tree-based technique for detecting unknown anomalies performs data cleaning and feature extraction on a large number of samples and tunes combinations of machine learning parameters, and can thereby produce the optimal anomaly detection model. The anomaly detection model can capture a large number of features of intrusion behavior. Whether the attack type is known or unknown, this embodiment offers strong protection, and it does not require rules to be maintained, which makes it considerably more flexible.

Embodiment three

This embodiment further describes the data cleaning steps.

Fig. 4 is a flow chart of the data cleaning steps according to the third embodiment of the present invention.

Step S410, filter out the static files in each access log.

For Nginx logs, static files can be identified by contentType and then filtered out of the log.

The types of static file include: music (contentType contains audio/*), video (contentType contains video/*), pictures (contentType contains image/*), js (contentType contains application/javascript), css (contentType contains text/css), and so on.
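A minimal Java sketch of the static-file check in step S410 follows. The content-type prefixes come from the list above; the class name `StaticFileFilter`, the method name `isStatic`, and the choice of a prefix match are assumptions made for illustration, since the source does not show how contentType is read from the log.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: decides whether a log entry refers to a static file
// based on its content type, using the types listed for step S410.
class StaticFileFilter {
    private static final List<String> STATIC_PREFIXES = Arrays.asList(
            "audio/", "video/", "image/", "application/javascript", "text/css");

    static boolean isStatic(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Prefix match, so "text/css; charset=utf-8" is still recognized.
        String ct = contentType.trim().toLowerCase(Locale.ROOT);
        for (String prefix : STATIC_PREFIXES) {
            if (ct.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Entries for which `isStatic` returns true would be dropped before the remaining cleaning steps run.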

Step S420, deduplicate the URLs that are repeated in the multiple access logs.

In other words, the URLs in each access log are queried, all URLs in the multiple access logs are identified, identical URLs are determined, and only one copy is kept. Further, for a URL repeated within the same access log, only one copy is kept and the duplicates are deleted; for a URL repeated across different access logs, only the copy in one access log (selected at random) is kept, and the copies in the other access logs are deleted.

URL deduplication ensures that no URL is repeated in the samples, which prevents the anomaly detection model from becoming inaccurate because of duplicates.
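The deduplication in step S420 can be sketched as follows. As a simplification, this sketch keeps the first occurrence of each URL rather than choosing at random which access log retains the copy; `UrlDedup` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: removes repeated URLs while preserving the order
// in which the first occurrences appeared.
class UrlDedup {
    static List<String> dedup(List<String> urls) {
        // LinkedHashSet drops duplicates and keeps insertion order.
        return new ArrayList<>(new LinkedHashSet<>(urls));
    }
}
```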

Step S430, normalize the letter case of the URLs in the multiple access logs.

In this embodiment, uppercase letters in a URL can be converted to lowercase, mainly to make the data easier to process.

Specifically, the toLowerCase method of Java's String class can be used for the case conversion.

Step S440, decode the encoded URLs in the multiple access logs.

In Nginx logs, many URLs are URL-encoded, so the encoded URLs need to be decoded.

Specifically, a URL can be decoded with java.net.URLDecoder.decode(info, "utf-8").
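Steps S430 and S440 can be combined into one small helper. The two library calls are the ones named in the text; the class name `UrlNormalizer` and the fallback for malformed escapes are assumptions of this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Locale;

// Hypothetical helper combining S430 (lowercase) and S440 (URL decoding).
class UrlNormalizer {
    static String normalize(String url) {
        // S430: convert uppercase letters to lowercase.
        String lower = url.toLowerCase(Locale.ROOT);
        try {
            // S440: decode percent-encoded characters as UTF-8.
            return URLDecoder.decode(lower, "utf-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // Assumption: a URL with a malformed escape is kept undecoded.
            return lower;
        }
    }
}
```

Note that `URLDecoder.decode` also turns `+` into a space, and that lowercasing before decoding leaves any uppercase letters produced by decoding (for example from `%41`) unchanged; whether that matters depends on the order a real pipeline chooses.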

Step S450, add a label to each access log; the label types are normal sample and abnormal sample.

Because the obtained access logs include both normal and abnormal access logs, the access logs need to be marked, with the label distinguishing whether an access log is a normal access log or an abnormal access log.

Step S460, using normal URLs and abnormal URLs prepared in advance, balance the number of URLs in the access logs labeled as normal samples against the number of URLs in the access logs labeled as abnormal samples.

In this embodiment, normal access logs are called positive samples (normal samples) and abnormal access logs are called negative samples (abnormal samples). To prevent an imbalance between the number of URLs in the positive and negative samples from affecting the recognition accuracy of the anomaly detection model, a certain number of normal URLs and a certain number of abnormal URLs need to be prepared in advance, so that the number of URLs in the positive samples (normal URLs) equals the number of URLs in the negative samples (abnormal URLs).

Determine the number of URLs in all positive samples and the number of URLs in all negative samples, and compare the two. If they differ, determine which sample type has fewer URLs, randomly select URLs of that type from the URLs prepared in advance, and add them to the samples. For example: 10 million abnormal URLs and 10 million normal URLs are prepared in advance; the positive samples contain 10 million URLs and the negative samples contain 15 million URLs; the positive samples therefore have 5 million fewer URLs than the negative samples, so 5 million normal URLs are randomly selected from the 10 million prepared normal URLs and distributed evenly among the positive samples.

Specifically, the sample method of Spark can be used to select randomly from the URLs prepared in advance.

It should be noted that step S460 can be performed at any point after URL deduplication and before feature data extraction.
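The balancing in step S460 amounts to topping up the smaller class from a prepared pool. The sketch below uses a seeded shuffle from the Java standard library in place of Spark's sample method, which is a substitution made so the example runs on its own; `SampleBalancer`, `topUp`, and the seed parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: adds randomly chosen URLs from a prepared pool to the
// smaller class until it reaches targetSize (or the pool is exhausted).
class SampleBalancer {
    static List<String> topUp(List<String> smaller, int targetSize,
                              List<String> pool, long seed) {
        List<String> result = new ArrayList<>(smaller);
        int missing = targetSize - smaller.size();
        if (missing <= 0) {
            return result; // already balanced
        }
        // Shuffle a copy of the pool, then take as many URLs as are missing.
        List<String> shuffled = new ArrayList<>(pool);
        Collections.shuffle(shuffled, new Random(seed));
        result.addAll(shuffled.subList(0, Math.min(missing, shuffled.size())));
        return result;
    }
}
```

With the figures from the example above, `targetSize` would be 15 million and the pool would be the 10 million prepared normal URLs, from which 5 million are drawn.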

Embodiment four

This embodiment further describes the process of extracting feature data.

Before describing the steps for extracting feature data, the composition of a URL is explained first.

A complete URL is used as an example below:

http://www.test.com/login.asp?username=admin&password=123

Before feature data is extracted, the URL needs to be parsed and split.

In actual access logs, the domain name is one separate field and request_uri is another separate field.

In this example, the domain name is http://www.test.com, and the request_uri is /login.asp?username=admin&password=123.

The request_uri can be split by the character "?". In this example, /login.asp is the uri, and username=admin&password=123 is the parameter part.

The uri can be split by the character "/". If the uri is split into multiple segments, each segment is a path.

The parameter part can be split by the character "&". Each resulting segment has the format name=value, where name is the parameter name and value is the parameter value. In this example, the parameter part is split by "&" into two segments: username=admin and password=123.

Each segment obtained by splitting on "&" is then split by the character "=", which yields the individual parameter names and parameter values. In this example, the parameter names are username and password, and the parameter values are admin and 123.
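The splitting rules above can be sketched in Java as follows. The separators are the ones described in the text; the class name `RequestUriParts` and its field names are hypothetical, and a repeated parameter name simply overwrites the earlier value in this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits a request_uri into uri, paths, and parameters.
class RequestUriParts {
    final String uri;
    final List<String> paths = new ArrayList<>();
    final Map<String, String> params = new LinkedHashMap<>();

    RequestUriParts(String requestUri) {
        // Split on the first "?": uri before it, parameter part after it.
        int q = requestUri.indexOf('?');
        uri = q < 0 ? requestUri : requestUri.substring(0, q);
        String query = q < 0 ? "" : requestUri.substring(q + 1);

        // Split the uri on "/" into paths, skipping empty segments.
        for (String p : uri.split("/")) {
            if (!p.isEmpty()) {
                paths.add(p);
            }
        }
        // Split the parameter part on "&", then each segment on the first "=".
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
    }
}
```

For the example request_uri, `uri` is `/login.asp`, `paths` holds `login.asp`, and `params` maps `username` to `admin` and `password` to `123`.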

Fig. 5 is a flow chart of the feature data extraction steps according to the fourth embodiment of the present invention.

Step S510, extract the parameter features of each URL according to preset parameter types.

The parameter types include: IP (Internet Protocol) address, http, and https.

The parameter features include: value_contain_ip, value_contain_http, and value_contain_https.

value_contain_ip: whether the parameter part (value) of the URL contains an IP address, or the number of IP addresses contained in the parameter part of the URL;

value_contain_http: the number of occurrences of http in the parameter part of the URL;

value_contain_https: the number of occurrences of https in the parameter part of the URL.
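The three parameter features can be computed with simple pattern counts. The sketch below makes several assumptions the source does not state: an IP address is matched as a dotted IPv4 pattern, and http and https are counted by their scheme prefixes `http://` and `https://` so that the two counts do not overlap. `ParamFeatures` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts IP addresses and http/https links in the
// parameter part of a URL (already lowercased and decoded).
class ParamFeatures {
    // Assumption: a loose dotted-quad pattern is enough to flag an IPv4 address.
    private static final Pattern IPV4 =
            Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b");
    private static final Pattern HTTP = Pattern.compile("http://");
    private static final Pattern HTTPS = Pattern.compile("https://");

    private static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    static int valueContainIp(String params) { return count(IPV4, params); }
    static int valueContainHttp(String params) { return count(HTTP, params); }
    static int valueContainHttps(String params) { return count(HTTPS, params); }
}
```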

Step S520, extract the risk-level features of each URL according to preset abnormal keywords.

The types of abnormal keyword include: low-risk keywords, medium-risk keywords, and high-risk keywords.

Risk-level features include the risk level of predetermined parts of the URL.

For example, the risk-level features include the risk levels of the uri, name, and value in the URL.

uri_risk_level: the risk level of the uri;

name_risk_level: the risk level of the name;

value_risk_level: the risk level of the value.

Abnormal keywords can be stored in an abnormal keyword library, such as the one shown in Table 1, but those skilled in the art will appreciate that the low-risk, medium-risk, and high-risk keywords are not limited to the content shown in Table 1.

Table 1

Low-risk, medium-risk, and high-risk keywords reflect the risk level of each part of the URL (uri, name, and value).

Different weights can be assigned to abnormal keywords of different levels: the weight of a high-risk keyword is 5, the weight of a medium-risk keyword is 2, and the weight of a low-risk keyword is 1. A part that contains no abnormal keyword has a weight of 0. If a URL contains abnormal keywords of several levels, the highest weight is taken.
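The weighting rule can be sketched as a lookup that returns the highest weight among the keywords found. The weights 5, 2, 1, and 0 are from the text; the keywords themselves are placeholders chosen for this sketch, because the contents of Table 1 are not reproduced here. `RiskLevel` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: scores one part of a URL (uri, name, or value)
// by the highest-weight abnormal keyword it contains.
class RiskLevel {
    // Placeholder keywords only; the real library is Table 1 of the source.
    private static final Map<String, Integer> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("select", 1);       // assumed low-risk keyword
        WEIGHTS.put("union", 2);        // assumed medium-risk keyword
        WEIGHTS.put("xp_cmdshell", 5);  // assumed high-risk keyword
    }

    static int riskLevel(String part) {
        int level = 0; // 0 means no abnormal keyword was found
        for (Map.Entry<String, Integer> e : WEIGHTS.entrySet()) {
            if (part.contains(e.getKey())) {
                level = Math.max(level, e.getValue());
            }
        }
        return level;
    }
}
```

Calling `riskLevel` once each on the uri, the parameter names, and the parameter values would give uri_risk_level, name_risk_level, and value_risk_level.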

Step S530, extract the length features, count features, and type features of each URL according to preset feature characters.

The feature characters include but are not limited to: "?", "/", "&", and "=".

Length feature extraction:

The length features include but are not limited to: uri_length, parameter_length, uri_maxlen, name_maxlen, and value_maxlen.

uri_length: the total length (unit: characters) of the first part (the uri) after splitting by "?";

parameter_length: the total length of the second part (the parameter part) after splitting by "?";

uri_maxlen: the maximum length of a path after the uri part is split by "/";

name_maxlen: the maximum length of a name after the parameter part is split by "&" and "=";

value_maxlen: the maximum length of a value after the parameter part is split by "&" and "=".

Count feature extraction:

The count features include but are not limited to: uri_number and parameter_number.

uri_number: the number of paths after the uri part is split by "/";

parameter_number: the number of parameters after the parameter part is split by "&".
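The length and count features can be computed in a single pass over the split request_uri. In this sketch the class name `LengthCountFeatures` and the camel-case field names are hypothetical, and counting the leading "/" in uri_length is an assumption, since the source does not say whether separators are included.

```java
// Hypothetical helper: computes the length and count features of one
// request_uri, following the splitting rules for "?", "/", "&", and "=".
class LengthCountFeatures {
    final int uriLength, parameterLength, uriMaxLen, nameMaxLen, valueMaxLen;
    final int uriNumber, parameterNumber;

    LengthCountFeatures(String requestUri) {
        int q = requestUri.indexOf('?');
        String uri = q < 0 ? requestUri : requestUri.substring(0, q);
        String query = q < 0 ? "" : requestUri.substring(q + 1);
        uriLength = uri.length();          // includes "/" separators (assumption)
        parameterLength = query.length();

        int pathCount = 0, pathMax = 0;
        for (String p : uri.split("/")) {
            if (p.isEmpty()) continue;
            pathCount++;
            pathMax = Math.max(pathMax, p.length());
        }
        int paramCount = 0, nameMax = 0, valueMax = 0;
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            paramCount++;
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            nameMax = Math.max(nameMax, name.length());
            valueMax = Math.max(valueMax, value.length());
        }
        uriNumber = pathCount;
        uriMaxLen = pathMax;
        parameterNumber = paramCount;
        nameMaxLen = nameMax;
        valueMaxLen = valueMax;
    }
}
```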

Type feature extraction:

Values can be divided into 9 types according to whether they are empty, whether they contain suspicious characters, and whether they consist of digits only, letters only, or a mixture, as shown in Table 2:

Type	Value	Example
Null value	0	""
Digits only	1	1234556
Letters only	2	Abc
Normal characters	3	-
Digits + letters	4	123abc
Digits + normal characters	5	123-111
Letters + normal characters	6	abc_abc
Digits + letters + normal characters	7	123-abc_
Other characters	8	<a href

Table 2

According to Table 2, extracting a type feature amounts to scoring the uri with the value corresponding to its type.

The type features include but are not limited to: uri_type, name_type, and value_type.

uri_type: the type of the part before "?" after splitting;

name_type: the type of a name after the parameter part is split by "&" and "=";

value_type: the type of a value after the parameter part is split by "&" and "=".
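A possible Java sketch of the type classification follows. It assumes the type codes 0 to 8 of Table 2, reading the examples as 123abc for digits plus letters (4) and 123-111 for digits plus normal characters (5). The set of "normal characters" is not defined in the text, so the set used here ("-", "_", ".") is purely an assumption, as is the class name `ValueType`.

```java
// Hypothetical helper: maps a string to one of the 9 type codes of Table 2.
class ValueType {
    // Assumption: which punctuation counts as a "normal character".
    private static final String NORMAL_CHARS = "-_.";

    static int typeOf(String s) {
        if (s == null || s.isEmpty()) {
            return 0; // null value
        }
        boolean digit = false, letter = false, normal = false;
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                digit = true;
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                letter = true;
            } else if (NORMAL_CHARS.indexOf(c) >= 0) {
                normal = true;
            } else {
                return 8; // any other character
            }
        }
        if (digit && letter && normal) return 7;
        if (letter && normal) return 6;
        if (digit && normal) return 5;
        if (digit && letter) return 4;
        if (normal) return 3;
        if (letter) return 2;
        return 1; // digits only
    }
}
```

Under this assumed character set a uri such as /login.asp would fall into type 8 because of the "/"; a real implementation would presumably classify each path separately or treat "/" as a normal character.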

It should be noted that the order of the steps in this embodiment is not fixed; the steps may be executed in a different order.

Based on the above feature data, the identifier of the URL, the feature data of the URL, and the label of the access log to which the URL belongs can be packaged to form the data model object DataModel corresponding to the URL.

For example, the data model object DataModel of a URL is shown in Table 3:

Embodiment five

This embodiment further describes the process of training the anomaly detection model on the data model objects. Fig. 6 is a flow chart of the steps for training the anomaly detection model according to the fifth embodiment of the present invention. Fig. 7 is a schematic diagram of training the anomaly detection model according to the fifth embodiment of the present invention.

The following operations are performed with the Spark decision tree (the anomaly detection model):

Step S610, number the label in the data model object; the label in the data model object is the label of the access log to which the data model object's URL belongs.

The label in the data model object is numbered by the StringIndexer class, with the label field as input and the transformed field indexedLabel as output. The labels can be converted to 1 and 0, where 1 denotes an abnormal access log and 0 denotes a normal access log.

Step S620, convert the feature data in the data model object into a single-column feature vector.

To facilitate training of the anomaly detection model, some of the columns in the data model object can be converted into a feature vector and given a uniform name.

Because the data model object contains feature data columns, a label column (label), and a request uri column (request_uri), the feature data columns can be converted by a column transformation into a single-column feature vector containing multiple features. The column transformation can be done with Spark's VectorAssembler class, which takes the feature data as input and outputs a single vector column, featureVector. Further, the feature data to be converted is first put into an array, and the VectorAssembler class then converts all of the feature data into the single column featureVector.

Step S630, standardize the single-column feature vector to obtain a standardized feature vector.

The purpose of standardizing the single-column feature vector is to speed up machine learning, and also to avoid some feature values being so large that they dominate the anomaly detection model and reduce the other features to secondary indicators.

Spark's built-in StandardScaler can be used to standardize the single-column feature vector.

When the single-column feature vector is standardized, the data is given a mean of 0 and a variance of 1.

Two parameters can be set in the process:

1: withStd=true scales the variance to 1;

2: withMean=true shifts the mean to 0.

In this embodiment, both parameters can be set to true. Taking the generated DataModel as an example, the input is the single-column feature vector featureVector and the output is the standardized feature vector scaledFeatures.
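To make the effect of withMean and withStd concrete, the plain-Java sketch below standardizes each feature column to mean 0 and unit variance. It is not Spark's StandardScaler; it is a stand-alone illustration, and the use of the sample standard deviation (dividing by n - 1) and the zero output for constant columns are assumptions of this sketch. `Standardizer` is a hypothetical name.

```java
// Hypothetical helper: column-wise standardization of a feature matrix,
// where each row is one feature vector. Assumes at least one row.
class Standardizer {
    static double[][] standardize(double[][] rows) {
        int n = rows.length;
        int d = rows[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            // Column mean (the effect of withMean=true is to subtract it).
            double mean = 0;
            for (double[] r : rows) mean += r[j];
            mean /= n;
            // Sample standard deviation (withStd=true divides by it).
            double ss = 0;
            for (double[] r : rows) ss += (r[j] - mean) * (r[j] - mean);
            double std = n > 1 ? Math.sqrt(ss / (n - 1)) : 0;
            for (int i = 0; i < n; i++) {
                // Constant columns are mapped to 0 to avoid dividing by zero.
                out[i][j] = std == 0 ? 0 : (rows[i][j] - mean) / std;
            }
        }
        return out;
    }
}
```

After this step a feature such as parameter_length, which can be in the hundreds, and a feature such as uri_risk_level, which is at most 5, are on comparable scales.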

Step S640: train the abnormality detection model using the label numbers and the standardized feature vector.

In this embodiment, machine learning is performed with a Spark decision tree (the abnormality detection model): the label column indexedLabel and the standardized feature vector scaledFeatures are input to the decision tree, and the optimal abnormality detection model is output after training.

The parameter sets of the decision tree are defined through the ParamGridBuilder() class. The optimal parameter set is selected from the multiple parameter sets; the abnormality detection model obtained with the optimal parameter set has the highest identification accuracy, and this most accurate model is used as the output.

The parameters used by the decision tree include maxBins, maxDepth, impurity and numTrees.

maxBins: the maximum number of divisions (bins) used when splitting a feature; in this embodiment, the selectable values are 25, 28, 31 and 33;

maxDepth: the maximum depth of the decision tree; in this embodiment, the selectable values are 4, 6, 8 and 10;

impurity: the impurity measure used to calculate information gain; the only options are "entropy" and "gini";

numTrees: the number of decision trees to build; in this embodiment, the selectable values are 10, 15 and 20.

A parameter set is one combination of selectable values of maxBins, maxDepth, impurity and numTrees.
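As an illustration, the parameter sets can be enumerated as the Cartesian product of the selectable values listed above. The sketch below uses plain Python rather than ParamGridBuilder; the names PARAM_CHOICES and build_param_sets are hypothetical.

```python
from itertools import product

# Selectable values taken from this embodiment.
PARAM_CHOICES = {
    "maxBins": [25, 28, 31, 33],
    "maxDepth": [4, 6, 8, 10],
    "impurity": ["entropy", "gini"],
    "numTrees": [10, 15, 20],
}


def build_param_sets(choices):
    """Return every combination of selectable values as a list of dicts."""
    names = list(choices)
    value_lists = [choices[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


param_sets = build_param_sets(PARAM_CHOICES)
```

With these values there are 4 x 4 x 2 x 3 = 96 parameter sets, each of which is a candidate for the cross-validation described below.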

In addition, BinaryClassificationEvaluator can be used to perform binary classification evaluation in order to judge the performance of the abnormality detection model.

As shown in Fig. 7, the abnormality detection model in this embodiment is trained through a ten-fold cross-validation flow, which takes paramsMap, estimator and evaluator as input and trains the optimal abnormality detection model. In this embodiment, Spark supports training the abnormality detection model through the CrossValidator tool. The input parameters are as follows:

paramsMap: the combinations of selectable values of the parameters maxBins, maxDepth, impurity and numTrees used by the decision tree, which yield multiple parameter sets;

estimator: trains on the label column indexedLabel and the standardized feature vector scaledFeatures to determine the training result corresponding to each parameter set;

evaluator: evaluates the training results, and determines and outputs the best-performing abnormality detection model.
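The interplay of the three inputs can be sketched in plain Python. This is a simplified illustration, not the CrossValidator implementation: the function names are hypothetical, a toy threshold classifier replaces the decision tree, and plain accuracy replaces the binary classification evaluator.

```python
def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        yield indices[:start] + indices[start + size:], indices[start:start + size]
        start += size


def cross_validate(samples, param_sets, fit, evaluate, k=10):
    """Return the parameter set with the best mean score over k folds."""
    best_params, best_score = None, float("-inf")
    for params in param_sets:
        scores = []
        for train_idx, test_idx in k_fold_splits(len(samples), k):
            model = fit([samples[i] for i in train_idx], params)
            scores.append(evaluate(model, [samples[i] for i in test_idx]))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_threshold(train, params):
    """Toy estimator: the threshold is a hyperparameter; nothing is learned."""
    threshold = params["threshold"]
    return lambda x: 1 if x > threshold else 0


def accuracy(model, test):
    """Toy evaluator: fraction of correctly classified (feature, label) pairs."""
    return sum(model(x) == y for x, y in test) / len(test)


# 20 samples of (feature, label); label 1 marks an abnormal sample.
samples = [(float(i), 1 if i >= 10 else 0) for i in range(20)]
candidates = [{"threshold": 4.5}, {"threshold": 9.5}, {"threshold": 14.5}]
best_params, best_score = cross_validate(samples, candidates, fit_threshold, accuracy)
```

Each parameter set is scored on ten held-out folds and the set with the highest mean score is kept, which mirrors the roles of paramsMap, estimator and evaluator described above.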

Embodiment six

This embodiment provides a web access exception detection device.

Fig. 8 is a structural diagram of the web access exception detection device according to the seventh embodiment of the present invention.

The web access exception detection device includes:

A training module 810, configured to train an abnormality detection model according to multiple access logs, where the multiple access logs include normal access logs and abnormal access logs.

A receiving module 820, configured to receive a Hypertext Transfer Protocol (HTTP) request sent by user equipment.

An identification module 830, configured to identify, through the abnormality detection model, whether the HTTP request is an abnormal request.

An interception module 840, configured to intercept the HTTP request when the identification module determines that the HTTP request is an abnormal request.
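The cooperation of the receiving, identification and interception modules can be sketched as a single request handler. This is only an illustrative outline: the function names, the status codes 403 and 200, and the keyword-based stand-in for the trained model are all assumptions that do not come from this embodiment.

```python
from typing import Callable, Tuple


def handle_request(url: str, is_abnormal: Callable[[str], bool]) -> Tuple[int, str]:
    """Classify a received request URL and intercept it if it is abnormal."""
    if is_abnormal(url):
        return 403, "intercepted"  # interception module (status code assumed)
    return 200, "forwarded"  # normal requests pass through


def keyword_model(url: str) -> bool:
    """Hypothetical stand-in for the trained abnormality detection model."""
    return "<script" in url.lower()
```

For example, handle_request("/index?id=1", keyword_model) lets the request through, while a URL containing a script tag is intercepted.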

Further, as shown in Fig. 9, the training module 810 includes:

A processing unit 811, configured to obtain multiple access logs and perform data cleansing on the multiple access logs.

An extraction unit 812, configured to extract, after the data cleansing, the feature data of each uniform resource locator (URL) in the multiple access logs.

A generation unit 813, configured to generate a data model object for each URL according to the data cleansing result of each access log and the feature data of each URL.

A training unit 814, configured to process each data model object through the Spark decision tree and to train the abnormality detection model using the processed data model objects.

The processing unit 811 is further configured to: filter out the static files in each access log; deduplicate the repeated URLs in the multiple access logs; unify the letter case of the URLs in the multiple access logs; decode the encoded URLs in the multiple access logs; add a label to each access log, the label types including normal sample and abnormal sample; and, according to pre-prepared normal URLs and abnormal URLs, balance the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.
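Part of the data cleansing can be sketched in plain Python as follows. The sketch covers only decoding, case unification, static-file filtering and deduplication, applied in that order for convenience; labeling and sample balancing are omitted. The list of static file extensions and the function name clean_urls are assumptions.

```python
from urllib.parse import unquote, urlsplit

# Assumed set of static file extensions; the embodiment does not list them.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def clean_urls(urls):
    """Decode, lower-case, drop static files and deduplicate, keeping order."""
    seen = set()
    cleaned = []
    for url in urls:
        normalized = unquote(url).lower()  # decode, then unify letter case
        if urlsplit(normalized).path.endswith(STATIC_EXTENSIONS):
            continue  # static file request
        if normalized in seen:
            continue  # repeated URL
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


raw_urls = [
    "/index.php?id=1",
    "/INDEX.php?id=1",
    "/static/logo.PNG",
    "/search?q=%3Cscript%3E",
    "/index.php?id=1",
]
cleaned_urls = clean_urls(raw_urls)
```

Decoding before deduplication means that an encoded and an unencoded form of the same URL count as one entry, which is a design choice of this sketch.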

The extraction unit 812 is further configured to: extract the parameter features in each URL according to preset parameter types; extract the danger-level feature of each URL according to preset abnormal keywords; and extract the length features, quantity features and type features of each URL according to preset characteristic characters.
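A minimal sketch of this kind of URL feature extraction is given below. The keyword table with its danger levels, the set of characteristic characters and the feature names are hypothetical examples chosen for illustration; the actual preset values are not specified here.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical abnormal keywords mapped to hypothetical danger levels.
ABNORMAL_KEYWORDS = {"select": 2, "union": 2, "<script": 3, "../": 3}
# Hypothetical characteristic characters.
FEATURE_CHARS = "<>'\"%;()"


def extract_features(url):
    """Extract parameter, danger-level, length, quantity and type features."""
    lowered = url.lower()
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    matched_levels = [lvl for kw, lvl in ABNORMAL_KEYWORDS.items() if kw in lowered]
    return {
        "url_length": len(url),
        "param_count": len(params),
        "numeric_param_count": sum(1 for _, value in params if value.isdigit()),
        "danger_level": max(matched_levels, default=0),
        "special_char_count": sum(url.count(ch) for ch in FEATURE_CHARS),
        "special_char_types": sum(1 for ch in FEATURE_CHARS if ch in url),
    }


normal_features = extract_features("/item.php?id=1&name=a")
attack_features = extract_features("/q?x=<script>")
```

Each returned dictionary corresponds to the feature data of one URL and can be combined with the label of its access log to form a data model object.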

The training unit 814 is further configured to: number the labels in the data model objects, where the label in a data model object is the label of the access log to which the URL corresponding to that data model object belongs; convert the feature data in the data model objects into a single-column feature vector; standardize the single-column feature vector to obtain a standardized feature vector; and train the abnormality detection model using the label numbers and the standardized feature vector.

The functions of the device described in this embodiment have already been described in the method embodiments shown in Fig. 2 to Fig. 7. For details not covered in the description of this embodiment, refer to the related description in the foregoing embodiments, which is not repeated here.

The present invention performs feature extraction on a large number of normal samples and abnormal samples, and can capture more than ten of the most critical features of normal access and intrusion access. Decision tree classification learning is carried out on these features, so that the detection model generated in a single training captures a large number of intrusion behavior features.

The present invention can also verify the parameters of the decision tree and select the optimal parameter combination to generate the optimal abnormality detection model, which provides better protection against intrusion behavior.

The abnormality detection model generated by the present invention is highly flexible, eliminates the rule maintenance step, reduces maintenance cost, and can protect against unknown anomalies.

Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will recognize that various improvements, additions and substitutions are also possible; therefore, the scope of the present invention should not be limited to the above embodiments.

Claims

1. A World Wide Web (web) access exception detection method, characterized by comprising:

training an abnormality detection model according to multiple access logs, wherein the multiple access logs include normal access logs and abnormal access logs;

receiving a Hypertext Transfer Protocol (HTTP) request sent by user equipment;

identifying, through the abnormality detection model, whether the HTTP request is an abnormal request;

if the HTTP request is an abnormal request, intercepting the HTTP request.

2. The method as claimed in claim 1, characterized in that training the abnormality detection model according to multiple access logs comprises:

obtaining multiple access logs, and performing data cleansing on the multiple access logs;

after the data cleansing, extracting the feature data of each uniform resource locator (URL) in the multiple access logs;

generating a data model object for each URL according to the data cleansing result of each access log and the feature data of each URL;

processing each data model object through the Spark decision tree, and training the abnormality detection model using the processed data model objects.

3. The method as claimed in claim 2, characterized in that performing data cleansing on the multiple access logs comprises:

filtering out the static files in each access log;

deduplicating the repeated URLs in the multiple access logs;

unifying the letter case of the URLs in the multiple access logs;

decoding the encoded URLs in the multiple access logs;

adding a label to each access log, the label types including normal sample and abnormal sample;

according to pre-prepared normal URLs and abnormal URLs, balancing the number of URLs in the access logs corresponding to normal samples against the number of URLs in the access logs corresponding to abnormal samples.

4. The method as claimed in claim 3, characterized in that extracting the feature data of each URL comprises:

extracting the parameter features in each URL according to preset parameter types;

extracting the danger-level feature of each URL according to preset abnormal keywords;

extracting the length features, quantity features and type features of each URL according to preset characteristic characters.

5. The method as claimed in claim 3, characterized in that processing each data model object and training the abnormality detection model using the processed data model objects comprises:

numbering the labels in the data model objects, wherein the label in a data model object is the label of the access log to which the URL corresponding to that data model object belongs;

converting the feature data in the data model objects into a single-column feature vector;

standardizing the single-column feature vector to obtain a standardized feature vector;

training the abnormality detection model using the label numbers and the standardized feature vector.

6. A web access exception detection device, characterized by comprising:

a training module, configured to train an abnormality detection model according to multiple access logs, wherein the multiple access logs include normal access logs and abnormal access logs;

a receiving module, configured to receive a Hypertext Transfer Protocol (HTTP) request sent by user equipment;

an identification module, configured to identify, through the abnormality detection model, whether the HTTP request is an abnormal request;

an interception module, configured to intercept the HTTP request when the identification module determines that the HTTP request is an abnormal request.

7. The device as claimed in claim 6, characterized in that the training module comprises:

a processing unit, configured to obtain multiple access logs and perform data cleansing on the multiple access logs;

an extraction unit, configured to extract, after the data cleansing, the feature data of each uniform resource locator (URL) in the multiple access logs;

a generation unit, configured to generate a data model object for each URL according to the data cleansing result of each access log and the feature data of each URL;

a training unit, configured to process each data model object through the Spark decision tree and to train the abnormality detection model using the processed data model objects.

8. The device as claimed in claim 7, characterized in that the processing unit is further configured to:

filter out the static files in each access log;

decode the encoded URLs in the multiple access logs;

9. The device as claimed in claim 8, characterized in that the extraction unit is further configured to:

10. The device as claimed in claim 8, characterized in that the training unit is further configured to: