CN109218294A

CN109218294A - Anti-scanning method, device and server based on a machine learning Bayesian algorithm

Info

Publication number: CN109218294A
Application number: CN201810957134.0A
Authority: CN
Inventors: 唐其彪; 范渊
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2018-08-21
Filing date: 2018-08-21
Publication date: 2019-01-15

Abstract

The present invention provides an anti-scanning method, device and server based on a machine learning Bayesian algorithm. The method is applied to a server and comprises: collecting the access log of a client's current access behavior; extracting feature values from the access log; inputting the feature values into a preset scanning behavior recognition model and outputting a recognition result, the scanning behavior recognition model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, identifying the IP address corresponding to the current access behavior; and intercepting, at the network layer, the access behavior issued by that IP address. By building the scanning behavior recognition model through machine learning with a Bayesian algorithm and using the model to identify scanning behavior, the invention improves the recognition rate of scanning behavior, lowers the false negative rate (scanning behavior that goes unreported), and reduces the cost of detecting scanning behavior.

Description

Anti-scanning method, device and server based on a machine learning Bayesian algorithm

Technical field

The present invention relates to the technical field of web page security protection, and in particular to an anti-scanning method, device and server based on a machine learning Bayesian algorithm.

Background

With the development of Internet technology, web application systems have been widely adopted in government portals, e-commerce, the Internet industry and other sectors. While making daily life and work more convenient, they also introduce network security vulnerabilities. Using scanning techniques, hackers can discover server vulnerabilities and attack them, and the large volume of data packets generated by scanning consumes substantial network bandwidth, which can prevent normal network communication. At present, scanning behavior is identified mainly by simple statistical methods or manually by senior security experts relying on experience. Both approaches have a low recognition rate; with massive access logs the workload is heavy and the false negative rate is high, so scanning behavior cannot be detected effectively enough to guarantee network security.

Summary of the invention

In view of this, the purpose of the present invention is to provide an anti-scanning method, device and server based on a machine learning Bayesian algorithm, so as to improve the recognition rate of scanning behavior, lower the false negative rate, and reduce the cost of detecting scanning behavior.

In a first aspect, an embodiment of the present invention provides an anti-scanning method based on a machine learning Bayesian algorithm, wherein the method is applied to a server and comprises: collecting the access log of a client's current access behavior; extracting feature values from the access log; inputting the feature values into a preset scanning behavior recognition model and outputting a recognition result, the scanning behavior recognition model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, identifying the IP address corresponding to the current access behavior; and intercepting, at the network layer, the access behavior issued by the IP address.

With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of extracting feature values from the access log comprises: removing from the access log the IP addresses whose logs cover less than two seconds or number fewer than 100 entries; and performing feature extraction on the remaining access log to obtain the feature values of the access log.

With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the feature values include several of the following: the response code; the tangent of the angle of the log volume over the past two seconds; the share of entries in the past two seconds that come from the same IP as the current access log entry; the share of 404 responses in the past two seconds for the same IP as the current entry; the port variance in the past two seconds for the same IP as the current entry; the share of entries in the past 100 log entries that come from the same IP as the current entry; the share of 404 responses in the past 100 log entries for the same IP as the current entry; and the port variance in the past 100 log entries for the same IP as the current entry.

With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the method further comprises: setting the port variance to 65535 when the same IP has fewer than 100 ports recorded in the past two seconds; and setting the port variance to 65535 when the same IP has fewer than 3 ports recorded in the past 100 log entries.

With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the scanning behavior recognition model is obtained as follows: collecting client access log samples, the access log samples including scanner behavior log samples and normal access log samples; building an initial naive Bayes model; extracting the feature values of the access log samples; dividing the access log samples into a specified number of parts and, using K-fold cross-validation, inputting the feature values of at least one part of the access log samples in turn into the initial naive Bayes model for training, to obtain the scanning behavior recognition model; recognizing the feature values of the remaining part or parts of the access log samples with the scanning behavior recognition model and outputting recognition results; comparing each recognition result with the actual result of the corresponding access log sample to obtain the accuracy rate and recall rate of the scanning behavior recognition model; and adjusting the scanning behavior recognition model according to the accuracy rate and recall rate. The accuracy rate (that is, the precision) is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the recognition result is true. The recall rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the actual result is true.

With reference to the fourth possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the scanner behavior log samples include: one hour of logs from scanning software, and one continuous hour of logs extracted from the production environment after filtering for scanning-behavior IPs; and the normal access log samples include: normal access logs from the production environment, and logs whose response code is 200 after rule-based filtering.

With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein the step of intercepting, at the network layer, the access behavior issued by the IP address comprises: intercepting the access behavior currently issued by the IP address and/or the access behavior it issues subsequently.

With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, wherein the method further comprises: identifying the scanning IP address corresponding to the scanning behavior; classifying the threat level of the scanning IP address according to a preset correspondence between IP address data information and threat levels; applying a block of exponentially increasing duration to an IP address that has been identified as a scanning IP address a preset number of consecutive times; and sending the domain name scanned by the IP address under the exponential-duration block to the server corresponding to that domain name.

In a second aspect, an embodiment of the present invention further provides an anti-scanning device based on a machine learning Bayesian algorithm, wherein the device is arranged on a server and comprises: a collection module, configured to collect the access log of a client's current access behavior; an extraction module, configured to extract feature values from the access log; a recognition module, configured to input the feature values into a preset scanning behavior recognition model and output a recognition result, the scanning behavior recognition model being obtained by training a naive Bayes model; an address identification module, configured to identify the IP address corresponding to the current access behavior if the recognition result indicates that the current access behavior is scanning behavior; and an interception module, configured to intercept, at the network layer, the access behavior issued by the IP address.

In a third aspect, an embodiment of the present invention further provides a server comprising a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the method of the first aspect.

The embodiments of the present invention bring the following beneficial effects:

The present invention provides an anti-scanning method, device and server based on a machine learning Bayesian algorithm, in which the access log of a client's current access behavior is collected; feature values are extracted from the access log; the feature values are input into a preset scanning behavior recognition model and a recognition result is output, the scanning behavior recognition model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, the IP address corresponding to the current access behavior is identified; and the access behavior issued by the IP address is intercepted at the network layer. By building the scanning behavior recognition model through machine learning with a Bayesian algorithm and using the model to identify scanning behavior, the invention improves the recognition rate of scanning behavior, lowers the false negative rate, and reduces the cost of detecting scanning behavior.

Other features and advantages of the present invention will be set forth in the following description; alternatively, some features and advantages can be inferred or unambiguously determined from the specification, or learned by implementing the above techniques of the invention.

To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of an anti-scanning method based on a machine learning Bayesian algorithm provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a data distribution provided by an embodiment of the present invention;

Fig. 3 is a flow chart of a method for building the scanning behavior recognition model provided by an embodiment of the present invention;

Fig. 4 is a flow chart of a method for intercepting the access behavior issued by a scanning IP address provided by an embodiment of the present invention;

Fig. 5 is a structural schematic diagram of an anti-scanning device based on a machine learning Bayesian algorithm provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Currently, traditional web page security protection methods are applied at the web server. The web server receives a page access request sent by a target user; if the page link accessed by the request is a preset decoy link, a decoy page is generated for that decoy link. The decoy link is set in advance to be invisible to normal users of non-scanning devices, and the decoy page contains decoy links to next-level decoy subpages. The decoy page is returned to the target user, protecting the web server against illegal vulnerability scanning by scanning devices. This method has to distinguish scanning devices from non-scanning devices in advance, and cannot effectively identify scanning behavior from devices that have not been set as scanning devices. On this basis, the anti-scanning method, device and server based on a machine learning Bayesian algorithm provided by the embodiments of the present invention can be applied in scenarios where scanning behavior is detected and identified.

To make this embodiment easier to understand, an anti-scanning method based on a machine learning Bayesian algorithm disclosed in an embodiment of the present invention is first described in detail.

Referring to the flow chart in Fig. 1 of an anti-scanning method based on a machine learning Bayesian algorithm, the method is applied to a server and comprises the following steps:

Step S102: collect the access log of the client's current access behavior;

Web logs are an indispensable part of information security and play a very important role in areas such as system anomaly analysis and user behavior analysis. A web log is a file ending in ".log" that records raw information such as the requests a web server receives and processes and the errors that occur at runtime. From the web log it is possible to see clearly from which IP address, at what time, and with which operating system, browser and display resolution a user accessed which page of which website, and whether the access succeeded.

By collecting the access log, the embodiment of the present invention obtains the data information of the client corresponding to the current access behavior, so that this data can be processed and recognized to judge whether the access behavior is scanning behavior.

Step S104: extract the feature values of the access log from the access log;

During feature extraction, the IP addresses whose logs cover less than two seconds or number fewer than 100 entries are removed from the access log; feature extraction is then performed on the remaining access log to obtain its feature values.
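The filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the record layout (dicts with hypothetical `ip` and `ts` keys, `ts` in seconds), the function name `filter_sparse_ips`, and the reading that an IP is kept only when its logs both span at least two seconds and number at least 100 are all assumptions.

```python
from collections import defaultdict


def filter_sparse_ips(entries, min_span_seconds=2.0, min_count=100):
    """Drop every log entry whose IP has too little log data.

    Assumed reading of the text: an IP is removed when its logs cover
    less than `min_span_seconds` OR number fewer than `min_count`.
    `entries` is a list of dicts with hypothetical keys "ip" and "ts".
    """
    by_ip = defaultdict(list)
    for entry in entries:
        by_ip[entry["ip"]].append(entry["ts"])

    kept_ips = set()
    for ip, times in by_ip.items():
        span = max(times) - min(times)
        if len(times) >= min_count and span >= min_span_seconds:
            kept_ips.add(ip)

    # Preserve the original ordering of the surviving entries.
    return [entry for entry in entries if entry["ip"] in kept_ips]
```

Grouping by IP first keeps the pass linear in the number of log entries, which matters when the access log is large.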

The feature values include several of the following: the response code; the tangent of the angle of the log volume over the past two seconds; the share of entries in the past two seconds that come from the same IP as the current access log entry; the share of 404 responses in the past two seconds for the same IP as the current entry; the port variance in the past two seconds for the same IP as the current entry; the share of entries in the past 100 log entries that come from the same IP as the current entry; the share of 404 responses in the past 100 log entries for the same IP as the current entry; and the port variance in the past 100 log entries for the same IP as the current entry.

The extracted feature values are then processed: the port variance is set to 65535 when the same IP has fewer than 100 ports recorded in the past two seconds, and the port variance is set to 65535 when the same IP has fewer than 3 ports recorded in the past 100 log entries.
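Three of the window features above (same-IP share, 404 share and port variance with the 65535 fallback) can be sketched as below. The record layout with hypothetical `ip`, `port` and `status` keys, the function name `window_features`, and the use of population variance are assumptions made for illustration; the text does not specify them. The same function would serve both windows, with `min_samples=100` for the two-second window and `min_samples=3` for the 100-entry window.

```python
from statistics import pvariance

# Fallback value the text assigns when too few port samples exist.
SENTINEL_VARIANCE = 65535


def window_features(window, current, min_samples):
    """Return (same_ip_share, share_404, port_variance) for `current`.

    `window` is the list of log entries in the look-back window (past two
    seconds or past 100 entries); each entry is a dict with hypothetical
    keys "ip", "port" and "status".
    """
    same_ip = [e for e in window if e["ip"] == current["ip"]]

    same_ip_share = len(same_ip) / len(window) if window else 0.0
    share_404 = (
        sum(e["status"] == 404 for e in same_ip) / len(same_ip)
        if same_ip
        else 0.0
    )

    ports = [e["port"] for e in same_ip]
    if len(ports) >= min_samples:
        port_variance = pvariance(ports)  # population variance (assumption)
    else:
        port_variance = SENTINEL_VARIANCE

    return same_ip_share, share_404, port_variance
```

The sentinel keeps the feature defined for sparse data instead of producing a misleadingly small variance from a handful of samples.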

Step S106: input the feature values into the preset scanning behavior recognition model and output a recognition result; the scanning behavior recognition model is obtained by training a naive Bayes model;

The processed feature values are input into the preset scanning behavior recognition model, which identifies whether the current access behavior is scanning behavior.

The scanning behavior recognition model is built by machine learning with a Bayesian algorithm, and the data are classified with the naive Bayes algorithm. That is, the current access behavior is classified by the preset naive Bayes algorithm as either scanning behavior or normal access behavior.

The naive Bayes algorithm is a classification algorithm that rests on the assumption of feature independence and uses knowledge of probability and statistics. Referring to the data distribution schematic in Fig. 2, suppose a data set contains two classes of data, class 1 (shown as dots in the figure) and class 2 (shown as triangles). Let p1(x, y) denote the probability that data point (x, y) belongs to class 1 and p2(x, y) the probability that it belongs to class 2. For a new data point (x, y), if p1(x, y) > p2(x, y) the point belongs to class 1; otherwise it belongs to class 2.

The naive Bayes algorithm evaluates the current access behavior by means of Bayes' formula:

$$p(c_i \mid x, y) = \frac{p(x, y \mid c_i)\, p(c_i)}{p(x, y)}$$

where $p(c_i \mid x, y)$ is the probability that data point $(x, y)$ belongs to class $c_i$, $p(x, y \mid c_i)$ is the probability that data point $(x, y)$ occurs within class $c_i$, $p(c_i)$ is the prior probability of class $c_i$, and $p(x, y)$ is the probability of the data point itself. Let scanning behavior be class $c_1$ and normal access behavior be class $c_2$. If $p(c_1 \mid x, y) > p(c_2 \mid x, y)$, the current access behavior $(x, y)$ belongs to class $c_1$, that is, it is identified as scanning behavior; otherwise it is identified as normal access behavior.
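The decision rule can be illustrated with a small sketch. It assumes the per-feature likelihoods have already been estimated from training data and are simply passed in; the function names and the class labels `"scan"` and `"normal"` are hypothetical and not taken from the patent.

```python
import math


def naive_likelihood(per_feature_likelihoods):
    """Feature-independence assumption: p(x, y | c) = p(x | c) * p(y | c)."""
    return math.prod(per_feature_likelihoods)


def posteriors(priors, likelihoods):
    """Bayes' rule: p(c | x, y) = p(x, y | c) p(c) / sum_k p(x, y | c_k) p(c_k)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


def classify(priors, likelihoods):
    """Return the class with the largest posterior probability."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get)


# Illustrative numbers only (not from the patent).
priors = {"scan": 0.1, "normal": 0.9}
likelihoods = {
    "scan": naive_likelihood([0.8, 0.9]),
    "normal": naive_likelihood([0.05, 0.1]),
}
label = classify(priors, likelihoods)
```

Because the denominator $p(x, y)$ is the same for every class, comparing the numerators alone gives the same decision; the sketch normalizes only so that the posteriors can be inspected.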

Step S108: if the recognition result indicates that the current access behavior is scanning behavior, identify the IP address corresponding to the current access behavior;

Step S110: intercept, at the network layer, the access behavior issued by the IP address.

Intercepting the access behavior issued by the IP address at the network layer includes intercepting the access behavior the IP address is currently issuing and/or the access behavior it issues subsequently.

By identifying the current access behavior in real time and promptly intercepting, at the network layer, the IP address corresponding to scanning behavior according to the recognition result, attacks deliberately launched by hackers can be prevented effectively, unnecessary data packets are kept from consuming network traffic, the utilization of network bandwidth is improved, and the safe operation of the enterprise network is ensured. The method is characterized by high precision, a low false alarm rate and low cost.

An embodiment of the present invention thus provides an anti-scanning method based on a machine learning Bayesian algorithm. The method collects the access log of a client's current access behavior; extracts feature values from the access log; inputs the feature values into a preset scanning behavior recognition model and outputs a recognition result, the scanning behavior recognition model being obtained by training a naive Bayes model; identifies the IP address corresponding to the current access behavior if the recognition result indicates scanning behavior; and intercepts, at the network layer, the access behavior issued by the IP address. By building the scanning behavior recognition model through machine learning with a Bayesian algorithm and using the model to identify scanning behavior, the embodiment improves the recognition rate of scanning behavior, lowers the false negative rate, and reduces the cost of detecting scanning behavior.

Referring to the flow chart in Fig. 3 of a method for building the scanning behavior recognition model, this method is implemented on the basis of the method embodiment shown in Fig. 1. This embodiment focuses on the specific implementation of building the scanning behavior recognition model, with the following steps:

Step S302: collect client access log samples;

The access log samples include scanner behavior log samples and normal access log samples. The scanner behavior log samples include: one hour of logs from scanning software, and one continuous hour of logs extracted from the production environment after filtering for scanning-behavior IPs. The normal access log samples include: normal access logs from the production environment, and logs whose response code is 200 after rule-based filtering.

Step S304: build an initial naive Bayes model;

Step S306: extract the feature values of the access log samples;

The extracted feature values are processed: the port variance is set to 65535 when the same IP has fewer than 100 ports recorded in the past two seconds; the port variance is set to 65535 when the same IP has fewer than 3 ports recorded in the past 100 log entries; and for scanning behavior the tangent-of-angle feature of the log volume over the past two seconds is set to 45, representing the angle of scanning behavior.

Step S308: divide the access log samples into a specified number of parts and, using K-fold cross-validation, input the feature values of at least one part of the access log samples in turn into the initial naive Bayes model for training, to obtain the scanning behavior recognition model;

In this embodiment, taking as an example feature vectors of scanning behavior labeled 1 and feature vectors of normal access labeled 0, the naive Bayes model is trained and validated by K-fold cross-validation. (K-fold cross-validation divides the data set into K parts, uses at least one part in turn as the training sample and the remaining parts as test samples, and averages the test results.)
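The fold rotation can be sketched as below. The sketch follows the conventional K-fold arrangement, in which each fold is held out once for testing and the other K-1 folds are used for training; this is an assumption, since the text only says that at least one part is used for training in turn. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds.

    Samples are split into k contiguous folds of near-equal size; each
    fold serves as the test set exactly once (conventional K-fold,
    assumed here). Shuffle the samples beforehand if order matters.
    """
    base, extra = divmod(n_samples, k)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(k)]

    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop
```

A training loop would fit the model on the `train` indices of each fold, score it on the `test` indices, and average the per-fold scores, matching the averaging described above.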

Step S310: recognize the feature values of the remaining part or parts of the access log samples with the scanning behavior recognition model and output the recognition results;

Following the example in step S308, the recognition result is 1 or 0: if 1, the current access log sample is identified as scanning behavior; if 0, the current access log sample is identified as normal access behavior.

Step S312: compare each recognition result with the actual result of the corresponding access log sample to obtain the accuracy rate and recall rate of the scanning behavior recognition model;

The accuracy rate (that is, the precision) is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the recognition result is true. The recall rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the actual result is true.
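The two ratios can be computed directly from the 1/0 labels used in the example above. This is a minimal sketch; the function name `precision_recall` and the choice to return 0.0 when a denominator is zero are assumptions.

```python
def precision_recall(predicted, actual):
    """Return (accuracy rate as defined in the text, recall rate).

    `predicted` and `actual` are equal-length sequences of 1 (scanning
    behavior) and 0 (normal access).
    """
    both_true = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    predicted_true = sum(p == 1 for p in predicted)
    actual_true = sum(a == 1 for a in actual)

    # Zero denominators are mapped to 0.0 (assumption, not from the text).
    precision = both_true / predicted_true if predicted_true else 0.0
    recall = both_true / actual_true if actual_true else 0.0
    return precision, recall
```

A low value of the first ratio means normal visitors are being flagged as scanners, while a low recall means scanners are slipping through, so the two numbers point to different adjustments of the model.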

Step S314: adjust the scanning behavior recognition model according to the accuracy rate and recall rate.

The embodiment of the present invention collects scanner behavior logs and normal access logs as access log samples and performs feature extraction on them. The feature values of one portion of the samples are input into the previously built naive Bayes model for training, yielding the scanning behavior recognition model. The feature values of the remaining samples are input into the resulting model, and the model's recognition results are compared with the actual results of the samples, so that the model is validated by its accuracy rate and recall rate. The scanning behavior recognition model is adjusted continually according to the accuracy rate and recall rate so that its recognition results become more accurate, which improves the recognition rate of scanning behavior and lowers the false negative rate.

An embodiment of the present invention provides another anti-scanning method based on a machine learning Bayesian algorithm, implemented on the basis of the method described in the above embodiments. This embodiment focuses on the specific implementation of intercepting, at the network layer, the access behavior issued by the IP address corresponding to scanning behavior.

As shown in Fig. 4, the access behavior issued by a scanning IP address is intercepted through the following steps:

Step S402: identify the scanning IP address corresponding to the scanning behavior;

According to the recognition result of the scanning behavior recognition model, the IP address corresponding to an access behavior identified as scanning behavior is taken as a scanning IP address, and the scanning IP address is blocked at the network layer.

Step S404: classify the threat level of the scanning IP address according to the preset correspondence between IP address data information and threat levels;

The threat level of a scanning IP address is judged by its degree of threat. Taking classification by response code as an example: a 404 share greater than 80% or a 403 share greater than 80% is defined as high risk; a 404 share greater than 50% and less than 80%, or a 403 share greater than 50% and less than 80%, is defined as medium risk; anything else is defined as low risk. The higher the threat level, the longer the block duration.
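The response-code thresholds in this example map onto a short function. One detail is an assumption: the text leaves a share of exactly 80% unspecified, and this sketch treats it as medium risk. The function name and the level labels are hypothetical.

```python
def threat_level(share_404, share_403):
    """Classify a scanning IP by its 404 and 403 response shares (0.0 to 1.0).

    Thresholds follow the example in the text; a share of exactly 0.8 is
    treated as medium risk here (assumption, unspecified in the text).
    """
    if share_404 > 0.8 or share_403 > 0.8:
        return "high"
    if share_404 > 0.5 or share_403 > 0.5:
        return "medium"
    return "low"
```

Checking the high-risk condition first means the medium-risk branch needs no explicit upper bound.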

Step S406: apply a block of exponentially increasing duration to an IP address that has been identified as a scanning IP address a preset number of consecutive times;

A number of times is preset, and an IP address identified as a scanning IP address that many times in a row is blocked for an exponentially increasing duration. Timely, targeted blocking effectively prevents attacks deliberately launched by hackers, keeps unnecessary data packets from consuming network traffic, improves the utilization of network bandwidth, and ensures the safe operation of the enterprise network.
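One way to realize an exponentially growing block duration is sketched below. The text gives no concrete numbers, so the threshold of 3 consecutive identifications, the 60-second base, the doubling factor and the one-day cap are all hypothetical parameters chosen for illustration.

```python
def block_seconds(consecutive_hits, threshold=3, base_seconds=60,
                  factor=2, cap_seconds=86400):
    """Return how long to block an IP, in seconds.

    All defaults are illustrative assumptions. No block is applied until
    the IP has been identified as a scanner `threshold` times in a row;
    after that the duration grows by `factor` with each further
    consecutive identification, up to `cap_seconds`.
    """
    if consecutive_hits < threshold:
        return 0
    duration = base_seconds * factor ** (consecutive_hits - threshold)
    return min(duration, cap_seconds)
```

With these defaults, the third consecutive identification yields a 60-second block, the fourth 120 seconds, the fifth 240 seconds, and so on until the cap is reached. The base duration could also be scaled by threat level, since the text states that a higher threat level corresponds to a longer block.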

Step S408: send the domain name scanned by the IP address under the exponential-duration block to the server corresponding to that domain name.

The domain name scanned by the IP address under the exponential-duration block is sent to the server corresponding to that domain name to inform the server's administrator, so that the administrator can take further action on the IP address.

The embodiment of the present invention blocks the IP address corresponding to scanning behavior for a duration that depends on its degree of threat, and reports IP addresses with long block durations to the server of the domain name they scanned, so that the server's administrator can deal with the IP address promptly. This stops scanning attacks deliberately launched by hackers in good time and ensures the safe operation of the network.

Corresponding to the above method embodiments, an embodiment of the present invention further provides an anti-scanning device based on a machine learning Bayesian algorithm. As shown in Fig. 5, the device is arranged on the server side and comprises:

A collection module 50, configured to collect the access log of a client's current access behavior;

An extraction module 51, configured to extract feature values from the access log;

A recognition module 52, configured to input the feature values into a preset scanning behavior recognition model and output a recognition result; the scanning behavior recognition model is obtained by training a naive Bayes model;

An address identification module 53, configured to identify the IP address corresponding to the current access behavior if the recognition result indicates that the current access behavior is scanning behavior;

An interception module 54, configured to intercept, at the network layer, the access behavior issued by the IP address.

The anti-scanning device based on a machine learning Bayesian algorithm provided by this embodiment has the same technical features as the anti-scanning method provided by the above embodiments, so it solves the same technical problem and achieves the same technical effect.

Corresponding to the above embodiments, an embodiment of the present invention further provides a server comprising a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the above anti-scanning method based on a machine learning Bayesian algorithm.

Specifically, the processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by integrated hardware logic circuits in the processor or by instructions in software form. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the foregoing method embodiments in combination with its hardware.

Specifically, the machine-readable storage medium stores machine-executable instructions which, when called and executed by the processor, cause the processor to implement the above anti-scanning method based on a machine-learning Bayesian algorithm. For the specific implementation, refer to the method embodiments; details are not repeated here.

The computer program product of the anti-scanning method, device, server, and system based on a machine-learning Bayesian algorithm provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the method described in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; details are not repeated here.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are merely specific embodiments of the present invention, intended to illustrate rather than limit its technical solution, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. An anti-scanning method based on a machine-learning Bayesian algorithm, characterized in that the method is applied to a server and comprises:

acquiring an access log of a current access behavior of a client;

extracting a characteristic value of the access log from the access log;

inputting the characteristic value into a preset scanning behavior identification model and outputting a recognition result, the scanning behavior identification model being obtained by training a naive Bayes algorithm model;

if the recognition result shows that the current access behavior is a scanning behavior, identifying the IP address corresponding to the current access behavior;

intercepting, at the network layer, access behaviors issued by the IP address.

2. The method according to claim 1, characterized in that the step of extracting the characteristic value of the access log from the access log comprises:

removing from the access log the IP addresses whose logs span less than two seconds or number fewer than 100;

performing feature extraction on the access log after the removal, to obtain the characteristic value of the access log.
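The removal step of claim 2 can be sketched as follows. This is a minimal illustration only: the entry layout (a dict with hypothetical keys `ts`, a timestamp in seconds, and `ip`) and the function name `filter_sparse_ips` are assumptions, and the sketch reads the claim as dropping an IP address when either condition holds.

```python
from collections import defaultdict


def filter_sparse_ips(entries, min_span_seconds=2.0, min_count=100):
    """Drop entries of IPs whose logs span < 2 s or number < 100.

    `entries` is a list of dicts with hypothetical keys "ts" and "ip";
    the patent does not specify a log schema.
    """
    stamps_by_ip = defaultdict(list)
    for entry in entries:
        stamps_by_ip[entry["ip"]].append(entry["ts"])

    # An IP is kept only when it satisfies both thresholds, i.e. it is
    # removed when either one fails.
    keep = {
        ip
        for ip, stamps in stamps_by_ip.items()
        if len(stamps) >= min_count
        and max(stamps) - min(stamps) >= min_span_seconds
    }
    return [entry for entry in entries if entry["ip"] in keep]
```

Filtering before feature extraction means the windowed statistics are computed only for IP addresses with enough history to be meaningful.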

3. The method according to claim 1, characterized in that the characteristic value comprises several of the following: the response code; the tangent of the log-volume angle over the past two seconds; the proportion of logs in the past two seconds that share the IP of the current access log; the 404 proportion for the same IP as the current access log in the past two seconds; the port variance for the same IP as the current access log in the past two seconds; the proportion of the past 100 logs that share the IP of the current log; the 404 proportion for the same IP as the current log in the past 100 logs; and the port variance for the same IP as the current log in the past 100 logs.

4. The method according to claim 3, characterized in that the method further comprises:

setting the port variance to 65535 when the same IP has fewer than 100 ports in the past two seconds;

setting the port variance to 65535 when the same IP has fewer than 3 ports in the past 100 logs.
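As an illustration of the window features recited in claims 3 and 4, the sketch below computes the same-IP proportion, the 404 proportion, and the port variance for one window of logs. Several points are assumptions rather than statements of the patent: the entry keys `ip`, `status`, and `port`, the function name `window_features`, counting port observations (not distinct ports) against the minimum, taking the 404 proportion among same-IP logs, and using the population variance.

```python
from statistics import pvariance

SENTINEL_VARIANCE = 65535  # fallback value recited in claim 4


def window_features(window, ip, min_ports):
    """Return (same-IP share, 404 share, port variance) for one window.

    `window` is a list of dicts with hypothetical keys "ip", "status"
    and "port". `min_ports` would be 100 for the two-second window and
    3 for the 100-log window.
    """
    same_ip = [entry for entry in window if entry["ip"] == ip]
    ip_share = len(same_ip) / len(window) if window else 0.0

    not_found = sum(1 for entry in same_ip if entry["status"] == 404)
    share_404 = not_found / len(same_ip) if same_ip else 0.0

    ports = [entry["port"] for entry in same_ip]
    if len(ports) < min_ports:
        # Too few observations: use the fixed sentinel instead.
        port_var = SENTINEL_VARIANCE
    else:
        port_var = pvariance(ports)
    return ip_share, share_404, port_var
```

The fixed sentinel keeps the feature defined when a window holds too few port observations for a variance to be informative.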

5. The method according to claim 1, characterized in that the scanning behavior identification model is obtained specifically in the following manner:

acquiring client access log samples, the access log samples comprising scanner behavior log samples and normal access log samples;

building an initial naive Bayes algorithm model;

extracting the characteristic values of the access log samples;

dividing the access log samples into a specified number of parts and, using K-fold cross-validation, inputting the characteristic values of at least one part of the access log samples in turn into the initial naive Bayes algorithm model for training, to obtain the scanning behavior identification model;

identifying the characteristic values of the remaining at least one part of the access log samples by means of the scanning behavior identification model, and outputting a recognition result;

comparing the recognition result with the actual result of the access log sample corresponding to the recognition result, to obtain the accuracy rate and the recall rate of the scanning behavior identification model; the accuracy rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the recognition result is true; the recall rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the actual result is true;

adjusting the scanning behavior identification model according to the accuracy rate and the recall rate.
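The training and evaluation loop of claim 5 can be sketched in plain Python as below. The claim names only a naive Bayes model and K-fold cross-validation; the Gaussian variant, the interleaved fold split, the variance smoothing term, and all function names here are assumptions made for illustration. The "accuracy rate" as defined in the claim corresponds to what is commonly called precision, which is the name used in the code.

```python
import math
from collections import defaultdict


def train_gaussian_nb(X, y):
    """Fit per-class prior, feature means and feature variances."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for `row`."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score -= 0.5 * math.log(2 * math.pi * var)
            score -= (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def k_fold_scores(X, y, k=5):
    """Return a list of (precision, recall) pairs, one per fold.

    Label True means "scanning behavior".
    """
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))  # interleaved split
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        model = train_gaussian_nb(train_X, train_y)
        tp = fp = fn = 0
        for i in test_idx:
            guess, truth = predict(model, X[i]), y[i]
            if guess and truth:
                tp += 1
            elif guess and not truth:
                fp += 1
            elif truth and not guess:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append((precision, recall))
    return scores
```

Low precision would indicate normal traffic being flagged as scanning, while low recall would indicate missed scans; either could prompt revisiting the features or training samples before the model is used for interception.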

6. The method according to claim 5, characterized in that the scanner behavior log samples comprise: one hour of logs from scanning software, and one continuous hour of logs extracted by filtering for scanning-behavior IPs in the production environment; and the normal access log samples comprise: normal access logs of the production environment, and logs whose response code is 200 after passing the filtering rules.

7. The method according to claim 1, characterized in that the step of intercepting, at the network layer, the access behaviors issued by the IP address comprises: intercepting the access behavior currently issued by the IP address and/or access behaviors subsequently issued by the IP address.

8. The method according to claim 1, characterized in that the method further comprises:

identifying the scanning IP address corresponding to the scanning behavior;

classifying the threat level of the scanning IP address according to a preset correspondence between IP address data information and threat levels;

blocking, for an exponentially increasing duration, an IP address that has been identified as a scanning IP address a preset number of consecutive times;

sending the domain names scanned by the IP address subject to the exponential-duration block to the servers corresponding to those domain names.
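One possible reading of the exponential-duration block in claim 8 is sketched below. The claim does not give concrete numbers, so the threshold, the base duration, the doubling factor, the cap, and the function name `block_seconds` are all hypothetical values chosen for illustration.

```python
def block_seconds(consecutive_hits, threshold=3, base_seconds=60,
                  cap_seconds=86400):
    """Return how long to block an IP, in seconds.

    All defaults are hypothetical. No block is applied until the IP has
    been identified as a scanning IP `threshold` times in a row; after
    that the duration doubles with each further consecutive hit, up to
    `cap_seconds`.
    """
    if consecutive_hits < threshold:
        return 0
    exponent = consecutive_hits - threshold
    return min(base_seconds * 2 ** exponent, cap_seconds)
```

With these illustrative defaults, the third consecutive identification yields a 60-second block, the fifth a 240-second block, and persistent scanners quickly reach the cap.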

9. An anti-scanning device based on a machine-learning Bayesian algorithm, characterized in that the device is provided in a server and comprises:

an acquisition module, configured to acquire an access log of a current access behavior of a client;

an extraction module, configured to extract a characteristic value of the access log from the access log;

an identification module, configured to input the characteristic value into a preset scanning behavior identification model and output a recognition result, the scanning behavior identification model being obtained by training a naive Bayes algorithm model;

an address identification module, configured to identify the IP address corresponding to the current access behavior if the recognition result shows that the current access behavior is a scanning behavior;

an interception module, configured to intercept, at the network layer, access behaviors issued by the IP address.

10. A server, characterized by comprising a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the method according to any one of claims 1 to 8.