CN109218294A - Anti-scanning method, device and server based on machine learning bayesian algorithm - Google Patents
- Publication number: CN109218294A
- Application number: CN201810957134.0A
- Authority: CN (China)
- Prior art keywords
- behavior
- scanning
- log
- access log
- access
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1466—Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1433—Vulnerability analysis
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention provides an anti-scanning method, device, and server based on a machine-learning Bayesian algorithm. The method is applied to a server and comprises: collecting the access log of a client's current access behavior; extracting feature values from the access log; inputting the feature values into a preset scanning-behavior recognition model and outputting a recognition result, the model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, identifying the IP address corresponding to the current access behavior; and intercepting, at the network layer, the access behavior issued from that IP address. By building the scanning-behavior recognition model with a machine-learning Bayesian algorithm and using it to identify scanning behavior, the invention raises the recognition rate of scanning behavior, lowers the miss rate, and reduces the cost of detecting scanning behavior.
Description
Technical field
The present invention relates to the field of web page security protection, and in particular to an anti-scanning method, device, and server based on a machine-learning Bayesian algorithm.
Background
With the development of internet technology, web application systems are widely used in government portals, e-commerce, internet services, and other sectors. While they make daily life and work more convenient, they also introduce network security vulnerabilities. Using scanning techniques, attackers can discover server vulnerabilities and exploit them, and the large volume of packets generated by scanning consumes substantial network bandwidth, which can prevent normal network communication. At present, scanning behavior is identified mainly by simple statistical methods or manually by senior security experts relying on experience. Both approaches have a low recognition rate and, given massive access logs, involve a heavy workload and a high miss rate, so scanning behavior cannot be detected effectively enough to safeguard network security.
Summary of the invention
In view of this, the purpose of the present invention is to provide an anti-scanning method, device, and server based on a machine-learning Bayesian algorithm, so as to raise the recognition rate of scanning behavior, lower the miss rate, and reduce the cost of detecting scanning behavior.
In a first aspect, an embodiment of the present invention provides an anti-scanning method based on a machine-learning Bayesian algorithm. The method is applied to a server and comprises: collecting the access log of a client's current access behavior; extracting feature values from the access log; inputting the feature values into a preset scanning-behavior recognition model and outputting a recognition result, the scanning-behavior recognition model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, identifying the IP address corresponding to the current access behavior; and intercepting, at the network layer, the access behavior issued from that IP address.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of extracting feature values from the access log comprises: removing from the access log the IP addresses whose logs cover less than two seconds or number fewer than 100 entries; and performing feature extraction on the remaining access log to obtain its feature values.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the feature values include several of the following: the response code; the tangent of the log-volume angle over the past two seconds; the share of entries in the past two seconds that come from the same IP as this access log entry; the share of 404 responses for the same IP in the past two seconds; the port variance for the same IP in the past two seconds; the share of entries in the past 100 log entries that come from the same IP as this entry; the share of 404 responses for the same IP in the past 100 log entries; and the port variance for the same IP in the past 100 log entries.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the method further comprises: setting the variance value to 65535 when the same IP has fewer than 100 ports in the past two seconds; and setting the variance value to 65535 when the same IP has fewer than 3 ports in the past 100 log entries.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the scanning-behavior recognition model is obtained as follows: collecting client access log samples, the access log samples including scanner behavior log samples and normal access log samples; building an initial naive Bayes model; extracting the feature values of the access log samples; dividing the access log samples into a specified number of parts and, using K-fold cross-validation, inputting the feature values of at least one part of the access log samples in turn into the initial naive Bayes model for training, to obtain the scanning-behavior recognition model; recognizing the feature values of the remaining at least one part of the access log samples with the scanning-behavior recognition model and outputting recognition results; comparing the recognition results with the actual results of the corresponding access log samples to obtain the accuracy rate and recall rate of the scanning-behavior recognition model, where the accuracy rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the recognition result is true, and the recall rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the actual result is true; and adjusting the scanning-behavior recognition model according to the accuracy rate and the recall rate.
With reference to the fourth possible implementation of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the scanner behavior log samples include one hour of scanning-software logs, and one hour of continuous logs extracted for scanning-behavior IPs filtered from the production environment; and the normal access log samples include normal access logs from the production environment, and logs whose response code is 200 after rule-based filtering.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein the step of intercepting, at the network layer, the access behavior issued from the IP address comprises: intercepting the access behavior currently issued from the IP address and/or the access behavior subsequently issued from it.
With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, wherein the method further comprises: identifying the scanning IP address corresponding to the scanning behavior; classifying the threat level of the scanning IP address according to a preset correspondence between IP address data and threat levels; blocking, for an exponentially increasing duration, any IP address identified as a scanning IP address a preset number of consecutive times; and sending each domain name scanned by an IP address under such a block to the server corresponding to that domain name.
In a second aspect, an embodiment of the present invention further provides an anti-scanning device based on a machine-learning Bayesian algorithm. The device is arranged in a server and comprises: a collection module for collecting the access log of a client's current access behavior; an extraction module for extracting feature values from the access log; a recognition module for inputting the feature values into a preset scanning-behavior recognition model and outputting a recognition result, the scanning-behavior recognition model being obtained by training a naive Bayes model; an address identification module for identifying the IP address corresponding to the current access behavior if the recognition result indicates that the current access behavior is scanning behavior; and an interception module for intercepting, at the network layer, the access behavior issued from that IP address.
In a third aspect, an embodiment of the present invention further provides a server comprising a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the method of the first aspect.
The embodiments of the present invention bring the following beneficial effects:
The present invention provides an anti-scanning method, device, and server based on a machine-learning Bayesian algorithm, which collect the access log of a client's current access behavior; extract feature values from the access log; input the feature values into a preset scanning-behavior recognition model and output a recognition result, the model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, identify the IP address corresponding to the current access behavior; and intercept, at the network layer, the access behavior issued from that IP address. By building the scanning-behavior recognition model with a machine-learning Bayesian algorithm and using it to identify scanning behavior, the invention raises the recognition rate of scanning behavior, lowers the miss rate, and reduces the cost of detecting scanning behavior.
Other features and advantages of the present invention will be set forth in the following description; alternatively, some features and advantages can be inferred or unambiguously determined from the description, or learned by practicing the above techniques of the invention.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an anti-scanning method based on a machine-learning Bayesian algorithm provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a data distribution provided by an embodiment of the present invention;
Fig. 3 is a flowchart of a method for building a scanning-behavior recognition model provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a method for intercepting the access behavior issued from a scanning IP address provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an anti-scanning device based on a machine-learning Bayesian algorithm provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
At present, a traditional web page security protection method is applied at the web server. The web server receives a page access request sent by a target user; if the page link requested is a preset decoy link, the server generates a decoy page for that link. The decoy link is preset to be invisible to normal users of non-scanning devices, and the decoy page contains decoy links to next-level decoy subpages. The decoy page is returned to the target user, which protects the web server against illegal vulnerability scanning by scanning devices. This method must distinguish scanning devices from non-scanning devices in advance, and it cannot effectively identify scanning behavior from devices that have not been designated as scanning devices. On this basis, the anti-scanning method, device, and server based on a machine-learning Bayesian algorithm provided by the embodiments of the present invention can be applied to scenarios in which scanning behavior is detected and identified.
To facilitate understanding of this embodiment, the anti-scanning method based on a machine-learning Bayesian algorithm disclosed in the embodiments of the present invention is first described in detail.
Referring to the flowchart of an anti-scanning method based on a machine-learning Bayesian algorithm shown in Fig. 1, the method is applied to a server and comprises the following steps:
Step S102: collect the access log of the client's current access behavior.
Web logs are an indispensable part of information security and play a very important role in system anomaly analysis, user behavior analysis, and similar tasks. A web log is a file ending in ".log" that records raw information such as the requests a web server receives and processes and the errors that occur at run time. From the web log one can see clearly from which IP address, at what time, and with which operating system, browser, and display resolution a user accessed which page of which website, and whether the access succeeded.
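The source does not specify a log format. As a minimal sketch, assuming the server writes lines in the widely used common/combined log layout (an assumption, not something the patent states), one entry could be parsed into the fields discussed above as follows. The names `LINE_RE` and `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: ip ident user [time] "METHOD path protocol" status size
# This layout is an illustration only; the patent does not fix a format.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return a dict with ip, time, path and status, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return {
        "ip": match["ip"],
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": match["path"],
        "status": int(match["status"]),
    }

sample = '203.0.113.7 - - [10/Oct/2018:13:55:36 +0800] "GET /admin.php HTTP/1.1" 404 512'
entry = parse_line(sample)
```

Lines that do not fit the assumed layout return `None`, so a caller can skip or count them rather than fail.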
By collecting the access log, the embodiment of the present invention obtains the data of the client corresponding to the current access behavior, so that the data can be processed and recognized to judge whether the access behavior is scanning behavior.
Step S104: extract the feature values of the access log from the access log.
During feature extraction, the IP addresses whose logs cover less than two seconds or number fewer than 100 entries are removed from the access log; feature extraction is then performed on the remaining access log to obtain its feature values.
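The filtering step above can be sketched as follows. This is an illustration under stated assumptions: entries are hypothetical `(timestamp_seconds, ip)` tuples, and an IP is dropped if either condition holds (its entries span less than two seconds, or it has fewer than 100 entries), which is one reading of the source wording.

```python
from collections import defaultdict

def filter_sparse_ips(entries, min_span_seconds=2.0, min_entries=100):
    """Drop entries belonging to IPs with too little log history.

    entries: iterable of (timestamp_seconds, ip) tuples (hypothetical layout).
    An IP is kept only if it has at least min_entries entries AND those
    entries span at least min_span_seconds.
    """
    entries = list(entries)  # allow generators, since we iterate twice
    stamps_by_ip = defaultdict(list)
    for ts, ip in entries:
        stamps_by_ip[ip].append(ts)
    keep = {
        ip
        for ip, stamps in stamps_by_ip.items()
        if len(stamps) >= min_entries
        and max(stamps) - min(stamps) >= min_span_seconds
    }
    return [(ts, ip) for ts, ip in entries if ip in keep]

busy = [(i * 0.05, "10.0.0.1") for i in range(120)]   # 120 entries over about 6 s
quiet = [(0.0, "10.0.0.2"), (3.0, "10.0.0.2")]        # only 2 entries
kept = filter_sparse_ips(busy + quiet)
```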
The feature values include several of the following: the response code; the tangent of the log-volume angle over the past two seconds; the share of entries in the past two seconds that come from the same IP as this access log entry; the share of 404 responses for the same IP in the past two seconds; the port variance for the same IP in the past two seconds; the share of entries in the past 100 log entries that come from the same IP as this entry; the share of 404 responses for the same IP in the past 100 log entries; and the port variance for the same IP in the past 100 log entries.
The extracted feature values are then processed: the variance value is set to 65535 when the same IP has fewer than 100 ports in the past two seconds, and the variance value is set to 65535 when the same IP has fewer than 3 ports in the past 100 log entries.
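As a rough sketch of three of the per-window features and the 65535 substitution, the function below computes the same-IP share, the 404 share, and the port variance over one window. Several details are assumptions rather than statements from the source: the dictionary keys `ip`, `status`, and `port`; computing the 404 share relative to that IP's own entries; counting distinct ports for the threshold; and using the population variance.

```python
from statistics import pvariance

SENTINEL_VARIANCE = 65535  # value the source assigns when too few ports are seen

def window_features(window, ip, min_ports):
    """Compute features for `ip` over one window of log entries.

    window: list of dicts with hypothetical keys 'ip', 'status', 'port'.
    min_ports: 100 for the two-second window, 3 for the 100-entry window.
    """
    same_ip = [e for e in window if e["ip"] == ip]
    if not same_ip:
        return {"ip_share": 0.0, "share_404": 0.0,
                "port_variance": SENTINEL_VARIANCE}
    ports = [e["port"] for e in same_ip]
    if len(set(ports)) < min_ports:
        port_variance = SENTINEL_VARIANCE  # too few ports: use the fixed value
    else:
        port_variance = pvariance(ports)
    return {
        "ip_share": len(same_ip) / len(window),
        "share_404": sum(e["status"] == 404 for e in same_ip) / len(same_ip),
        "port_variance": port_variance,
    }

window = [
    {"ip": "1.1.1.1", "status": 404, "port": 1000},
    {"ip": "1.1.1.1", "status": 200, "port": 1002},
    {"ip": "1.1.1.1", "status": 404, "port": 1004},
    {"ip": "2.2.2.2", "status": 200, "port": 5000},
]
features = window_features(window, "1.1.1.1", min_ports=3)
```

The same function can serve both windows by passing the entries of the past two seconds with `min_ports=100`, or the past 100 entries with `min_ports=3`.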
Step S106: input the feature values into the preset scanning-behavior recognition model and output a recognition result; the scanning-behavior recognition model is obtained by training a naive Bayes model.
The processed feature values are input into the preset scanning-behavior recognition model, which identifies whether the current access behavior is scanning behavior.
The scanning-behavior recognition model is built with a machine-learning Bayesian algorithm. The naive Bayes algorithm is used to classify the data; that is, the current access behavior is classified by the preset naive Bayes algorithm as either scanning behavior or normal access behavior.
The naive Bayes algorithm is a classification algorithm based on the assumption of feature independence and on probability statistics. As shown in the data distribution diagram of Fig. 2, suppose a data set contains two classes of data, class 1 (dots in the figure) and class 2 (triangles in the figure). Let p1(x, y) denote the probability that data point (x, y) belongs to class 1 and p2(x, y) the probability that it belongs to class 2. For a new data point (x, y), if p1(x, y) > p2(x, y), the point belongs to class 1; otherwise it belongs to class 2.
The current access behavior is evaluated with the naive Bayes algorithm by means of the following formula (Bayes' rule):

$$p(c_i \mid x, y) = \frac{p(x, y \mid c_i)\, p(c_i)}{p(x, y)}$$

where p(c_i | x, y) is the probability that data point (x, y) belongs to class c_i, p(x, y | c_i) is the probability that data point (x, y) occurs within class c_i, p(c_i) is the prior probability of class c_i, and p(x, y) is the probability of the data point. If scanning behavior is set as class c_1 and normal access behavior as class c_2, then when p(c_1 | x, y) > p(c_2 | x, y) the current access behavior (x, y) belongs to class c_1, that is, it is identified as scanning behavior; otherwise, it is identified as normal access behavior.
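The comparison of posteriors can be illustrated with a small classifier. The source does not say which naive Bayes variant is used; the sketch below assumes a Gaussian likelihood per feature, which is one common choice for continuous features such as shares and variances. The class name `GaussianNaiveBayes` and the toy feature rows are hypothetical. The evidence term p(x, y) is omitted because it is the same for both classes and does not change which posterior is larger.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Tiny naive Bayes with one Gaussian per feature per class (an assumed variant)."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, members in grouped.items():
            self.priors[label] = len(members) / len(rows)  # p(c_i)
            per_feature = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                per_feature.append((mean, var + 1e-9))  # small term avoids zero variance
            self.stats[label] = per_feature
        return self

    def log_score(self, row, label):
        # log p(c_i) + sum of log p(feature | c_i), using feature independence
        score = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score

    def predict(self, row):
        # Pick the class with the larger (unnormalised) posterior
        return max(self.priors, key=lambda label: self.log_score(row, label))

# Toy rows: [share of 404 responses, share of entries from the same IP]
rows = [[0.9, 0.8], [0.85, 0.9], [0.95, 0.7], [0.05, 0.1], [0.1, 0.2], [0.0, 0.15]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = scanning behavior, 0 = normal access
model = GaussianNaiveBayes().fit(rows, labels)
```

Working in log space avoids numeric underflow when many feature likelihoods are multiplied together.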
Step S108: if the recognition result indicates that the current access behavior is scanning behavior, identify the IP address corresponding to the current access behavior.
Step S110: intercept, at the network layer, the access behavior issued from that IP address.
Intercepting the access behavior issued from the IP address at the network layer includes intercepting the access behavior currently issued from the IP address and/or the access behavior subsequently issued from it.
By recognizing the current access behavior in real time and promptly intercepting, at the network layer, the IP address corresponding to any scanning behavior, the method can effectively prevent deliberate attacks by hackers, keep unnecessary packets from consuming network traffic, improve the utilization of network bandwidth, and safeguard the operation of the enterprise network. It offers high precision, a low false-alarm rate, and low cost.
An embodiment of the present invention provides an anti-scanning method based on a machine-learning Bayesian algorithm. The method collects the access log of a client's current access behavior; extracts feature values from the access log; inputs the feature values into a preset scanning-behavior recognition model and outputs a recognition result, the model being obtained by training a naive Bayes model; if the recognition result indicates that the current access behavior is scanning behavior, identifies the IP address corresponding to the current access behavior; and intercepts, at the network layer, the access behavior issued from that IP address. By building the scanning-behavior recognition model with a machine-learning Bayesian algorithm and using it to identify scanning behavior, the embodiment raises the recognition rate of scanning behavior, lowers the miss rate, and reduces the cost of detecting scanning behavior.
Referring to the flowchart of a method for building the scanning-behavior recognition model shown in Fig. 3, this method is implemented on the basis of the method embodiment shown in Fig. 1. This embodiment focuses on how the scanning-behavior recognition model is built. The steps are as follows:
Step S302: collect client access log samples.
The access log samples include scanner behavior log samples and normal access log samples. The scanner behavior log samples include one hour of scanning-software logs, and one hour of continuous logs extracted for scanning-behavior IPs filtered from the production environment. The normal access log samples include normal access logs from the production environment, and logs whose response code is 200 after rule-based filtering.
Step S304: build an initial naive Bayes model.
Step S306: extract the feature values of the access log samples.
The feature values include several of the following: the response code; the tangent of the log-volume angle over the past two seconds; the share of entries in the past two seconds that come from the same IP as this access log entry; the share of 404 responses for the same IP in the past two seconds; the port variance for the same IP in the past two seconds; the share of entries in the past 100 log entries that come from the same IP as this entry; the share of 404 responses for the same IP in the past 100 log entries; and the port variance for the same IP in the past 100 log entries.
The extracted feature values are then processed: the variance value is set to 65535 when the same IP has fewer than 100 ports in the past two seconds; the variance value is set to 65535 when the same IP has fewer than 3 ports in the past 100 log entries; and, for scanning behavior, the tangent value of the log-volume angle over the past two seconds is set to 45, representing the angle of scanning behavior.
Step S308: divide the access log samples into a specified number of parts and, using K-fold cross-validation, input the feature values of at least one part of the access log samples in turn into the initial naive Bayes model for training, to obtain the scanning-behavior recognition model.
In this embodiment, taking as an example feature vectors of scanning behavior labeled 1 and feature vectors of normal access labeled 0, the naive Bayes model is trained and validated by K-fold cross-validation (K-fold cross-validation divides the data set into K parts, uses at least one part in turn as training samples and the remaining parts as test samples, and takes the mean of the test results as the final result).
Step S310: recognize the feature values of the remaining at least one part of the access log samples with the scanning-behavior recognition model, and output the recognition results.
Following the example in step S308, the recognition result is 1 or 0: if it is 1, the current access log sample is identified as scanning behavior; if it is 0, the current access log sample is identified as normal access behavior.
Step S312: compare the recognition results with the actual results of the corresponding access log samples to obtain the accuracy rate and recall rate of the scanning-behavior recognition model.
The accuracy rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the recognition result is true (this ratio is conventionally called precision). The recall rate is the ratio of the number of access log samples for which both the recognition result and the actual result are true to the number of access log samples for which the actual result is true.
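The two ratios defined above can be computed directly from paired lists of recognition results and actual results. In this sketch, "true" is taken to mean the label 1 (scanning behavior), and returning 0.0 when a denominator is zero is an assumption made only to keep the example total. The function name `precision_recall` is hypothetical.

```python
def precision_recall(predicted, actual, positive=1):
    """Return (accuracy rate as defined in the text, recall rate).

    accuracy rate = both true / recognition result true
    recall rate   = both true / actual result true
    """
    both_true = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    predicted_true = sum(p == positive for p in predicted)
    actual_true = sum(a == positive for a in actual)
    precision = both_true / predicted_true if predicted_true else 0.0
    recall = both_true / actual_true if actual_true else 0.0
    return precision, recall

predicted = [1, 1, 1, 1, 0, 0]
actual = [1, 1, 0, 0, 1, 0]
precision, recall = precision_recall(predicted, actual)
```

Here two samples are true in both lists, four are recognized as true, and three are actually true, so the accuracy rate is 2/4 and the recall rate is 2/3.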
Step S314: adjust the scanning-behavior recognition model according to the accuracy rate and the recall rate.
In this embodiment, scanner behavior logs and normal access logs are collected as access log samples and their features are extracted. The feature values of one portion of the samples are input into the pre-built naive Bayes model for training, which yields the scanning-behavior recognition model. The feature values of the remaining samples are input into that model, and its recognition results are compared with the actual results of the samples; the model is validated through the accuracy rate and the recall rate. The scanning-behavior recognition model is adjusted continually according to the accuracy rate and the recall rate so that its recognition results become more accurate, which raises the recognition rate of scanning behavior and lowers the miss rate.
An embodiment of the present invention provides another anti-scanning method based on a machine-learning Bayesian algorithm, implemented on the basis of the method described in the above embodiments. This embodiment focuses on how the access behavior issued from the IP address corresponding to scanning behavior is intercepted at the network layer.
As shown in Fig. 4, the access behavior issued from a scanning IP address is intercepted through the following steps:
Step S402: identify the scanning IP address corresponding to the scanning behavior.
Based on the result of the scanning-behavior recognition model, the IP address corresponding to an access behavior identified as scanning behavior is taken as a scanning IP address, and that scanning IP address is blocked at the network layer.
Step S404: classify the threat level of the scanning IP address according to a preset correspondence between IP address data and threat levels.
The threat level of a scanning IP address is determined from its degree of threat. Taking classification by response code as an example: an IP address whose share of 404 responses exceeds 80%, or whose share of 403 responses exceeds 80%, is defined as high risk; one whose share of 404 responses is above 50% and below 80%, or whose share of 403 responses is above 50% and below 80%, is defined as medium risk; all others are defined as low risk. The higher the threat level, the longer the block duration.
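The response-code example above maps onto a short function. One detail is an assumption: the source leaves a share of exactly 80% unassigned between the two bands, and this sketch places it in the medium band. The function name `threat_level` and the string labels are hypothetical.

```python
def threat_level(share_404, share_403):
    """Classify a scanning IP by its share of 404 and 403 responses (0.0 to 1.0)."""
    worst = max(share_404, share_403)  # either code alone can trigger a level
    if worst > 0.8:
        return "high"
    if worst > 0.5:   # assumption: exactly 0.8 falls here, which the source leaves open
        return "medium"
    return "low"

levels = [threat_level(0.9, 0.0), threat_level(0.1, 0.6), threat_level(0.3, 0.2)]
```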
Step S406: block, for an exponentially increasing duration, any IP address identified as a scanning IP address a preset number of consecutive times.
A number of times is preset, and any IP address identified as a scanning IP address that many consecutive times is blocked for an exponentially increasing duration. Such timely, targeted blocking can effectively prevent deliberate attacks by hackers, keep unnecessary packets from consuming network traffic, improve the utilization of network bandwidth, and safeguard the operation of the enterprise network.
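One way to realize an exponentially growing block duration is sketched below. The source gives no concrete numbers, so every parameter here (the threshold of 3 consecutive detections, the 60-second base, the doubling factor, and the one-day cap) is a hypothetical value chosen for illustration, as is the function name `block_seconds`.

```python
def block_seconds(consecutive_hits, threshold=3, base_seconds=60,
                  factor=2, cap_seconds=86400):
    """Return how long to block an IP after `consecutive_hits` detections in a row.

    No block is applied below the threshold. From the threshold onward the
    duration grows as base * factor ** (hits - threshold), limited by the cap.
    """
    if consecutive_hits < threshold:
        return 0
    exponent = consecutive_hits - threshold
    return min(base_seconds * factor ** exponent, cap_seconds)

durations = [block_seconds(n) for n in (2, 3, 4, 5)]
```

With these example values the durations for 2, 3, 4, and 5 consecutive detections are 0, 60, 120, and 240 seconds. The cap keeps a persistent scanner from producing an unbounded duration.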
Step S408: send each domain name scanned by an IP address under an exponential-duration block to the server corresponding to that domain name.
Each domain name scanned by an IP address under an exponential-duration block is sent to the server corresponding to that domain name to inform the server's administrator, so that the administrator can take further action on the IP address.
In this embodiment, the IP address corresponding to scanning behavior is blocked for a duration that depends on its degree of threat, and IP addresses with long block durations are reported to the servers corresponding to the domain names they scanned, so that the administrators of those servers can deal with the IP addresses promptly. This stops deliberate scanning attacks by hackers in time and safeguards the operation of the network.
Corresponding to the above method embodiments, an embodiment of the present invention further provides an anti-scanning device based on a machine-learning Bayesian algorithm, as shown in Fig. 5. The device is arranged on the server side and comprises:
Acquisition module 50, for acquiring the access log of client current accessed behavior;
Extraction module 51, for extracting the characteristic value of access log from access log;
Identification module 52 exports recognition result for characteristic value to be input in preset scanning behavior identification model;It sweeps
Activity recognition model is retouched to obtain by NB Algorithm model training;
It identifies address module 53, if showing that current accessed behavior is scanning behavior for recognition result, identifies current visit
Ask behavior corresponding IP address;
Blocking module 54, for intercepting the access behavior that IP address issues in network layer.
The anti-scanning device based on a machine learning Bayesian algorithm provided by this embodiment has the same technical features as the anti-scanning method provided by the above embodiment, so it solves the same technical problem and achieves the same technical effect.
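The way the five modules of Figure 5 hand data to one another can be sketched as a small pipeline. This is only an illustration under assumptions: the class name `AntiScanDevice`, the dictionary-shaped log record, and the callable signatures are hypothetical, and the `identify` callable stands in for the trained naive Bayes model rather than reproducing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

LogEntry = Dict[str, object]  # hypothetical log record, e.g. {"ip": ..., "status": ...}


@dataclass
class AntiScanDevice:
    acquire: Callable[[], Iterable[LogEntry]]   # acquisition module 50
    extract: Callable[[LogEntry], List[float]]  # extraction module 51
    identify: Callable[[List[float]], bool]     # identification module 52 (model stand-in)
    resolve_ip: Callable[[LogEntry], str]       # address identification module 53
    block: Callable[[str], None]                # blocking module 54

    def run_once(self) -> List[str]:
        """Process the currently available log entries and return the IPs that were blocked."""
        blocked: List[str] = []
        for entry in self.acquire():
            features = self.extract(entry)
            if self.identify(features):
                ip = self.resolve_ip(entry)
                self.block(ip)
                blocked.append(ip)
        return blocked
```

Keeping each module behind a plain callable means the classifier or the network-layer blocking mechanism can be swapped without touching the rest of the flow.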
Corresponding to the above embodiments, an embodiment of the present invention further provides a server that includes a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the above anti-scanning method based on a machine learning Bayesian algorithm.
Specifically, the processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method of the foregoing embodiments in combination with its hardware.
Specifically, the machine-readable storage medium stores machine-executable instructions. When called and executed by the processor, the machine-executable instructions cause the processor to implement the above anti-scanning method based on a machine learning Bayesian algorithm. For the specific implementation, see the method embodiments; details are not repeated here.
The computer program product of the anti-scanning method, device, server, and system based on a machine learning Bayesian algorithm provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions in the program code can be used to execute the methods described in the foregoing method embodiments. For the specific implementation, see the method embodiments; details are not repeated here.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, intended to illustrate its technical solution rather than to limit it, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. An anti-scanning method based on a machine learning Bayesian algorithm, characterized in that the method is applied to a server and comprises:
acquiring an access log of a client's current access behavior;
extracting feature values of the access log from the access log;
inputting the feature values into a preset scanning behavior identification model and outputting an identification result, the scanning behavior identification model being obtained by training a naive Bayes algorithm model;
if the identification result indicates that the current access behavior is a scanning behavior, identifying the IP address corresponding to the current access behavior;
intercepting, at the network layer, the access behavior issued by the IP address.
2. The method according to claim 1, characterized in that the step of extracting the feature values of the access log from the access log comprises:
removing from the access log any IP address whose logs cover less than two seconds or number fewer than 100 entries;
performing feature extraction on the access log after the removal to obtain the feature values of the access log.
3. The method according to claim 1, characterized in that the feature values include several of the following: the response code; the tangent of the angle of the log count over the past two seconds; the proportion of entries in the past two seconds that come from the same IP as the current access log; the proportion of 404 responses for the same IP as the current access log over the past two seconds; the port variance for the same IP as the current access log over the past two seconds; the proportion of the past 100 logs that come from the same IP as the current log; the proportion of 404 responses for the same IP as the current log among the past 100 logs; and the port variance for the same IP as the current log among the past 100 logs.
4. The method according to claim 3, characterized in that the method further comprises:
setting the variance value to 65535 when the same IP has fewer than 100 ports in the past two seconds;
setting the variance value to 65535 when the same IP has fewer than 3 ports among the past 100 logs.
5. The method according to claim 1, characterized in that the scanning behavior identification model is obtained as follows:
collecting client access log samples, the access log samples including scanner behavior log samples and normal access log samples;
building an initial naive Bayes algorithm model;
extracting the feature values of the access log samples;
dividing the access log samples into a specified number of parts and, using K-fold cross-validation, in turn inputting the feature values of at least one part of the access log samples into the initial naive Bayes algorithm model for training, to obtain the scanning behavior identification model;
identifying the feature values of the remaining at least one part of the access log samples with the scanning behavior identification model and outputting identification results;
comparing each identification result with the actual result of the corresponding access log sample to obtain the accuracy rate and the recall rate of the scanning behavior identification model, where the accuracy rate is the ratio of the number of access log samples for which both the identification result and the actual result are true to the number of access log samples for which the identification result is true, and the recall rate is the ratio of the number of access log samples for which both the identification result and the actual result are true to the number of access log samples for which the actual result is true;
adjusting the scanning behavior identification model according to the accuracy rate and the recall rate.
6. The method according to claim 5, characterized in that the scanner behavior log samples include: one hour of logs from scanning software, and one continuous hour of logs extracted for IPs filtered out of the production environment as exhibiting scanning behavior; and the normal access log samples include: normal access logs from the production environment, and logs whose response code is 200 after rule-based filtering.
7. The method according to claim 1, characterized in that the step of intercepting, at the network layer, the access behavior issued by the IP address comprises: intercepting the access behavior currently issued by the IP address and/or access behavior it issues subsequently.
8. The method according to claim 1, characterized in that the method further comprises:
identifying the scanning IP address corresponding to the scanning behavior;
classifying the threat level of the scanning IP address according to a preset correspondence between IP address data information and threat levels;
applying a block of exponentially increasing duration to any IP address that has been identified as a scanning IP address a predetermined number of consecutive times;
sending the domain names scanned by the IP address under the exponential-duration block to the servers corresponding to those domain names.
9. An anti-scanning device based on a machine learning Bayesian algorithm, characterized in that the device is deployed on a server and comprises:
an acquisition module, configured to acquire an access log of a client's current access behavior;
an extraction module, configured to extract feature values of the access log from the access log;
an identification module, configured to input the feature values into a preset scanning behavior identification model and output an identification result, the scanning behavior identification model being obtained by training a naive Bayes algorithm model;
an address identification module, configured to identify the IP address corresponding to the current access behavior if the identification result indicates that the current access behavior is a scanning behavior;
a blocking module, configured to intercept, at the network layer, the access behavior issued by the IP address.
10. A server, characterized in that it comprises a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions that can be executed by the processor, and the processor executing the machine-executable instructions to implement the method according to any one of claims 1 to 8.
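The evaluation described in claim 5 can be illustrated with a short sketch of the fold split and the two ratios. It is a hedged example, not the patent's code: the function names `k_fold_indices` and `precision_recall` are hypothetical, the folds are contiguous and unshuffled for simplicity, and training the naive Bayes model on each training split is omitted. Note that the "accuracy rate" as defined in claim 5 (true positives over all samples identified as true) matches what is usually called precision.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall), treating True as 'scanning behavior'."""
    true_positives = sum(p and a for p, a in zip(predicted, actual))
    predicted_true = sum(predicted)
    actual_true = sum(actual)
    precision = true_positives / predicted_true if predicted_true else 0.0
    recall = true_positives / actual_true if actual_true else 0.0
    return precision, recall
```

In use, each `(train, test)` pair would select the samples for fitting and for identification respectively, and the two ratios computed on the test split would guide the adjustment of the model.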
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810957134.0A CN109218294A (en) | 2018-08-21 | 2018-08-21 | Anti-scanning method, device and server based on machine learning bayesian algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109218294A true CN109218294A (en) | 2019-01-15 |
Family
ID=64989339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810957134.0A Pending CN109218294A (en) | 2018-08-21 | 2018-08-21 | Anti-scanning method, device and server based on machine learning bayesian algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109218294A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104917627A (en) * | 2015-01-20 | 2015-09-16 | 杭州安恒信息技术有限公司 | Log cluster scanning and analysis method used for large-scale server cluster |
CN105468977A (en) * | 2015-12-14 | 2016-04-06 | 厦门安胜网络科技有限公司 | Method and device for Android malicious software classification based on Naive Bayes |
CN106709513A (en) * | 2016-12-10 | 2017-05-24 | 中泰证券股份有限公司 | Supervised machine learning-based security financing account identification method |
US9781139B2 (en) * | 2015-07-22 | 2017-10-03 | Cisco Technology, Inc. | Identifying malware communications with DGA generated domains by discriminative learning |
CN107423205A (en) * | 2017-07-11 | 2017-12-01 | 北京明朝万达科技股份有限公司 | A kind of system failure method for early warning and system for anti-data-leakage system |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110912874A (en) * | 2019-11-07 | 2020-03-24 | 苏宁云计算有限公司 | Method and system for effectively identifying machine access behaviors |
CN110912874B (en) * | 2019-11-07 | 2022-04-05 | 苏宁云计算有限公司 | Method and system for effectively identifying machine access behaviors |
CN114157439A (en) * | 2020-08-18 | 2022-03-08 | 中国电信股份有限公司 | Vulnerability scanning method, computing device and recording medium |
CN114157439B (en) * | 2020-08-18 | 2024-03-05 | 中国电信股份有限公司 | Vulnerability scanning method, computing device and recording medium |
CN112073426A (en) * | 2020-09-16 | 2020-12-11 | 杭州安恒信息技术股份有限公司 | Website scanning detection method, system and equipment in cloud protection environment |
CN115426202A (en) * | 2022-11-03 | 2022-12-02 | 北京源堡科技有限公司 | Scanning task issuing method and device, computer equipment and readable storage medium |
CN115426202B (en) * | 2022-11-03 | 2023-01-24 | 北京源堡科技有限公司 | Scanning task issuing method and device, computer equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190115 |