CN107241296A - Webshell detection method and device - Google Patents

Info

Publication number
CN107241296A
Authority
CN
China
Prior art keywords
page
suspicious
access
source
webshell
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610184299.XA
Other languages
Chinese (zh)
Other versions
CN107241296B (en)
Inventor
朱伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610184299.XA
Publication of CN107241296A
Application granted
Publication of CN107241296B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the computer field and discloses a Webshell detection method and device for improving detection accuracy. The method builds a graph model from the access relationships among pages as characterized by the system access log. It filters out a first-class suspicious page set, consisting of pages that have no access links with other pages, and a second-class suspicious page set, consisting of pages whose state of being accessed by source IPs does not meet a preset access condition. A Webshell alert is then raised for all pages contained in the intersection of the two sets. The graph model captures the access state of each page precisely and makes the features of suspicious pages more visible, so the pages containing a Webshell can be filtered out accurately. This improves Webshell detection accuracy and also reduces the false-negative and false-positive rates.

Description

Webshell detection method and device
Technical field
The application relates to the computer field, and in particular to a Webshell detection method and device.
Background
A Webshell, commonly known as a web back door, is a command execution environment that exists in the form of a web page file such as Active Server Pages (ASP), Hypertext Preprocessor (PHP), Java Server Pages (JSP), or Common Gateway Interface (CGI).
After intruding into a website, a hacker gains certain operating privileges by uploading a Webshell program to the server, for example to execute system commands, delete web pages, or modify the home page. Such a Webshell program is usually mixed in with normal asp, php, or jsp page programs, so website administrators generally have difficulty finding it, and the hacker can exploit this to control the website and server over the long term.
At present, Webshells are generally detected by two methods, static Webshell detection and dynamic Webshell detection:
1. Static Webshell detection mainly inspects file content to determine whether it contains Webshell features, for example by matching functions commonly used by Webshells.
However, hackers can easily bypass static detection with methods such as code encryption and code obfuscation, so static code detection rules struggle to find the many unusual Webshell variants. Static code detection also relies entirely on manually written rules and has no capacity for active recognition or autonomous learning. Static detection therefore produces a large number of false negatives and false positives.
2. Dynamic Webshell detection uses sandbox technology: a suspicious file is placed in a sandbox and executed, and detection is based on the file's runtime behavior in the sandbox. Unlike static code detection, this scheme is not defeated by code encryption. However, most Webshell operations are triggered by passed parameters, so if the suspicious file is not given the correct parameters in the sandbox, the corresponding Webshell operation is not triggered, the sandbox cannot capture the corresponding features, and a large number of false negatives result.
Summary of the invention
The embodiments of the present application provide a Webshell detection method and device to improve the detection accuracy of Webshell detection schemes.
The specific technical solutions provided by the embodiments of the present application are as follows:
A Webshell detection method, comprising:
extracting, based on a system access log, the access path and referer (i.e., source page information) of each page, and extracting all source IPs, wherein the referer of a page characterizes the access path of the previous page accessed before that page was accessed;
filtering out, based on the access path and referer of each page, first-class suspicious pages that have no access links with other pages, to obtain a first-class suspicious page set;
filtering out, based on the access path of each page and all source IPs, second-class suspicious pages whose state of being accessed by source IPs does not meet a preset access condition, to obtain a second-class suspicious page set;
determining all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell.
Optionally, extracting the access path and referer of each page based on the system access log comprises:
obtaining each page recorded in the system access log;
removing static pages from the obtained pages;
for the uniform resource locator (URL) of each page remaining after the static pages are removed, removing the parameters and extracting the access path of the page;
determining the referer of each page based on the order of access links between pages recorded in the system access log.
Optionally, filtering out, based on the access path and referer of each page, the first-class suspicious pages that have no access links with other pages comprises:
drawing, based on the access path and referer of each page, a directed graph characterizing the access link relationships between pages, filtering out, based on the directed graph, the isolated pages that have no access links with other pages, and taking all isolated pages as the first-class suspicious pages.
Optionally, filtering out, based on the access path of each page and all source IPs, the second-class suspicious pages whose state of being accessed by source IPs does not meet the preset access condition comprises:
drawing, based on the access path of each page and all source IPs, a bipartite graph characterizing the access relationships between each source IP and each page, filtering out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet a preset condition, and taking all abnormal pages as the second-class suspicious pages.
Optionally, filtering out, based on the directed graph, the isolated pages that have no access links with other pages comprises:
when it is determined, based on the directed graph, that the referer corresponding to any page is empty, judging that page to be an isolated page; and
when it is determined, based on the directed graph, that the access path pointed to by the referer corresponding to any page was not accessed before that page was accessed, judging that page to be an isolated page.
Optionally, filtering out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet the preset condition comprises:
when it is determined, based on the bipartite graph, that the total number of source IPs associated with any page is less than a first preset threshold and the total number of network segments to which the associated source IPs belong is less than a second preset threshold, determining that page to be an abnormal page.
Optionally, determining all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell comprises:
directly determining all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell; or
calculating in turn, for each page contained in the intersection of the first-class suspicious page set and the second-class suspicious page set, its similarity to a preset Webshell sample set, and determining the pages whose similarity reaches a set threshold to be pages containing a Webshell.
A Webshell detection device, comprising:
an extraction unit, configured to extract, based on a system access log, the access path and source page information (referer) of each page, and to extract all source IPs, wherein the referer of a page characterizes the access path of the previous page accessed before that page was accessed;
a first screening unit, configured to filter out, based on the access path and referer of each page, first-class suspicious pages that have no access links with other pages, to obtain a first-class suspicious page set;
a second screening unit, configured to filter out, based on the access path of each page and all source IPs, second-class suspicious pages whose state of being accessed by source IPs does not meet a preset access condition, to obtain a second-class suspicious page set;
a processing unit, configured to determine all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell.
Optionally, when extracting the access path and referer of each page based on the system access log, the extraction unit is configured to:
obtain each page recorded in the system access log;
remove static pages from the obtained pages;
for the uniform resource locator (URL) of each page remaining after the static pages are removed, remove the parameters and extract the access path of the page;
determine the referer of each page based on the order of access links between pages recorded in the system access log.
Optionally, when filtering out, based on the access path and referer of each page, the first-class suspicious pages that have no access links with other pages, the first screening unit is configured to:
draw, based on the access path and referer of each page, a directed graph characterizing the access link relationships between pages, filter out, based on the directed graph, the isolated pages that have no access links with other pages, and take all isolated pages as the first-class suspicious pages.
Optionally, when filtering out, based on the access path of each page and all source IPs, the second-class suspicious pages whose state of being accessed by source IPs does not meet the preset access condition, the second screening unit is configured to:
draw, based on the access path of each page and all source IPs, a bipartite graph characterizing the access relationships between each source IP and each page, filter out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet a preset condition, and take all abnormal pages as the second-class suspicious pages.
Optionally, when filtering out, based on the directed graph, the isolated pages that have no access links with other pages, the first screening unit is configured to:
when it is determined, based on the directed graph, that the referer corresponding to any page is empty, judge that page to be an isolated page; and
when it is determined, based on the directed graph, that the access path pointed to by the referer corresponding to any page was not accessed before that page was accessed, judge that page to be an isolated page.
Optionally, when filtering out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet the preset condition, the second screening unit is configured to:
when it is determined, based on the bipartite graph, that the total number of source IPs associated with any page is less than a first preset threshold and the total number of network segments to which the associated source IPs belong is less than a second preset threshold, determine that page to be an abnormal page.
Optionally, when determining all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell, the processing unit is configured to:
directly determine all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell; or
calculate in turn, for each page contained in the intersection of the first-class suspicious page set and the second-class suspicious page set, its similarity to a preset Webshell sample set, and determine the pages whose similarity reaches a set threshold to be pages containing a Webshell.
In the embodiments of the present application, a graph model is built from the access relationships among pages as characterized by the system access log. A first-class suspicious page set, consisting of pages that have no access links with other pages, is filtered out, as is a second-class suspicious page set, consisting of pages whose state of being accessed by source IPs does not meet a preset access condition. A Webshell alert is then raised for all pages contained in the intersection of the two sets. The graph model captures the access state of each page precisely and makes the features of suspicious pages more visible, so the pages containing a Webshell can be filtered out accurately. This improves Webshell detection accuracy and reduces the false-negative and false-positive rates, while also lowering algorithmic complexity and improving detection efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of a directed graph in an embodiment of the present application;
Fig. 2 is a schematic diagram of a bipartite graph in an embodiment of the present application;
Fig. 3 is a flow chart of Webshell detection in an embodiment of the present application;
Fig. 4 is a schematic diagram of the functional structure of the detection device in an embodiment of the present application.
Detailed description
To improve the detection accuracy of Webshell detection schemes, the embodiments of the present application build a graph model from the access relationships among pages on the basis of the system access log, then use the graph model to filter out suspicious pages with abnormal access patterns and raise Webshell alerts for them, thereby achieving active learning without manual intervention.
Preferred embodiments of the application are described in detail below with reference to the accompanying drawings.
First, the basic concepts involved in the application need to be introduced.
The embodiments of the present application need to build a graph model from the pages. A graph is a data structure more complex than a linear list or a tree. In a linear list, data elements have only a linear relationship: each element has exactly one direct predecessor and one direct successor. In a tree, data elements have a clear hierarchy: an element on one level may be related to several elements on the next level but to only one element on the level above. In a graph, the relationships between nodes can be arbitrary, and any two data elements may be related. A graph G consists of two sets V and E and is written G = (V, E), where V is a finite non-empty set of vertices and E is a finite set of pairs of vertices in V (called edges). The vertex set and edge set of G are usually also written V(G) and E(G). E(G) may be empty, in which case G has vertices but no edges.
Graph structures come in many forms; the embodiments of the present application mainly use directed graphs and bipartite graphs.
Intuitively, a directed graph is a graph in which every edge has a direction. An edge in a directed graph is an ordered pair of two vertices, usually written in angle brackets: <vi, vj> denotes a directed edge whose start point is vi and whose end point is vj. <vi, vj> and <vj, vi> denote two different directed edges. A concrete example is shown in Fig. 1.
A bipartite graph (also called a bigraph) is a special model in graph theory. Let G = (V, E) be an undirected graph. If the vertex set V can be partitioned into two disjoint subsets A and B such that the two vertices i and j of every edge (i, j) belong to different subsets (i in A, j in B), then G is called a bipartite graph. A concrete example is shown in Fig. 2.
According to security experience, a Webshell has the following access characteristics: 1. the page is an isolated page; 2. access is initiated by a small number of source IPs.
Based on characteristic 1, each page of a website, Webshells and normal pages alike, can be regarded as a node, and the nodes form a directed graph. For example, if page A links to page B, the graph contains an edge A->B. No page anywhere on the website will link to the Webshell: to keep the Webshell hidden, the hacker usually accesses it directly. The Webshell must therefore be an isolated node in the directed graph, that is, an isolated page.
Based on characteristic 2, a Webshell normally has a single visitor, so it can be analyzed along the visitor dimension, that is, by source IP. Generally only the attacker accesses the Webshell, and the attacker's source IP on the network usually varies little. The relationship between source IPs and page access paths can therefore be treated as a bipartite graph, in which the in-degree of the Webshell is necessarily small.
Based on the above principles and referring to Fig. 3, the detailed Webshell detection process in the embodiments of the present application is as follows:
Step 300: Acquire the system access log and obtain each page recorded in it.
Specifically, the complete system access log of a website can be obtained from the server side. The log records how each page of the website was accessed, for example the access time, the visitor, and the access source.
Step 301: Remove static pages from the obtained pages.
Because a Webshell is usually a dynamic page, the static pages recorded in the system access log can be removed in advance to reduce the amount of computation.
Of course, if there are few static pages they need not be removed; this is not described further here.
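Step 301 can be sketched as a filter on the file extension of each logged URL. This is only an illustration: the patent does not say how static pages are recognized, so the extension list `STATIC_EXTENSIONS` and the function names are assumptions.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of static extensions; the patent does not enumerate them.
STATIC_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg",
                     ".jpeg", ".gif", ".ico", ".svg"}


def is_static(url: str) -> bool:
    """Return True if the URL path ends in a known static extension."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in STATIC_EXTENSIONS


def drop_static(urls):
    """Keep only the URLs that are not recognized as static pages."""
    return [url for url in urls if not is_static(url)]
```

Dynamic pages such as `.php`, `.asp`, and `.jsp` files pass through the filter unchanged and remain candidates for the later steps.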
Step 302: For the URL of each page remaining after static pages are removed, remove the parameters and extract the access path of the page.
Specifically, the URL of a page may contain many kinds of parameters. These parameters are useless for drawing the graph model and reduce the efficiency of processing it, so they are removed from each page URL in advance to extract the access path of the page, denoted path.
For example, take the URL http://www.example/list.php?type=news&limit=10. Under the HTTP protocol it carries two parameters, type=news and limit=10, and the access path after removing the parameters is http://www.example/list.php.
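The parameter removal in step 302 can be sketched with Python's standard `urllib.parse` module. The function name `access_path` is an assumption; the sketch simply drops the query string and fragment and keeps scheme, host, and path.

```python
from urllib.parse import urlsplit, urlunsplit


def access_path(url: str) -> str:
    """Strip the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

With this normalization, requests to the same script with different parameters collapse into a single node of the graph model.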
Step 303: Based on the order of access links between pages recorded in the system access log, determine the referer of each page, and determine all source IPs.
The referer is the access path of the previous page accessed before a page is accessed. Most pages have a corresponding referer, and the referer of each page can be learned from the page access order recorded in the system access log.
For example, a user first accesses page 1, follows a link from page 1 to page 2, and then follows a link from page 2 to page 3. If the access path of page 1 is path1 and the access path of page 2 is path2, then path1 is the referer of page 2 and path2 is the referer of page 3.
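The page 1, page 2, page 3 example can be sketched as follows, assuming the log has already been reduced to the ordered list of access paths visited by one user. The function name is hypothetical, and real access logs often record a Referer field directly, in which case this derivation is unnecessary.

```python
def referers_from_sequence(paths):
    """Pair each access path with the path visited immediately before it.

    The first path in the sequence has no predecessor, so its referer is None.
    """
    pairs = []
    previous = None
    for path in paths:
        pairs.append((path, previous))
        previous = path
    return pairs
```

The resulting (path, referer) pairs are the input that a directed graph of page links would be built from.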
The source IP is the initiator of an access. From the records in the system access log, the number of source IPs and the network segment each source IP belongs to can be learned.
Step 304: Divide the session periods, that is, divide the log into sessions.
A session is the time range over which statistics are gathered. The system access log may record access information over a very long period, so a session period must be determined in order to draw the graph model in a targeted way.
For example, session periods can be divided according to the log traffic, with the accesses made by each source IP to the same website within 30 minutes grouped into one session period.
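One way to read the 30-minute rule is as a fixed window that opens at a source IP's first request. The sketch below follows that reading; the record layout `(source_ip, unix_time, path)`, the constant, and the function name are assumptions, and a gap-based rule (a new session after 30 idle minutes) would be an equally plausible interpretation.

```python
from collections import defaultdict

SESSION_SECONDS = 30 * 60  # the 30-minute window from the example


def split_sessions(records):
    """Group (source_ip, unix_time, path) records into per-IP sessions.

    A session holds every request an IP makes within 30 minutes of the
    first request of that session; the next request opens a new session.
    """
    hits_by_ip = defaultdict(list)
    for ip, timestamp, path in records:
        hits_by_ip[ip].append((timestamp, path))

    sessions = []
    for ip, hits in hits_by_ip.items():
        hits.sort()  # chronological order within one source IP
        current, start = [], None
        for timestamp, path in hits:
            if start is None or timestamp - start >= SESSION_SECONDS:
                if current:
                    sessions.append((ip, current))
                current, start = [], timestamp
            current.append(path)
        if current:
            sessions.append((ip, current))
    return sessions
```

Each returned session is a source IP together with the ordered access paths it requested in that window.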
Step 305: Based on the access path and referer of each page, draw a directed graph that characterizes the access link relationships between pages.
For example, referring to Fig. 1, the correspondence between referer and path for each page of the website within the same session period can be collated; each page is drawn as a node and the access order between pages is drawn as directed edges, forming the directed graph shown in Fig. 1.
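A minimal sketch of step 305, assuming the session has been reduced to (path, referer) pairs: every page becomes a node and every referer-to-path transition becomes a directed edge. The adjacency-set representation and the function name are illustrative choices, not taken from the patent.

```python
from collections import defaultdict


def build_directed_graph(pairs):
    """Build a directed graph from (path, referer) pairs.

    Returns the node set and a mapping referer -> set of paths reached
    from it. A missing referer (None or empty) adds a node but no edge.
    """
    nodes = set()
    edges = defaultdict(set)
    for path, referer in pairs:
        nodes.add(path)
        if referer:
            nodes.add(referer)
            edges[referer].add(path)
    return nodes, dict(edges)
```

Using sets for the edge targets means repeated transitions between the same two pages are stored once.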
Step 306: Based on the directed graph, filter out the isolated pages that have no access links with other pages, and take all isolated pages as the first-class suspicious page set.
For example, referring to Fig. 1, whether a node is abnormal is determined by the other nodes in its environment. If a large number of nodes point to a node X, so that the in-degree of node X is large, the page corresponding to node X is unlikely to be a Webshell. Conversely, if node X is an isolated node in Fig. 1 (in-degree 0), node X is very likely a Webshell.
Specifically, still taking node X as an example, the page corresponding to node X is judged to be an isolated page in either of two cases within one session period. First, based on the directed graph, the referer corresponding to node X is determined to be empty. Second, based on the directed graph, the access path pointed to by the referer corresponding to node X is determined not to have been accessed before node X was accessed (the access history can be queried from the system access log), in which case the referer is determined to be forged.
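The two isolation rules can be sketched over a single session, assuming it is given as (path, referer) pairs in access order. The function name and input layout are assumptions made for illustration.

```python
def isolated_pages(session):
    """Return the paths judged isolated within one session.

    A page is isolated if its referer is empty, or if the referer's path
    was never accessed earlier in the session (treated as forged).
    """
    seen = set()
    isolated = set()
    for path, referer in session:
        if not referer:
            isolated.add(path)   # rule 1: empty referer
        elif referer not in seen:
            isolated.add(path)   # rule 2: referer not visited before
        seen.add(path)
    return isolated
```

Note that a directly opened entry page such as a home page is flagged by rule 1 as well, so the first-class set on its own contains normal pages and needs the second filter to narrow it.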
In practice, a Webshell is necessarily an isolated page, but an isolated page is not necessarily a Webshell. For links such as robots.txt, crossdomain.xml, and REST APIs, the nodes they point to are normal pages, yet no other node points to them, so they also match the features of an isolated page without being Webshells. To avoid misjudgment, the subsequent operations must therefore still be performed.
Step 307: Based on the access path of each page and all source IPs, draw a bipartite graph characterizing the access relationships between each source IP and each page.
For example, referring to Fig. 2, the access paths that each source IP ultimately pointed to can be learned from the records in the system access log, forming the bipartite graph shown in Fig. 2.
Of course, the source IPs need to be deduplicated before the bipartite graph is drawn; this is not described further here.
Step 308: Based on the bipartite graph, filter out the abnormal pages whose state of being accessed by source IPs does not meet the preset condition, and take all abnormal pages as the second-class suspicious page set.
Normally, most ordinary users do not access an abnormal node; only a small number of attackers do. Therefore, in the bipartite graph shown in Fig. 2, drawn from the access relationships between source IPs and page paths, nodes with a small in-degree can be picked out. The smaller the in-degree, the less the page is accessed and the more likely it is to be a Webshell.
Specifically, in step 308 the state of being accessed by source IPs not meeting the preset condition means that the node in-degree is small, which can be described from the following two angles (taking node X as an example):
1) The total number of source IPs associated with node X is less than a first preset threshold.
For example, the total number of source IPs associated with node X is less than 20.
2) The total number of network segments to which the source IPs associated with node X belong is less than a second preset threshold.
For example, the total number of network segments to which the source IPs associated with node X belong is less than 5.
Both conditions must hold at the same time for the node in-degree to count as small; the page corresponding to a node that meets only one of them is not regarded as an abnormal page.
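The two-threshold test can be sketched with the standard `ipaddress` module. The thresholds 20 and 5 come from the examples above; treating a network segment as an IPv4 /24 prefix is an assumption, since the patent does not define the segment size, and the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict

MAX_SOURCE_IPS = 20  # first preset threshold, from the example
MAX_SEGMENTS = 5     # second preset threshold, from the example


def abnormal_pages(accesses, prefix_len=24):
    """Return the paths whose in-degree in the IP-to-page graph is small.

    accesses is an iterable of (source_ip, path). Using a set per path
    deduplicates source IPs. A path is abnormal only if BOTH the number
    of distinct source IPs and the number of distinct network segments
    (assumed here to be /24 prefixes) are below their thresholds.
    """
    ips_by_path = defaultdict(set)
    for ip, path in accesses:
        ips_by_path[path].add(ip)

    abnormal = set()
    for path, ips in ips_by_path.items():
        segments = {ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
                    for ip in ips}
        if len(ips) < MAX_SOURCE_IPS and len(segments) < MAX_SEGMENTS:
            abnormal.add(path)
    return abnormal
```

A page reached by many IPs from a single segment, for instance an intranet tool, is not flagged, because only one of the two conditions holds.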
Step 309: Determine all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set to be pages containing a Webshell.
Specifically, step 309 can be implemented in, but is not limited to, the following two ways:
The first way: all pages contained in the intersection of the first-class suspicious page set and the second-class suspicious page set are directly determined to be pages containing a Webshell.
Normally, a node that is suspicious in both the directed graph and the bipartite graph is abnormal with very high probability. The pages corresponding to all nodes in the intersection of the directed-graph and bipartite-graph results can therefore be directly determined to be pages containing a Webshell, and an alert can then be raised. This effectively improves processing efficiency while keeping the false-positive rate within a reasonable range.
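The first way reduces to a set intersection. In the sketch below the function name and the sample paths are illustrative only.

```python
def webshell_candidates(first_class, second_class):
    """Pages that are both isolated (first class) and rarely visited (second class)."""
    return set(first_class) & set(second_class)
```

For instance, a home page that is isolated but widely visited, and an API endpoint that is rarely visited but linked from other pages, both drop out of the result.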
The second way: for each page contained in the intersection of the first-class suspicious page set and the second-class suspicious page set, calculate in turn its similarity to a preset Webshell sample set, and determine the pages whose similarity reaches a set threshold to be pages containing a Webshell.
The second way is used mainly to improve Webshell detection precision. The pages corresponding to all nodes in the intersection of the directed-graph and bipartite-graph results are compared for similarity with the Webshell sample set in turn; if the similarity between a page and any one Webshell sample reaches the set threshold, the page is determined to contain a Webshell, and an alert can then be raised.
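The patent does not specify how similarity is computed. The sketch below substitutes the character-level ratio from Python's `difflib.SequenceMatcher` purely as a stand-in; the threshold value, the function names, and the assumption that the source text of each candidate page is available are all hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # hypothetical value; the patent gives none


def max_similarity(page_source, samples):
    """Highest similarity between one page and any sample (0.0 if no samples)."""
    return max((SequenceMatcher(None, page_source, sample).ratio()
                for sample in samples), default=0.0)


def confirm_webshells(candidates, samples, threshold=SIMILARITY_THRESHOLD):
    """candidates maps path -> page source text; keep those that reach the threshold."""
    return {path for path, source in candidates.items()
            if max_similarity(source, samples) >= threshold}
```

A page is confirmed as soon as it is close enough to any single sample, matching the rule that similarity to any one Webshell sample suffices.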
Based on the above embodiments, and referring to Fig. 4, the detection device in the embodiment of the present application includes at least an extraction unit 40, a first screening unit 41, a second screening unit 42 and a processing unit 43, wherein:
The extraction unit 40 is configured to extract, based on the system access log, the access path and source page information (referer) of each page, and to extract all source IPs, where the referer of a page denotes the access path of the previous page visited before that page was accessed;
The first screening unit 41 is configured to screen out, based on the access path and referer of each page, first-type suspicious pages that have no access link with other pages, obtaining a first-type suspicious page set;
The second screening unit 42 is configured to screen out, based on the access path of each page and all source IPs, second-type suspicious pages whose state of being accessed by source IPs does not meet a preset access condition, obtaining a second-type suspicious page set;
The processing unit 43 is configured to determine all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell.
Optionally, when extracting the access path and referer of each page based on the system access log, the extraction unit 40 is configured to:
Obtain each page recorded in the system access log;
Remove static pages from the obtained pages;
For the Uniform Resource Locator (URL) address of each page remaining after static pages are removed, remove the parameters and extract the access path of the page;
Determine the referer of each page based on the access link order of the pages recorded in the system access log.
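The extraction steps above can be sketched as below. This is a minimal illustration, not the patented implementation: it assumes the log has already been parsed into dictionaries with hypothetical keys `ip`, `url` and `referer`, it takes the referer from the logged field rather than reconstructing it from access order, and the list of static file extensions is an assumption.

```python
from urllib.parse import urlsplit

# Assumed set of extensions treated as static pages.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg",
                     ".gif", ".ico", ".svg", ".woff")


def access_path(url):
    """Drop the query string and fragment, keeping only the path."""
    return urlsplit(url).path


def extract_records(entries):
    """Return (source_ip, access_path, referer_path) for dynamic pages."""
    records = []
    for entry in entries:
        path = access_path(entry["url"])
        if path.lower().endswith(STATIC_EXTENSIONS):
            continue  # static pages are removed before further analysis
        referer = entry.get("referer") or ""
        # "-" is how many access-log formats record a missing referer.
        referer_path = access_path(referer) if referer not in ("", "-") else None
        records.append((entry["ip"], path, referer_path))
    return records


entries = [
    {"ip": "1.2.3.4", "url": "/index.php?id=3", "referer": "-"},
    {"ip": "1.2.3.4", "url": "/static/site.css",
     "referer": "http://example.com/index.php"},
    {"ip": "1.2.3.4", "url": "/news.php?page=2#top",
     "referer": "http://example.com/index.php?id=3"},
]
records = extract_records(entries)
```

Stripping parameters means that `/index.php?id=3` and `/index.php?id=4` map to the same page node, which keeps the later graphs small.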
Optionally, when screening out, based on the access path and referer of each page, the first-type suspicious pages that have no access link with other pages, the first screening unit 41 is configured to:
Based on the access path and referer of each page, draw a directed graph characterizing the access link relationships between the pages, screen out, based on the directed graph, isolated pages that have no access link with other pages, and take all isolated pages as the first-type suspicious pages;
When screening out, based on the access path of each page and all source IPs, the second-type suspicious pages whose state of being accessed by source IPs does not meet the preset access condition, the second screening unit 42 is configured to:
Based on the access path of each page and all source IPs, draw a bipartite graph characterizing the access relationships between the source IPs and the pages, screen out, based on the bipartite graph, abnormal pages whose state of being accessed by source IPs does not meet the preset condition, and take all abnormal pages as the second-type suspicious pages.
Optionally, when screening out, based on the directed graph, the isolated pages that have no access link with other pages, the first screening unit 41 is configured to:
Based on the directed graph, when the referer corresponding to any page is determined to be empty, determine that page to be an isolated page; and,
Based on the directed graph, when it is determined that the access path pointed to by the referer of any page was not accessed before that page was accessed, determine that page to be an isolated page.
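One possible reading of the isolated-page rule is sketched below. It is an interpretation, not the definitive method: an edge referer → page is counted only when the referer is non-empty, differs from the page itself, and was actually requested earlier in the log, and a page is treated as isolated when it takes part in no such edge. The record layout `(source_ip, access_path, referer_path)` and the function name are hypothetical.

```python
def find_isolated_pages(records):
    """records: iterable of (source_ip, access_path, referer_path) in log order."""
    seen = set()    # access paths requested so far
    linked = set()  # pages that take part in at least one access link
    pages = set()
    for _ip, path, referer in records:
        pages.add(path)
        # A link counts only if the referring page was really visited before.
        if referer and referer != path and referer in seen:
            linked.add(path)
            linked.add(referer)
        seen.add(path)
    return pages - linked


records = [
    ("1.1.1.1", "/index.php", None),
    ("1.1.1.1", "/news.php", "/index.php"),
    ("2.2.2.2", "/shell.php", None),                      # empty referer
    ("2.2.2.2", "/shell.php", "/shell.php"),              # self-reference only
    ("3.3.3.3", "/upload/x.php", "/never-visited.php"),   # referer never accessed
]
isolated = find_isolated_pages(records)
```

Under this reading, an entry page such as `/index.php` is not flagged even though it is requested with an empty referer, because other pages are reached from it.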
Optionally, when screening out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet the preset condition, the second screening unit 42 is configured to:
Based on the bipartite graph, when it is determined that the total number of source IPs associated with any page is less than a first preset threshold, and the total number of network segments to which the associated source IPs belong is less than a second preset threshold, determine that page to be an abnormal page.
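The two-threshold test on the bipartite graph can be sketched as follows. The text does not say how a network segment is defined or what the thresholds are, so this sketch assumes IPv4 addresses grouped by /24 prefix and uses placeholder thresholds; `find_abnormal_pages`, `min_ips` and `min_segments` are hypothetical names.

```python
import ipaddress
from collections import defaultdict


def find_abnormal_pages(records, min_ips=3, min_segments=2, prefix=24):
    """records: iterable of (source_ip, access_path, referer_path).

    A page is abnormal when it is reached by fewer than `min_ips` distinct
    source IPs AND those IPs fall into fewer than `min_segments` segments.
    """
    ips_by_page = defaultdict(set)  # page side of the bipartite graph
    for ip, path, _referer in records:
        ips_by_page[path].add(ip)

    abnormal = set()
    for path, ips in ips_by_page.items():
        # Assumption: a "network segment" is the enclosing /24 network.
        segments = {ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
                    for ip in ips}
        if len(ips) < min_ips and len(segments) < min_segments:
            abnormal.add(path)
    return abnormal


records = [
    ("10.0.0.1", "/index.php", None),
    ("10.0.1.7", "/index.php", None),
    ("172.16.5.9", "/index.php", None),
    ("203.0.113.8", "/shell.php", None),
    ("203.0.113.9", "/shell.php", None),
]
abnormal = find_abnormal_pages(records)
```

Requiring both conditions keeps rarely visited but legitimate pages (few IPs spread over many segments) out of the suspicious set.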
Optionally, when determining all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell, the processing unit 43 is configured to:
Directly determine all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell; or,
Calculate in turn, for each page contained in the intersection of the first-type suspicious page set and the second-type suspicious page set, the similarity between that page and a preset Webshell sample set, and determine every page whose similarity reaches a set threshold to be a page containing a Webshell.
In summary, in the embodiment of the present application, a graph model is built from the access relationships of the pages recorded in the system access log. A first-type suspicious page set, consisting of pages that have no access link with other pages, is screened out, and second-type suspicious pages, whose state of being accessed by source IPs does not meet the preset access condition, are screened out. All pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set are then determined to be pages containing a Webshell. In this way, the access state of each page can be depicted accurately through the graph model, so that the features of suspicious pages are shown more intuitively and the pages where a Webshell resides can be screened out accurately. This not only improves Webshell detection accuracy and reduces the false-negative and false-positive rates, but also reduces algorithm complexity and improves detection efficiency.
Further, the technical solution provided by the embodiment of the present application can realize active learning without manual intervention, which greatly reduces operation and maintenance costs.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Obviously, those skilled in the art can make various changes and modifications to the embodiments of the present application without departing from the spirit and scope of those embodiments. Thus, if these modifications and variations fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.

Claims (14)

1. A method for detecting a web page backdoor (Webshell), characterized by comprising:
Extracting, based on a system access log, the access path and source page information (referer) of each page, and extracting all source IPs, wherein the referer of a page denotes the access path of the previous page visited before that page was accessed;
Screening out, based on the access path and referer of each page, first-type suspicious pages that have no access link with other pages, to obtain a first-type suspicious page set;
Screening out, based on the access path of each page and all source IPs, second-type suspicious pages whose state of being accessed by source IPs does not meet a preset access condition, to obtain a second-type suspicious page set;
Determining all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell.
2. The method according to claim 1, characterized in that extracting the access path and referer of each page based on the system access log comprises:
Obtaining each page recorded in the system access log;
Removing static pages from the obtained pages;
For the Uniform Resource Locator (URL) address of each page remaining after static pages are removed, removing the parameters and extracting the access path of the page;
Determining the referer of each page based on the access link order of the pages recorded in the system access log.
3. The method according to claim 1, characterized in that screening out, based on the access path and referer of each page, the first-type suspicious pages that have no access link with other pages comprises:
Drawing, based on the access path and referer of each page, a directed graph characterizing the access link relationships between the pages, screening out, based on the directed graph, isolated pages that have no access link with other pages, and taking all isolated pages as the first-type suspicious pages.
4. The method according to claim 1, characterized in that screening out, based on the access path of each page and all source IPs, the second-type suspicious pages whose state of being accessed by source IPs does not meet the preset access condition comprises:
Drawing, based on the access path of each page and all source IPs, a bipartite graph characterizing the access relationships between the source IPs and the pages, screening out, based on the bipartite graph, abnormal pages whose state of being accessed by source IPs does not meet the preset condition, and taking all abnormal pages as the second-type suspicious pages.
5. The method according to claim 3, characterized in that screening out, based on the directed graph, the isolated pages that have no access link with other pages comprises:
Based on the directed graph, when the referer corresponding to any page is determined to be empty, determining that page to be an isolated page; and,
Based on the directed graph, when it is determined that the access path pointed to by the referer of any page was not accessed before that page was accessed, determining that page to be an isolated page.
6. The method according to claim 4, characterized in that screening out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet the preset condition comprises:
Based on the bipartite graph, when it is determined that the total number of source IPs associated with any page is less than a first preset threshold, and the total number of network segments to which the associated source IPs belong is less than a second preset threshold, determining that page to be an abnormal page.
7. The method according to any one of claims 1-6, characterized in that determining all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell comprises:
Directly determining all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell; or,
Calculating in turn, for each page contained in the intersection of the first-type suspicious page set and the second-type suspicious page set, the similarity between that page and a preset Webshell sample set, and determining every page whose similarity reaches a set threshold to be a page containing a Webshell.
8. A device for detecting a web page backdoor (Webshell), characterized by comprising:
An extraction unit, configured to extract, based on a system access log, the access path and source page information (referer) of each page, and to extract all source IPs, wherein the referer of a page denotes the access path of the previous page visited before that page was accessed;
A first screening unit, configured to screen out, based on the access path and referer of each page, first-type suspicious pages that have no access link with other pages, to obtain a first-type suspicious page set;
A second screening unit, configured to screen out, based on the access path of each page and all source IPs, second-type suspicious pages whose state of being accessed by source IPs does not meet a preset access condition, to obtain a second-type suspicious page set;
A processing unit, configured to determine all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell.
9. The device according to claim 8, characterized in that, when extracting the access path and referer of each page based on the system access log, the extraction unit is configured to:
Obtain each page recorded in the system access log;
Remove static pages from the obtained pages;
For the Uniform Resource Locator (URL) address of each page remaining after static pages are removed, remove the parameters and extract the access path of the page;
Determine the referer of each page based on the access link order of the pages recorded in the system access log.
10. The device according to claim 8, characterized in that, when screening out, based on the access path and referer of each page, the first-type suspicious pages that have no access link with other pages, the first screening unit is configured to:
Draw, based on the access path and referer of each page, a directed graph characterizing the access link relationships between the pages, screen out, based on the directed graph, isolated pages that have no access link with other pages, and take all isolated pages as the first-type suspicious pages.
11. The device according to claim 8, characterized in that, when screening out, based on the access path of each page and all source IPs, the second-type suspicious pages whose state of being accessed by source IPs does not meet the preset access condition, the second screening unit is configured to:
Draw, based on the access path of each page and all source IPs, a bipartite graph characterizing the access relationships between the source IPs and the pages, screen out, based on the bipartite graph, abnormal pages whose state of being accessed by source IPs does not meet the preset condition, and take all abnormal pages as the second-type suspicious pages.
12. The device according to claim 10, characterized in that, when screening out, based on the directed graph, the isolated pages that have no access link with other pages, the first screening unit is configured to:
Based on the directed graph, when the referer corresponding to any page is determined to be empty, determine that page to be an isolated page; and,
Based on the directed graph, when it is determined that the access path pointed to by the referer of any page was not accessed before that page was accessed, determine that page to be an isolated page.
13. The device according to claim 11, characterized in that, when screening out, based on the bipartite graph, the abnormal pages whose state of being accessed by source IPs does not meet the preset condition, the second screening unit is configured to:
Based on the bipartite graph, when it is determined that the total number of source IPs associated with any page is less than a first preset threshold, and the total number of network segments to which the associated source IPs belong is less than a second preset threshold, determine that page to be an abnormal page.
14. The device according to any one of claims 8-13, characterized in that, when determining all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell, the processing unit is configured to:
Directly determine all pages contained in the intersection of the first-type suspicious page set and the second-type suspicious page set to be pages containing a Webshell; or,
Calculate in turn, for each page contained in the intersection of the first-type suspicious page set and the second-type suspicious page set, the similarity between that page and a preset Webshell sample set, and determine every page whose similarity reaches a set threshold to be a page containing a Webshell.
CN201610184299.XA 2016-03-28 2016-03-28 Webshell detection method and device Active CN107241296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610184299.XA CN107241296B (en) 2016-03-28 2016-03-28 Webshell detection method and device

Publications (2)

Publication Number Publication Date
CN107241296A true CN107241296A (en) 2017-10-10
CN107241296B CN107241296B (en) 2020-06-05

Family

ID=59983227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610184299.XA Active CN107241296B (en) 2016-03-28 2016-03-28 Webshell detection method and device

Country Status (1)

Country Link
CN (1) CN107241296B (en)

Also Published As

Publication number Publication date
CN107241296B (en) 2020-06-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant