CN104050257A - Detection method and device for phishing webpage - Google Patents

Detection method and device for phishing webpage

Info

Publication number
CN104050257A
Authority
CN
China
Prior art keywords
webpage
detected
summary info
target web
web page
Legal status
Pending
Application number
CN201410265323.3A
Other languages
Chinese (zh)
Inventor
梅银明
邹荣新
刘军
Current Assignee
Baidu International Technology Shenzhen Co Ltd
Original Assignee
Baidu International Technology Shenzhen Co Ltd
Application filed by Baidu International Technology Shenzhen Co Ltd
Priority to CN201410265323.3A
Publication of CN104050257A
Priority to PCT/CN2014/094147 (WO2015188604A1)
Current legal status: Pending


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for detecting phishing webpages. The method has four steps.

1. Extract the web page template features of a webpage to be detected, and obtain first summary information of those features.
2. Determine whether the first summary information matches second summary information in a preset database. The second summary information is obtained from the web page template features of a target webpage.
3. If it matches, further determine whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage.
4. If the domain names are inconsistent, determine that the webpage to be detected is a phishing webpage counterfeiting the target webpage.

Detection does not depend on the phishing page's domain name, which typically stays live for only a short time. As a result, the method improves detection accuracy, can detect phishing webpages at the source, and improves feasibility and usability. The invention also discloses a device for detecting phishing webpages.

Description

Detection method and device for phishing webpage
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for detecting phishing webpages.
Background
With the rapid development of Internet technology, an underground profit chain on the Internet has become organized and is growing fast. As antivirus technology has matured, profiting from binary Trojan horses has become increasingly difficult, so these criminal groups have turned to phishing webpages for fraud. New phishing webpages keep emerging for several reasons:
- they are cheap to produce;
- they pay off quickly;
- they spread rapidly;
- anti-phishing technology remains immature.
The related art guards against these phishing webpages in one of two ways.

The first is network protection products, such as online-shopping guards and account protection products, which give users a secure entry point from which to log in. This approach cannot detect phishing websites at the source; it only protects specific webpages.

The second is to collect phishing webpages into a phishing webpage library. When a user visits a webpage, the library is queried to decide whether that webpage is a phishing webpage. The problem with this approach is that phishing webpages are very short-lived: usually a few hours, and sometimes less than one hour. In many cases a phishing webpage has already gone offline before it is added to the library.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems described above.
To this end, a first object of the present invention is to provide a method for detecting phishing webpages. The method is not affected by the short lifetime of phishing domain names. It improves detection accuracy and can detect phishing websites at the source, thereby improving feasibility and usability.
A second object of the present invention is to provide a device for detecting phishing webpages.
To achieve these objects, an embodiment according to a first aspect of the present invention provides a method for detecting phishing webpages, comprising:
- extracting the web page template features of a webpage to be detected, and obtaining first summary information of the web page template features;
- determining whether the first summary information matches second summary information in a preset database, the second summary information being obtained from the web page template features of a target webpage;
- when the first summary information matches second summary information in the preset database, further determining whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage;
- when the two domain names are inconsistent, determining that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
The method of this embodiment first extracts the web page template features of the webpage to be detected and obtains their first summary information. When the first summary information matches second summary information in the preset database, the method checks whether the domain name of the webpage to be detected is consistent with that of the target webpage. If it is not, the webpage to be detected is determined to be a phishing webpage counterfeiting the target webpage. This avoids the problem posed by the short lifetime of phishing domain names and improves detection accuracy. It also detects phishing websites at the source, thereby improving feasibility and usability.
To achieve these objects, an embodiment according to a second aspect of the present invention provides a device for detecting phishing webpages, comprising:
- an acquisition module, configured to extract the web page template features of a webpage to be detected and obtain first summary information of the web page template features;
- a first determination module, configured to determine whether the first summary information matches second summary information in a preset database, the second summary information being obtained from the web page template features of a target webpage;
- a second determination module, configured to further determine, when the first determination module determines that the first summary information matches second summary information in the preset database, whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage;
- a third determination module, configured to determine, when the second determination module determines that the two domain names are inconsistent, that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
In the device of this embodiment, the acquisition module extracts the web page template features of the webpage to be detected and obtains their first summary information. When the first determination module determines that the first summary information matches second summary information in the preset database, the second determination module checks whether the two domain names are consistent. If they are not, the third determination module determines that the webpage to be detected is a phishing webpage counterfeiting the target webpage. This avoids the problem posed by the short lifetime of phishing domain names and improves detection accuracy. It also detects phishing websites at the source, thereby improving feasibility and usability.
Additional aspects and advantages of the present invention are given in part in the following description. Others will become apparent from that description or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent from the following description of embodiments, taken with the accompanying drawings, in which:
Fig. 1 is a flowchart of a method for detecting phishing webpages according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for detecting phishing webpages according to another embodiment of the present invention;
Fig. 3 is a flowchart of a method for detecting phishing webpages according to yet another embodiment of the present invention;
Fig. 4 is a flowchart of a method for detecting phishing webpages according to still another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a device for detecting phishing webpages according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a device for detecting phishing webpages according to another embodiment of the present invention.
Detailed description of embodiments
Embodiments of the present invention are described in detail below, and examples of them are shown in the accompanying drawings. Throughout, the same or similar reference numerals denote the same or similar modules, or modules having the same or similar functions. The embodiments described with reference to the drawings are exemplary: they serve only to explain the present invention and shall not be construed as limiting it. On the contrary, the embodiments of the present invention include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.
In the description of the present invention, the terms "first", "second", and so on are used for descriptive purposes only. They shall not be construed as indicating or implying relative importance. In addition, unless otherwise stated, "multiple" means two or more.
Any process or method described in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code. Such code comprises one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions are executed out of the order shown or discussed. Depending on the functions involved, they may run substantially concurrently or in reverse order, as those skilled in the art will understand.
The method and device for detecting phishing webpages according to embodiments of the present invention are described below with reference to the accompanying drawings.
The present invention provides a method for detecting phishing webpages, comprising:
- extracting the web page template features of a webpage to be detected, and obtaining first summary information of the web page template features;
- determining whether the first summary information matches second summary information in a preset database, the second summary information being obtained from the web page template features of a target webpage;
- when the first summary information matches second summary information in the preset database, further determining whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage;
- when the two domain names are inconsistent, determining that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
Fig. 1 is a flowchart of a method for detecting phishing webpages according to an embodiment of the present invention.
As shown in Fig. 1, the method for detecting phishing webpages comprises:
S101: extract the web page template features of the webpage to be detected, and obtain the first summary information of the web page template features.
In one embodiment of the present invention, the webpage to be detected may be a user login page. There are currently an enormous number of webpages, and protecting every one of them is neither practical nor necessary. The ultimate purpose of a phishing website is to steal valuable user information, such as account names and passwords. Detecting user login pages alone is therefore enough to protect user information effectively. This greatly narrows the scope of protection, turns an unbounded problem into a bounded one, and improves feasibility.
In addition, in one embodiment of the present invention, the web page template features may include at least one of the following features of the webpage:
- the title;
- the description;
- the copyright information;
- the content of the <h1>, <h2>, <h3>, and <h4> tags;
- the content of the <p> tags;
- style sheet information;
- form information;
- navigation information;
- tag frame information;
- display icon information.
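The following Python sketch roughly illustrates S101. It pulls a subset of the listed features out of raw HTML using only the standard library's `html.parser`:
- the title;
- the description;
- the <h1> to <h4> and <p> text;
- the form field names.

The choice of features, the dictionary keys, the class and function names, and treating input field names as "form information" are all assumptions made for illustration. The patent does not specify how features are parsed.

```python
from html.parser import HTMLParser

# Tags whose text content is recorded as a template feature (illustrative subset).
TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "p"}


class TemplateFeatureExtractor(HTMLParser):
    """Collects a hypothetical subset of web page template features."""

    def __init__(self):
        super().__init__()
        self.features = {tag: [] for tag in TEXT_TAGS}
        self.features["description"] = []  # <meta name="description" content=...>
        self.features["form_fields"] = []  # names of <input> fields
        self._open = []                    # stack of currently open text tags

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in TEXT_TAGS:
            self._open.append(tag)
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.features["description"].append(a.get("content") or "")
        elif tag == "input":
            # Assumption: a copied login form keeps the original field names.
            self.features["form_fields"].append(a.get("name") or a.get("type") or "")

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._open and text:
            # Text is attributed to the innermost open text tag.
            self.features[self._open[-1]].append(text)


def extract_template_features(html: str) -> dict:
    """Parse an HTML string and return the collected feature lists."""
    parser = TemplateFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.features
```

Text inside nested non-feature tags, such as a `<span>` inside an `<h1>`, is attributed to the enclosing feature tag.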
In addition, the first summary information may be information about the HTML (HyperText Markup Language) file. The URLs (Uniform Resource Locators) of phishing webpages are short-lived. The page template of a phishing webpage, however, is substantially similar to the template of the target webpage it counterfeits. Therefore, determining whether a webpage to be detected is a phishing webpage also requires obtaining the HTML file information of that webpage. This makes it possible to handle automatically generated phishing webpages and phishing websites that randomly change their domain names.
In other words, web page template features are first extracted from the webpage to be detected (that is, the user login page). The HTML file information of the webpage is then obtained from those features.
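The patent does not say how the summary information is computed from the template features. One plausible reading, assumed here, is a cryptographic digest over a normalized encoding of the features. Under that reading, the same template always yields the same key for the database lookup in S102. The function name `template_digest` and the normalization rules (whitespace collapsed, lowercase) are hypothetical.

```python
import hashlib
import json


def template_digest(features: dict) -> str:
    """Hypothetical summary information: SHA-256 over canonical JSON of the features."""
    # Collapse whitespace and lowercase so trivial formatting changes
    # do not alter the digest.
    canonical = {
        key: [" ".join(v.split()).lower() for v in values]
        for key, values in features.items()
    }
    # sort_keys makes the encoding independent of dict insertion order.
    payload = json.dumps(canonical, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

An exact digest changes if a phishing page alters even one feature value. A similarity-preserving scheme such as SimHash would tolerate small edits, at the cost of a nearest-neighbour search instead of an exact key match.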
S102: determine whether the first summary information matches second summary information in the preset database, the second summary information being obtained from the web page template features of a target webpage.
In one embodiment of the present invention, the second summary information may also be information about an HTML file.
Specifically, the match may be checked against a local preset database. Additionally or instead, the first summary information may be sent to the cloud, which checks it against a cloud database. In practice, the local preset database may store the second summary information of a number of popular webpages, and the lookup proceeds in two stages:
1. The local engine first scans the local preset database.
2. If the local engine finds no second summary information corresponding to the first summary information, the first summary information is sent to the cloud, which checks it against the cloud database.
Combining the local engine with the cloud engine in this way improves usability.
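The local-first, cloud-fallback lookup described above can be sketched as follows. The preset database is modelled as a dictionary that maps second summary information (a digest string) to the genuine domain of the target webpage. `cloud_lookup` is a hypothetical callable standing in for the cloud query. The source specifies neither schema.

```python
from typing import Callable, Dict, Optional


def lookup_target(
    digest: str,
    local_db: Dict[str, str],
    cloud_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the target webpage's domain for a digest, or None if unknown."""
    target = local_db.get(digest)  # fast path: popular targets cached locally
    if target is None and cloud_lookup is not None:
        target = cloud_lookup(digest)  # fall back to the larger cloud database
    return target
```

For example, passing `cloud_db.get` for a plain dictionary `cloud_db` simulates the cloud side in tests. A real client would make a network request instead.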
Further, in one embodiment of the present invention, the method may also include building the preset database. Specifically:
1. Obtain a target webpage.
2. Determine whether its visit volume exceeds a preset visit volume, and/or whether the number of times it has been counterfeited exceeds a preset count.
3. If the visit volume exceeds the preset visit volume, and/or the counterfeit count exceeds the preset count, extract the web page template features of the target webpage.
4. Obtain the second summary information of those features and add it to the preset database.
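Below is a minimal sketch of building the preset database. It assumes each candidate target webpage comes with a visit count and a counterfeit count, plus a precomputed digest of its template features. The `Candidate` record and the thresholds (100,000 visits, 5 counterfeits) are placeholder assumptions. The "and/or" in the text is resolved here as "either criterion qualifies"; requiring both would be an equally valid reading.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Candidate:
    """A candidate target webpage (hypothetical record layout)."""
    domain: str         # genuine domain of the target webpage
    digest: str         # second summary information of its template features
    visits: int         # observed visit volume
    times_spoofed: int  # number of phishing pages seen imitating it


def build_preset_db(candidates: Iterable[Candidate],
                    min_visits: int = 100_000,
                    min_spoofed: int = 5) -> Dict[str, str]:
    """Map digest -> genuine domain for candidates popular or spoofed enough."""
    db: Dict[str, str] = {}
    for c in candidates:
        # "and/or" resolved as: exceeding either threshold qualifies.
        if c.visits > min_visits or c.times_spoofed > min_spoofed:
            db[c.digest] = c.domain
    return db
```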
It should be noted that, in one embodiment of the present invention, the first summary information may not match any second summary information in the preset database. In that case an "unknown" result may be returned and detection of the webpage ends. The webpage can then be analyzed manually to decide whether it is a phishing webpage. If it is not, its web page template features can be extracted and their second summary information saved to the preset database. This progressively expands and refines the preset database.
S103: when the first summary information matches second summary information in the preset database, further determine whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage.
S104: when the domain name of the webpage to be detected is inconsistent with the domain name of the target webpage, determine that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
In the embodiments of the present invention, the first summary information may match second summary information in the preset database while the two domain names are also consistent. In that case the webpage to be detected is determined to be a safe webpage rather than a phishing webpage.
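Once a template match has been found, steps S103 and S104, together with the safe case above, reduce to comparing the host of the webpage to be detected with the target webpage's domain. The sketch below compares registrable domains by keeping only the last two host labels. This is a deliberate simplification: it mishandles multi-label public suffixes such as `.com.cn`, for which a Public Suffix List lookup would be needed. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def registered_domain(host: str) -> str:
    """Crude approximation: keep the last two labels (wrong for e.g. .com.cn)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def classify(url: str, target_domain: str) -> str:
    """Assumes the page's template summary has already matched target_domain's entry."""
    host = urlsplit(url).hostname or ""
    if registered_domain(host) == registered_domain(target_domain):
        return "safe"      # template and domain both match the target webpage
    return "phishing"      # template matches but the domain does not (S104)
```

Comparing registrable domains rather than full hostnames treats `login.examplebank.com` as consistent with `examplebank.com`. It still flags `examplebank.com.evil-host.net`, where the genuine name appears only as a subdomain of an unrelated domain.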
The method of this embodiment first extracts the web page template features of the webpage to be detected and obtains their first summary information. When the first summary information matches second summary information in the preset database, the method checks whether the domain name of the webpage to be detected is consistent with that of the target webpage. If it is not, the webpage to be detected is determined to be a phishing webpage counterfeiting the target webpage. This avoids the problem posed by the short lifetime of phishing domain names and improves detection accuracy. It also detects phishing websites at the source, thereby improving feasibility and usability.
Fig. 2 is a flowchart of a method for detecting phishing webpages according to another embodiment of the present invention.
To improve user experience, an embodiment of the present invention may send a warning message to the user and provide the target webpage once the webpage to be detected is determined to be a phishing webpage counterfeiting the target webpage. Specifically, as shown in Fig. 2, the method may include:
S201: extract the web page template features of the webpage to be detected, and obtain the first summary information of the web page template features.
S202: determine whether the first summary information matches second summary information in the preset database, the second summary information being obtained from the web page template features of a target webpage.
S203: when the first summary information matches second summary information in the preset database, further determine whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage.
S204: when the domain name of the webpage to be detected is inconsistent with the domain name of the target webpage, determine that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
S205: send a warning message to the user and provide the target webpage.
Specifically, once the webpage to be detected is determined to be a phishing webpage counterfeiting the target webpage, a warning message may be sent to the user. The message tells the user that the webpage being opened or viewed is a phishing webpage. The correct URL of the counterfeited target webpage may also be shown to the user, so that the user can log in at the target webpage.
In the method of this embodiment, once the webpage to be detected is determined to be a phishing webpage counterfeiting the target webpage, a warning message is sent to the user and the target webpage is provided. The user can then log in at the target webpage, which improves user experience.
Fig. 3 is a flowchart of a method for detecting phishing webpages according to yet another embodiment of the present invention.
To improve detection efficiency and accuracy, it may also be determined whether the URL of the webpage to be detected is in a URL whitelist, before its web page template features are extracted. If it is, detection of the webpage ends. Specifically, as shown in Fig. 3, the method may include:
S301: determine whether the URL of the webpage to be detected is in the URL whitelist.
Specifically, when the user opens or views the webpage to be detected, its URL may be obtained first. It is then determined whether that URL is in the URL whitelist.
S302: when the URL of the webpage to be detected is not in the URL whitelist, extract the web page template features of the webpage to be detected, and obtain the first summary information of the web page template features.
S303: determine whether the first summary information matches second summary information in the preset database, the second summary information being obtained from the web page template features of a target webpage.
S304: when the first summary information matches second summary information in the preset database, further determine whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage.
S305: when the domain name of the webpage to be detected is inconsistent with the domain name of the target webpage, determine that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
S306: send a warning message to the user and provide the target webpage.
S307: when the URL of the webpage to be detected is in the URL whitelist, end detection of the webpage.
Specifically, when the URL of the webpage to be detected is in the URL whitelist, the webpage the user is visiting can be treated as a normal webpage. Detection then ends and the subsequent detection steps are skipped. This improves both detection efficiency and detection accuracy.
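A whitelist check for S301 and S307 might look like the following. The matching rule, which accepts a listed host and any of its subdomains, is an assumption: the patent only says the URL is checked against a URL whitelist. The function name is hypothetical, and whitelist entries are assumed to be lowercase hostnames.

```python
from typing import AbstractSet
from urllib.parse import urlsplit


def in_whitelist(url: str, whitelist: AbstractSet[str]) -> bool:
    """True if the URL's host is a whitelisted host or a subdomain of one."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    # Matching on the "." boundary stops fakeexamplebank.com matching examplebank.com.
    return any(host == w or host.endswith("." + w) for w in whitelist)
```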
In the method of this embodiment, the URL of the webpage to be detected is checked against the URL whitelist before its web page template features are extracted. If the URL is in the whitelist, detection of the webpage can end and the subsequent detection steps are skipped. This improves detection efficiency and detection accuracy.
Fig. 4 is a flowchart of a method for detecting phishing webpages according to still another embodiment of the present invention.
To further improve detection efficiency, it may also be determined whether the webpage to be detected contains login tag information, before its web page template features are extracted. If it does not, detection of the webpage can end. Specifically, as shown in Fig. 4, the method may include:
S401: determine whether the URL of the webpage to be detected is in the URL whitelist.
S402: when the URL of the webpage to be detected is not in the URL whitelist, determine whether the webpage contains login tag information.
For example, whether the webpage to be detected contains login tag information may be determined by checking whether it contains <input type='password'/>.
S403: when the webpage to be detected contains login tag information, extract its web page template features, and obtain the first summary information of the web page template features.
S404: determine whether the first summary information matches second summary information in the preset database, the second summary information being obtained from the web page template features of a target webpage.
S405: when the first summary information matches second summary information in the preset database, further determine whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage.
S406: when the domain name of the webpage to be detected is inconsistent with the domain name of the target webpage, determine that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
S407: send a warning message to the user and provide the target webpage.
S408: when the URL of the webpage to be detected is in the URL whitelist, or when the webpage does not contain login tag information, end detection of the webpage.
Specifically, if the webpage to be detected contains no login tag information, the webpage the user is visiting is not a login page. The user can access it without entering private information such as an account name or password, so a phishing webpage could do the user far less harm. Detection of the webpage can therefore end at this point. Skipping the subsequent detection steps improves detection efficiency.
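The example check in S402 looks for `<input type='password'/>`. It is more robust if the HTML is parsed rather than matched as a literal string, since attribute quoting, letter case, and spacing vary between pages. Below is a sketch using Python's standard `html.parser`. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _PasswordFieldFinder(HTMLParser):
    """Records whether any <input type="password"> element appears."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag == "input":
            for name, value in attrs:
                if name == "type" and (value or "").strip().lower() == "password":
                    self.found = True


def has_login_tag(html: str) -> bool:
    """Approximates S402: does the page contain a password input field?"""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Unlike a substring search, this ignores the word "password" in ordinary page text. It also recognises variants such as `<INPUT TYPE="Password">`.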
It should be noted that, in one embodiment of the present invention, step S401 (determining whether the URL of the webpage to be detected is in the URL whitelist) is optional. In another embodiment of the present invention, step S402 may be executed before step S401. In that ordering, it is first determined whether the webpage to be detected contains login tag information. Only if it does is its URL then checked against the URL whitelist.
In the method of this embodiment, the webpage to be detected is checked for login tag information before its web page template features are extracted. If it contains none, detection of the webpage ends and the subsequent detection steps are skipped, further improving detection efficiency.
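The sketch below puts the Fig. 4 steps together in this order:
1. S401: the whitelist check.
2. S402: the login tag check.
3. S403 and S404: the template summary lookup.
4. S405 and S406: the domain comparison.

To stay short and self-contained, it takes the login-tag test and the template-summary function as injected callables. It also uses a simple host-or-subdomain comparison for the domain check. All names and return values (`"safe"`, `"skipped"`, `"unknown"`, `"phishing:<target>"`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set
from urllib.parse import urlsplit


@dataclass
class PhishingDetector:
    """Hypothetical wiring of the Fig. 4 flow; helpers are injected."""
    whitelist: Set[str]                   # trusted hosts checked in S401
    preset_db: Dict[str, str]             # second summary information -> genuine domain
    has_login_tag: Callable[[str], bool]  # S402 test on the page HTML
    summarize: Callable[[str], str]       # S403: HTML -> first summary information

    def check(self, url: str, html: str) -> str:
        host = urlsplit(url).hostname or ""
        if host in self.whitelist:                          # S401 -> S408
            return "safe"
        if not self.has_login_tag(html):                    # S402 -> S408
            return "skipped"
        target = self.preset_db.get(self.summarize(html))   # S403, S404
        if target is None:
            return "unknown"                                # no known template matched
        if host == target or host.endswith("." + target):   # S405
            return "safe"
        return "phishing:" + target                         # S406; S407 would warn the user
```

Swapping the first two checks in `check` gives the alternative ordering in which S402 runs before S401. Dropping the whitelist check corresponds to treating S401 as optional.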
In order to realize above-described embodiment, embodiments of the invention also propose a kind of pick-up unit of fishing webpage, comprising: acquisition module, for extracting the web page template feature of webpage to be detected, and obtains the first summary info of web page template feature; The first determination module, for determining whether the first summary info belongs to the second summary info of presetting database, and the second summary info is the summary info obtaining according to the web page template feature of target web; The second determination module, in the time that the first determination module determines that the first summary info belongs to the second summary info of presetting database, further determines that whether the domain name of webpage to be detected and the domain name of target web be consistent; And the 3rd determination module, determine that for the second determination module the domain name of webpage to be detected and the domain name of target web when inconsistent, determine that webpage to be detected is the fishing webpage of counterfeit target web.
Fig. 5 is a schematic structural diagram of a phishing webpage detection device according to an embodiment of the invention.
As shown in Fig. 5, the phishing webpage detection device comprises: an acquisition module 10, a first determination module 20, a second determination module 30 and a third determination module 40.
Specifically, the acquisition module 10 may be configured to extract the webpage template features of the webpage to be detected and obtain the first summary information of those features. In one embodiment of the invention, the webpage to be detected may be a user login page. The number of webpages today is enormous, and protecting every one of them is neither practical nor necessary. Since the ultimate aim of a phishing website is to steal useful user information (such as account names and passwords), user information can be protected effectively by inspecting only user login pages. This greatly narrows the scope of protection and turns an unbounded (non-convergent) problem into a bounded (convergent) one, which makes the approach feasible.
In addition, in one embodiment of the invention, the webpage template features may include at least one of: the webpage title, the description information of the webpage, the copyright information of the webpage, the content of the webpage's <h1>, <h2>, <h3> and <h4> tags, the content of the webpage's <p> tags, the style sheet information of the webpage, the form information of the webpage, the navigation information of the webpage, the tag frame information of the webpage, and the display icon information of the webpage.
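As an illustration of how some of these features might be pulled out of a page, here is a sketch built on Python's standard `html.parser` module. It covers only the title, `description`/`copyright` meta tags, `<h1>` to `<h4>` and `<p>` text, style sheets, icons and form methods; the class name `TemplateFeatureParser` and the feature keys are assumptions, and navigation and frame information are left out.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TemplateFeatureParser(HTMLParser):
    """Collects a subset of the template features listed above (illustrative only)."""

    TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "p"}

    def __init__(self) -> None:
        super().__init__()
        self.features: Dict[str, List[str]] = {}
        self._open: List[str] = []  # text-bearing tags currently open

    def _add(self, key: str, value: str) -> None:
        self.features.setdefault(key, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in self.TEXT_TAGS:
            self._open.append(tag)
        elif tag == "meta":
            name = (a.get("name") or "").lower()
            if name in ("description", "copyright"):
                self._add("meta:" + name, a.get("content") or "")
        elif tag == "link":
            rel = (a.get("rel") or "").lower()
            if "stylesheet" in rel:
                self._add("stylesheet", a.get("href") or "")
            elif "icon" in rel:
                self._add("icon", a.get("href") or "")
        elif tag == "form":
            self._add("form", (a.get("method") or "get").lower())

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text is attributed to the innermost open text-bearing tag.
        text = data.strip()
        if self._open and text:
            self._add(self._open[-1], text)
```

Unclosed `<p>` tags, which HTML permits, would make later text be attributed to that paragraph; a real extractor would need a more forgiving tree builder.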
In addition, the first summary information may be information of the HTML file. It should be understood that the URLs of phishing webpages are short-lived, while the page template of a phishing webpage remains substantially similar to the template of the counterfeited target webpage. Therefore, when judging whether the webpage to be detected is a phishing webpage, the HTML file information corresponding to that webpage is also obtained. This makes it possible to cope effectively with automatically generated phishing webpages and with phishing websites that change their domain names at random.
That is, the acquisition module 10 may first extract the webpage template features from the webpage to be detected (i.e. the user login page), and then obtain the HTML file information of the webpage from those template features.
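The patent does not say how the summary information is computed. One plausible reading, sketched below as an assumption, is a hash over a canonical form of the extracted features, so that pages built from the same template map to the same value regardless of key order, letter case or whitespace. A plain SHA-256 digest only matches templates whose normalized features are identical; tolerating small edits would need a different scheme. `summary_info` is a hypothetical name.

```python
import hashlib
import json
from typing import Dict, List


def summary_info(features: Dict[str, List[str]]) -> str:
    """Hypothetical summary info: SHA-256 over canonicalized template features."""
    canonical = {
        # Collapse runs of whitespace and ignore case in every feature value.
        key: [" ".join(value.split()).lower() for value in values]
        for key, values in sorted(features.items())
    }
    payload = json.dumps(canonical, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same function would produce the second summary information when a target webpage is enrolled, so a lookup reduces to an exact string comparison.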
The first determination module 20 may be configured to determine whether the first summary information belongs to the second summary information in the preset database, the second summary information being summary information obtained from the webpage template features of the target webpage. In one embodiment of the invention, the second summary information may be information of the HTML file.
Specifically, the first determination module 20 may determine, according to a local preset database, whether the first summary information belongs to the second summary information; and/or it may send the first summary information to the cloud, so that the cloud determines, according to a cloud database, whether the first summary information belongs to the second summary information in the preset database. That is, the local preset database may store the second summary information corresponding to a number of highly popular webpages. The first determination module 20 may first scan the local preset database with a local engine; if the local engine finds no second summary information matching the first summary information, the first summary information is sent to the cloud, which determines according to the cloud database whether it belongs to the second summary information in the preset database. Combining the local engine with the cloud engine in this way improves availability.
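The local-first, cloud-second lookup can be sketched as follows. The cloud service is modelled as an arbitrary callable because the patent does not describe its interface; `lookup_target` and its parameters are hypothetical.

```python
from typing import Callable, Dict, Optional


def lookup_target(
    digest: str,
    local_db: Dict[str, str],
    cloud_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the target domain whose second summary info matches `digest`, or None.

    The local engine is consulted first; only on a local miss is the digest
    sent to the cloud (here any callable, e.g. a wrapper around an RPC).
    """
    target = local_db.get(digest)
    if target is not None:
        return target
    if cloud_lookup is not None:
        return cloud_lookup(digest)
    return None
```

Keeping only popular targets locally keeps the local table small, while the cloud database can hold the long tail.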
It should be noted that in one embodiment of the invention, when it is determined that the first summary information does not belong to the second summary information in the preset database, an "unknown" result may be returned and the detection of whether the webpage to be detected is a phishing webpage ends. Afterwards, the webpage may be analyzed manually to determine whether it is a phishing webpage; if it is not, its webpage template features may be extracted, the second summary information of those features obtained, and the result saved to the preset database. In this way the preset database can be expanded and refined.
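A sketch of the "unknown" path, assuming a simple first-in, first-out review queue (the class `UnknownQueue` and its methods are hypothetical). Detection ends immediately with "unknown"; the later manual verdict decides whether the page is enrolled in the preset database as a new target.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class UnknownQueue:
    def __init__(self, preset_db: Dict[str, str]) -> None:
        self.preset_db = preset_db  # digest -> target domain
        self.pending: Deque[Tuple[str, str]] = deque()  # (digest, domain) awaiting review

    def report_unknown(self, digest: str, domain: str) -> str:
        """Digest not in the preset database: queue the page and end detection."""
        self.pending.append((digest, domain))
        return "unknown"

    def review_next(self, is_phishing: bool) -> Optional[str]:
        """Apply an analyst's verdict to the oldest pending page.

        A page judged legitimate is enrolled as a new target, i.e. its digest
        becomes second summary info in the preset database.
        """
        if not self.pending:
            return None
        digest, domain = self.pending.popleft()
        if not is_phishing:
            self.preset_db.setdefault(digest, domain)
        return digest
```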
The second determination module 30 may be configured to further determine, when the first determination module 20 determines that the first summary information belongs to the second summary information in the preset database, whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage. The third determination module 40 may be configured to determine, when the second determination module 30 determines that the domain name of the webpage to be detected is inconsistent with that of the target webpage, that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
It should be understood that in an embodiment of the invention, when it is determined that the first summary information belongs to the second summary information in the preset database and that the domain name of the webpage to be detected is consistent with the domain name of the target webpage, the webpage to be detected can be determined to be a safe webpage rather than a phishing webpage.
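The patent does not define when two domain names are "consistent". The sketch below assumes they are consistent if the page's host equals the target domain or is a subdomain of it; a production check would more likely compare registrable domains using the Public Suffix List. `domain_consistent` and `classify` are hypothetical names.

```python
from urllib.parse import urlsplit


def domain_consistent(page_url: str, target_domain: str) -> bool:
    """Second determination module (sketch): host equals target or is a subdomain."""
    host = (urlsplit(page_url).hostname or "").lower().rstrip(".")
    target = target_domain.lower().rstrip(".")
    # The leading "." stops look-alikes such as "evilexample-bank.com" from matching.
    return host == target or host.endswith("." + target)


def classify(page_url: str, target_domain: str) -> str:
    """Called only after the template digest has matched a target webpage."""
    return "safe" if domain_consistent(page_url, target_domain) else "phishing"
```

A host such as `example-bank.com.evil.test` merely contains the target name and is therefore classified as phishing.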
With the phishing webpage detection device of this embodiment, the acquisition module extracts the webpage template features of the webpage to be detected and obtains their first summary information; when the first determination module determines that the first summary information belongs to the second summary information in the preset database, the second determination module further determines whether the domain name of the webpage to be detected is consistent with that of the target webpage; and if they are inconsistent, the third determination module determines that the webpage to be detected is a phishing webpage counterfeiting the target webpage. Because detection keys on the page template rather than on the short-lived domain names of phishing webpages, the device avoids the problem that short lifespan causes, improves detection accuracy, and can detect phishing websites at their root, thereby improving feasibility and availability.
Fig. 6 is a schematic structural diagram of a phishing webpage detection device according to another embodiment of the invention.
As shown in Fig. 6, the phishing webpage detection device may comprise: the acquisition module 10, the first determination module 20, the second determination module 30, the third determination module 40 and a sending module 50.
Specifically, the sending module 50 may be configured to send warning information to the user and provide the target webpage after the third determination module 40 determines that the webpage to be detected is a phishing webpage counterfeiting the target webpage. More specifically, after the third determination module 40 makes this determination, the sending module 50 can send warning information to the user, warning that the webpage being opened or viewed is a phishing webpage, and present the correct URL of the counterfeited target webpage so that the user can log in on the genuine target webpage instead. This improves the user experience.
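A minimal sketch of the warning produced by the sending module, assuming the target webpage can be offered as an HTTPS URL on its domain. The patent specifies neither the message text nor the URL format, and `PhishingAlert` and `build_alert` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhishingAlert:
    visited_url: str  # the phishing webpage the user opened
    target_url: str   # correct address of the counterfeited target webpage

    def message(self) -> str:
        return (f"Warning: {self.visited_url} appears to be a phishing page imitating "
                f"{self.target_url}. Please log in at {self.target_url} instead.")


def build_alert(visited_url: str, target_domain: str) -> PhishingAlert:
    # Assumption: the genuine target is reachable over HTTPS at its bare domain.
    return PhishingAlert(visited_url, f"https://{target_domain}/")
```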
Optionally, in one embodiment of the invention, as shown in Fig. 6, the phishing webpage detection device may further comprise a fourth determination module 60 and an exit module 70. The fourth determination module 60 may be configured to determine, before the acquisition module 10 extracts the webpage template features of the webpage to be detected, whether the URL of the webpage to be detected is in the website whitelist. The exit module 70 may be configured to end the detection of whether the webpage to be detected is a phishing webpage when the fourth determination module 60 determines that its URL is in the website whitelist.
More specifically, when the user opens or views the webpage to be detected, the fourth determination module 60 can first obtain its URL and then judge whether that URL is in the website whitelist. When the fourth determination module 60 determines that the URL is in the whitelist, the exit module 70 can conclude that the webpage the user is visiting is a normal webpage and end the phishing detection for it, skipping the subsequent detection steps. This improves detection efficiency as well as detection accuracy.
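A sketch of the whitelist check, under the assumption (not stated in the patent) that a whitelisted domain also covers its subdomains. The host of the URL is compared against the whitelist one parent domain at a time, so `example-bank.com.evil.test` does not match `example-bank.com`. `Whitelist` is a hypothetical class.

```python
from typing import Iterable
from urllib.parse import urlsplit


class Whitelist:
    def __init__(self, domains: Iterable[str]) -> None:
        self.domains = {d.lower().rstrip(".") for d in domains}

    def contains(self, url: str) -> bool:
        """True if the URL's host, or any parent domain of it, is whitelisted."""
        host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lower-cased
        labels = host.split(".")
        # a.b.example.com -> b.example.com -> example.com -> com
        return any(".".join(labels[i:]) in self.domains for i in range(len(labels)))
```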
Optionally, in one embodiment of the invention, as shown in Fig. 6, the phishing webpage detection device may further comprise a fifth determination module 80, which may be configured to determine, before the acquisition module 10 extracts the webpage template features of the webpage to be detected, whether the webpage contains login tag information. In one embodiment of the invention, the exit module 70 may further be configured to end the detection of whether the webpage to be detected is a phishing webpage when the fifth determination module 80 determines that the webpage does not contain login tag information.
For example, the fifth determination module 80 can determine whether the webpage to be detected contains login tag information by checking whether it contains <input type='password'/>. When the fifth determination module 80 determines that the webpage contains no login tag information, the exit module 70 can conclude that the webpage the user is visiting is not a login page, i.e. the user can use it without entering private information (such as account names and passwords). Such a page poses much less phishing risk to the user, so the phishing detection can end at this point, skipping the subsequent detection steps. This improves detection efficiency.
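Searching the raw HTML for the literal string `<input type='password'/>` misses variants such as double quotes or upper-case attributes. The sketch below instead parses the page with Python's `html.parser`, which lower-cases tag and attribute names; `LoginTagDetector` and `contains_login_tag` are hypothetical names.

```python
from html.parser import HTMLParser


class LoginTagDetector(HTMLParser):
    """Sets `found` when an <input> element has type="password" (any quoting/case)."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <input .../> are also routed here by HTMLParser.
        if tag == "input":
            for name, value in attrs:
                if name == "type" and (value or "").strip().lower() == "password":
                    self.found = True


def contains_login_tag(html: str) -> bool:
    detector = LoginTagDetector()
    detector.feed(html)
    detector.close()
    return detector.found
```

Text that merely mentions `type='password'` outside a tag is not counted, which a plain substring search would get wrong.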
Further, in one embodiment of the invention, as shown in Fig. 6, the phishing webpage detection device may further comprise an establishing module 90, which may be configured to establish the preset database. Specifically, in an embodiment of the invention, the establishing module 90 may include an acquiring unit 91, a judging unit 92 and an establishing unit 93. The acquiring unit 91 may be configured to obtain the target webpage. The judging unit 92 may be configured to judge whether the visit volume of the target webpage exceeds a preset visit volume and/or whether the number of times the target webpage has been counterfeited exceeds a preset count. The establishing unit 93 may be configured to extract the webpage template features of the target webpage and obtain their second summary information, in order to establish the preset database, when the judging unit 92 judges that the visit volume of the target webpage exceeds the preset visit volume and/or that its counterfeit count exceeds the preset count. A preset database built this way makes it convenient to determine whether the first summary information belongs to the second summary information in the preset database, which improves availability.
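The selection rule of the judging unit can be sketched as below. The thresholds, the `Candidate` record and `build_preset_db` are hypothetical, the "and/or" in the text is modelled as a `require_both` flag, and each candidate's digest is assumed to have been computed already by the feature-extraction and summary steps.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Candidate:
    domain: str
    visits: int             # observed visit volume (measurement window not specified)
    counterfeit_count: int  # how often this page has been seen counterfeited
    digest: str             # second summary info of its template features


def build_preset_db(
    candidates: Iterable[Candidate],
    min_visits: int,
    min_counterfeits: int,
    require_both: bool = False,
) -> Dict[str, str]:
    """Keep popular and/or frequently counterfeited targets as digest -> domain."""
    db: Dict[str, str] = {}
    for c in candidates:
        popular = c.visits > min_visits
        targeted = c.counterfeit_count > min_counterfeits
        keep = (popular and targeted) if require_both else (popular or targeted)
        if keep:
            db[c.digest] = c.domain
    return db
```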
In the description of the invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the invention, "multiple" means at least two, e.g. two or three, unless otherwise expressly and specifically limited.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code comprising one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that parts of the invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following techniques known in the art: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those skilled in the art will understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the invention may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic use of these terms does not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (18)

1. A phishing webpage detection method, characterized by comprising:
Extracting webpage template features of a webpage to be detected, and obtaining first summary information of the webpage template features;
Determining whether the first summary information belongs to second summary information in a preset database, the second summary information being summary information obtained from the webpage template features of a target webpage;
When it is determined that the first summary information belongs to the second summary information in the preset database, further determining whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage; and
When it is determined that the domain name of the webpage to be detected is inconsistent with the domain name of the target webpage, determining that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
2. The method according to claim 1, characterized in that the webpage to be detected is a user login page.
3. The method according to claim 1, characterized in that the first summary information and the second summary information are information of HTML files.
4. The method according to claim 1, characterized in that determining whether the first summary information belongs to the second summary information in the preset database comprises:
Determining, according to a local preset database, whether the first summary information belongs to the second summary information; and/or
Sending the first summary information to the cloud, so that the cloud determines, according to a cloud database, whether the first summary information belongs to the second summary information in the preset database.
5. The method according to claim 1, characterized in that, after determining that the webpage to be detected is a phishing webpage counterfeiting the target webpage, the method further comprises:
Sending warning information to the user and providing the target webpage.
6. The method according to claim 1 or 5, characterized in that, before extracting the webpage template features of the webpage to be detected, the method further comprises:
Determining whether the URL of the webpage to be detected is in a website whitelist;
When it is determined that the URL of the webpage to be detected is in the website whitelist, ending the detection of whether the webpage to be detected is a phishing webpage.
7. The method according to claim 1 or 5, characterized in that, before extracting the webpage template features of the webpage to be detected, the method further comprises:
Determining whether the webpage to be detected contains login tag information;
When it is determined that the webpage to be detected does not contain login tag information, ending the detection of whether the webpage to be detected is a phishing webpage.
8. The method according to claim 1 or 5, characterized by further comprising establishing the preset database, wherein establishing the preset database comprises:
Obtaining the target webpage, and judging whether the visit volume of the target webpage exceeds a preset visit volume and/or whether the number of times the target webpage has been counterfeited exceeds a preset count;
When it is judged that the visit volume of the target webpage exceeds the preset visit volume and/or that the number of times the target webpage has been counterfeited exceeds the preset count, extracting the webpage template features of the target webpage and obtaining the second summary information of those features to establish the preset database.
9. The method according to claim 1 or 5, characterized in that the webpage template features comprise at least one of: the webpage title, the description information of the webpage, the copyright information of the webpage, the content of the <h1>, <h2>, <h3> and <h4> tags of the webpage, the content of the <p> tags of the webpage, the style sheet information of the webpage, the form information of the webpage, the navigation information of the webpage, the tag frame information of the webpage, and the display icon information of the webpage.
10. A phishing webpage detection device, characterized by comprising:
An acquisition module, configured to extract webpage template features of a webpage to be detected and obtain first summary information of the webpage template features;
A first determination module, configured to determine whether the first summary information belongs to second summary information in a preset database, the second summary information being summary information obtained from the webpage template features of a target webpage;
A second determination module, configured to further determine, when the first determination module determines that the first summary information belongs to the second summary information in the preset database, whether the domain name of the webpage to be detected is consistent with the domain name of the target webpage; and
A third determination module, configured to determine, when the second determination module determines that the domain name of the webpage to be detected is inconsistent with the domain name of the target webpage, that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
11. The device according to claim 10, characterized in that the webpage to be detected is a user login page.
12. The device according to claim 10, characterized in that the first summary information and the second summary information are information of HTML files.
13. The device according to claim 10, characterized in that the first determination module is specifically configured to:
Determine, according to a local preset database, whether the first summary information belongs to the second summary information; and/or
Send the first summary information to the cloud, so that the cloud determines, according to a cloud database, whether the first summary information belongs to the second summary information in the preset database.
14. The device according to claim 10, characterized by further comprising:
A sending module, configured to send warning information to the user and provide the target webpage after the third determination module determines that the webpage to be detected is a phishing webpage counterfeiting the target webpage.
15. The device according to claim 10 or 14, characterized by further comprising:
A fourth determination module, configured to determine, before the acquisition module extracts the webpage template features of the webpage to be detected, whether the URL of the webpage to be detected is in a website whitelist; and
An exit module, configured to end the detection of whether the webpage to be detected is a phishing webpage when the fourth determination module determines that the URL of the webpage to be detected is in the website whitelist.
16. The device according to claim 10 or 14, characterized by further comprising:
A fifth determination module, configured to determine, before the acquisition module extracts the webpage template features of the webpage to be detected, whether the webpage to be detected contains login tag information; wherein
The exit module is further configured to end the detection of whether the webpage to be detected is a phishing webpage when the fifth determination module determines that the webpage to be detected does not contain login tag information.
17. The device according to claim 10 or 14, characterized by further comprising an establishing module for establishing the preset database, the establishing module comprising:
An acquiring unit, configured to obtain the target webpage;
A judging unit, configured to judge whether the visit volume of the target webpage exceeds a preset visit volume and/or whether the number of times the target webpage has been counterfeited exceeds a preset count;
An establishing unit, configured to extract the webpage template features of the target webpage and obtain the second summary information of those features to establish the preset database when the judging unit judges that the visit volume of the target webpage exceeds the preset visit volume and/or that the number of times the target webpage has been counterfeited exceeds the preset count.
18. The device according to claim 10 or 14, characterized in that the webpage template features comprise at least one of: the webpage title, the description information of the webpage, the copyright information of the webpage, the content of the <h1>, <h2>, <h3> and <h4> tags of the webpage, the content of the <p> tags of the webpage, the style sheet information of the webpage, the form information of the webpage, the navigation information of the webpage, the tag frame information of the webpage, and the display icon information of the webpage.
CN201410265323.3A 2014-06-13 2014-06-13 Detection method and device for phishing webpage Pending CN104050257A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410265323.3A CN104050257A (en) 2014-06-13 2014-06-13 Detection method and device for phishing webpage
PCT/CN2014/094147 WO2015188604A1 (en) 2014-06-13 2014-12-17 Phishing webpage detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410265323.3A CN104050257A (en) 2014-06-13 2014-06-13 Detection method and device for phishing webpage

Publications (1)

Publication Number Publication Date
CN104050257A true CN104050257A (en) 2014-09-17

Family

ID=51503089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410265323.3A Pending CN104050257A (en) 2014-06-13 2014-06-13 Detection method and device for phishing webpage

Country Status (2)

Country Link
CN (1) CN104050257A (en)
WO (1) WO2015188604A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015188604A1 (en) * 2014-06-13 2015-12-17 百度国际科技(深圳)有限公司 Phishing webpage detection method and device
CN105187415A (en) * 2015-08-24 2015-12-23 成都秋雷科技有限责任公司 Phishing webpage detection method
CN107370719A (en) * 2016-05-13 2017-11-21 阿里巴巴集团控股有限公司 Abnormal login recognition methods, apparatus and system
CN111224923A (en) * 2018-11-26 2020-06-02 阿里巴巴集团控股有限公司 Detection method, device and system for counterfeit websites
CN114285627A (en) * 2021-12-21 2022-04-05 安天科技集团股份有限公司 Flow detection method and device, electronic equipment and computer readable storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN110740117B (en) * 2018-10-31 2022-03-04 安天科技集团股份有限公司 Counterfeit domain name detection method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20070199054A1 (en) * 2006-02-23 2007-08-23 Microsoft Corporation Client side attack resistant phishing detection
CN102082792A (en) * 2010-12-31 2011-06-01 成都市华为赛门铁克科技有限公司 Phishing webpage detection method and device
CN103268442A (en) * 2013-05-14 2013-08-28 北京奇虎科技有限公司 Method and device for achieving safe access of video websites
CN103425736A (en) * 2013-06-24 2013-12-04 腾讯科技(深圳)有限公司 Web information recognition method, device and system

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN103179095B (en) * 2011-12-22 2016-03-30 阿里巴巴集团控股有限公司 A kind of method and client terminal device detecting fishing website
CN102737183B (en) * 2012-06-12 2014-08-13 腾讯科技(深圳)有限公司 Method and device for webpage safety access
CN103685307B (en) * 2013-12-25 2017-08-11 北京奇虎科技有限公司 The method and system of feature based storehouse detection fishing fraud webpage, client, server
CN103685308B (en) * 2013-12-25 2017-04-26 北京奇虎科技有限公司 Detection method and system of phishing web pages, client and server
CN104050257A (en) * 2014-06-13 2014-09-17 百度国际科技(深圳)有限公司 Detection method and device for phishing webpage

Cited By (7)

Publication number Priority date Publication date Assignee Title
WO2015188604A1 (en) * 2014-06-13 2015-12-17 百度国际科技(深圳)有限公司 Phishing webpage detection method and device
CN105187415A (en) * 2015-08-24 2015-12-23 成都秋雷科技有限责任公司 Phishing webpage detection method
CN107370719A (en) * 2016-05-13 2017-11-21 阿里巴巴集团控股有限公司 Abnormal login recognition methods, apparatus and system
CN111224923A (en) * 2018-11-26 2020-06-02 阿里巴巴集团控股有限公司 Detection method, device and system for counterfeit websites
CN111224923B (en) * 2018-11-26 2022-07-22 阿里巴巴集团控股有限公司 Detection method, device and system for counterfeit websites
CN114285627A (en) * 2021-12-21 2022-04-05 安天科技集团股份有限公司 Flow detection method and device, electronic equipment and computer readable storage medium
CN114285627B (en) * 2021-12-21 2023-12-22 安天科技集团股份有限公司 Flow detection method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
WO2015188604A1 (en) 2015-12-17

Similar Documents

Publication Publication Date Title
CN104050257A (en) Detection method and device for phishing webpage
CN102663319B (en) Prompting method and device for download link security
CN112003838B (en) Network threat detection method, device, electronic device and storage medium
EP3136277A1 (en) Illicit activity sensing network system and illicit activity sensing method
US20140245438A1 (en) Download resource providing method and device
CN102724187A (en) Method and device for safety detection of universal resource locators
CN102739653B (en) Detection method and device aiming at webpage address
CN104766014A (en) Method and system used for detecting malicious website
CN102467633A (en) Method and system for safely browsing webpage
CN107992738B (en) Account login anomaly detection method and device, and electronic equipment
CN103268328B (en) Quick Response code verification method and search engine server
CN111064745A (en) Self-adaptive anti-crawling method and system based on abnormal behavior detection
CN103618696B (en) Method and server for processing cookie information
CN106548075B (en) Vulnerability detection method and device
CN102882886A (en) Network terminal and method for presenting visited website associated information
CN104158828B (en) Method and system for identifying suspicious phishing webpages based on a cloud content rule base
CN102917049A (en) Method for showing information of visited website, browser and system
CN104079531A (en) Hotlinking detection method, system and device
KR20170101624A (en) System for monitoring digital contents and method for processing thereof
CN107103243B (en) Vulnerability detection method and device
CN102891861A (en) Client-based phishing website detecting method and device
CN112532624A (en) Black chain detection method and device, electronic equipment and readable storage medium
CN104468459A (en) Vulnerability detection method and apparatus
CN104717226A (en) Method and device for detecting website address
CN103390129B (en) Method and apparatus for detecting the security of a uniform resource locator

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140917