CN107659564B - Method for actively detecting phishing website and electronic equipment - Google Patents

Method for actively detecting phishing website and electronic equipment

Info

Publication number
CN107659564B
CN107659564B CN201710834120.5A CN201710834120A CN107659564B CN 107659564 B CN107659564 B CN 107659564B CN 201710834120 A CN201710834120 A CN 201710834120A CN 107659564 B CN107659564 B CN 107659564B
Authority
CN
China
Prior art keywords
score
log data
information
refer
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710834120.5A
Other languages
Chinese (zh)
Other versions
CN107659564A (en
Inventor
吴灵敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Pinwei Software Co Ltd
Original Assignee
Guangzhou Weipinhui Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Weipinhui Research Institute Co ltd
Priority to CN201710834120.5A
Publication of CN107659564A
Application granted
Publication of CN107659564B
Legal status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441Countermeasures against malicious traffic
    • H04L63/1483Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for actively detecting a phishing website and electronic equipment, and belongs to the field of information security. The method comprises the following steps: acquiring all log data from a Hadoop and message system regularly, deleting abnormal log data in the log data, and keeping normal log data; acquiring a refer and a domain name of normal log data, deleting the log data in a refer white list, and deleting the log data in a domain name white list; judging whether the refer of each residual log data meets a first preset condition or not, and judging whether the refer of the log data meeting the first preset condition meets a second preset condition or not; and manually detecting the refer of the log data meeting the second preset condition. Therefore, the log data acquired from the Hadoop and message system are uniformly detected, the active detection of the phishing website is realized, and the detection efficiency and timeliness of the phishing website are improved.

Description

Method for actively detecting phishing website and electronic equipment
Technical Field
The invention relates to the field of information security, in particular to a method for actively detecting a phishing website and electronic equipment.
Background
With the popularization of the Internet, more and more users communicate and conduct transactions online, and Internet services such as e-commerce and e-banking have developed accordingly. When accessing a website, a user needs to enter information such as an account number and a password; if someone steals the account number and password and enters the website posing as the user, the user's property and other interests may be harmed. At present, some lawbreakers use phishing websites to show users pages that resemble real websites and trick them into entering their accounts and passwords, thereby stealing them. A method capable of actively discovering phishing websites is therefore needed.
Current phishing website detection methods mainly rely on blacklist filtering or domain name correlation detection. Blacklist filtering depends on continuously updating a blacklist of all known phishing websites and/or user-reported websites; when a suspicious website is detected, whether it is a phishing website is judged by checking whether information such as its domain name appears in the blacklist. A domain name serves as an Internet resource identifier and provides addressing and location of resources for application services such as Web access, e-mail and FTP. Because domain name registration is relatively unrestricted, a user can quickly register and start using a domain name.
These methods detect suspicious websites passively: they generally take effect only after a user has already been harmed by a phishing website, so they lag behind and have poor timeliness. The problem is therefore how to effectively detect phishing websites that are not yet recorded in a blacklist, that is, how to detect phishing websites actively, as soon as they appear, so that users' losses are avoided or reduced.
Disclosure of Invention
In order to actively detect a phishing website and improve the detection efficiency and timeliness of the phishing website, the embodiment of the invention provides a method and electronic equipment for actively detecting the phishing website. The technical scheme is as follows:
in a first aspect, a method for actively detecting phishing websites is provided, the method comprising:
acquiring all log data from a Hadoop (distributed system infrastructure) and a message system regularly, deleting abnormal log data in the log data, and keeping normal log data, wherein the abnormal log data is log data indicating that a log source is a non-phishing website;
obtaining the refer and the domain name of the normal log data, deleting the log data in a refer white list in all refer, and deleting the log data in a domain name white list in all domain names;
obtaining a score of at least one first preset feature information of the remaining refer of the log data, judging whether the refer of each remaining log data meets a first preset condition according to the score of the at least one first preset feature information, obtaining a score sum of at least one second preset feature information of the refer of the log data meeting the first preset condition, and judging whether the refer of the log data meeting the first preset condition meets a second preset condition according to the score sum;
and manually detecting the refer of the log data meeting the second preset condition.
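The four steps above can be sketched as a small filtering pipeline. This is only a minimal sketch under stated assumptions: the record layout (hypothetical `refer`, `domain` and `abnormal` keys) and the placeholder callables `stage_one` and `stage_two`, which stand in for the first and second preset conditions, are illustrative names and do not come from the patent.

```python
from typing import Callable, Iterable

# Hypothetical record layout: {"refer": str, "domain": str, "abnormal": bool}
LogRecord = dict


def select_for_manual_review(
    logs: Iterable[LogRecord],
    refer_whitelist: set,
    domain_whitelist: set,
    stage_one: Callable[[str], bool],
    stage_two: Callable[[str], bool],
) -> list:
    """Return the refer values that should be handed to manual detection."""
    suspects = []
    for record in logs:
        # Step 1: drop abnormal log data, keep normal log data.
        if record.get("abnormal"):
            continue
        # Step 2: drop log data covered by either white list.
        if record["refer"] in refer_whitelist:
            continue
        if record["domain"] in domain_whitelist:
            continue
        # Step 3: two-stage judgment; the second stage only runs
        # for a refer that already met the first preset condition.
        refer = record["refer"]
        if stage_one(refer) and stage_two(refer):
            # Step 4: queue the refer for manual detection.
            suspects.append(refer)
    return suspects
```

Passing the two conditions in as callables keeps the sketch independent of how the individual feature scores are actually obtained.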
With reference to the first aspect, in a first possible implementation manner, the first preset feature information includes IP information, domain name listing information, and domain name security information; the second preset feature information includes whois information, domain name detection information, threat intelligence data information and page detection information.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, the step of obtaining a score of at least one first preset feature information of a refer of the remaining log data is performed by at least one of the following manners:
the obtaining of the score of the IP information comprises:
according to the IP information, calculating the score of the IP information through the intranet and the third-party website;
the IP information comprises an IP risk value, an IP service provider, IP historical analysis data, an IP home location, IP website data and IP website downloading coefficients;
the obtaining of the score of the domain name listing information includes:
obtaining the score of the domain name listing information according to the score of the authoritative website;
obtaining the score of the domain name security information includes:
obtaining the security score of the domain name from each security interface by calling different security interfaces.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner, the first preset condition includes that a score of each first preset feature information is 0;
the first preset condition is met, indicating that the refer of the log data has a possibility of being a phishing website.
With reference to any one of the first to third possible implementation manners of the first aspect, in a fourth possible implementation manner, the obtaining a score sum of at least one second preset feature information of a refer of the log data that meets the first preset condition includes:
obtaining a score of at least one second preset characteristic information of the refer of the log data meeting the first preset condition;
calculating the sum of the scores of at least one second preset feature information of the refer of the log data meeting the first preset condition according to the obtained score of each second preset feature information;
wherein the step of obtaining a score of at least one second preset feature information of the refer of the log data meeting the first preset condition is performed by at least one of:
obtaining the score of the whois information comprises:
calculating the score of the whois information according to the whois information, wherein the whois information comprises the registration time, the registration mailbox, the registrant, the update time, the domain name provider and the DNS server of the refer;
obtaining the score of the domain name detection comprises:
the domain name of the refer is split, and the comprehensive score of the refer is obtained according to the split length, port, sub-domain name and suffix;
obtaining the score of the threat intelligence data information comprises:
obtaining, by a threat intelligence system, a threat score of the URL of the detected refer;
obtaining the score of the page detection includes:
obtaining a page score according to the page structure obtained by analyzing the page content.
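The domain name detection item above splits the refer and looks at length, port, sub-domain name and suffix. The following is a minimal sketch of the splitting step only, assuming the refer is an absolute URL; the function name `split_refer` and the returned keys are hypothetical, and the suffix is taken naively as the last label rather than from a public suffix list. How the split parts are turned into a score is not specified in the text, so no scoring rule is shown.

```python
from urllib.parse import urlsplit


def split_refer(refer: str) -> dict:
    """Split a refer URL into the parts named for domain name detection."""
    parts = urlsplit(refer)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "length": len(host),          # length of the host name
        "port": parts.port,           # None when no explicit port is given
        "subdomains": labels[:-2],    # labels left of the registered name (naive)
        "suffix": labels[-1] if labels else "",  # last label only (naive)
    }
```

For example, `split_refer("http://login.pay.example.com:8080/index.html")` yields a host length of 21, port 8080, sub-domain labels `["login", "pay"]` and suffix `"com"`.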
With reference to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, the method further includes:
distributing the score weight of the at least one second preset feature information according to the website credit, the security and the value;
and calculating the sum of the scores of the at least one second preset characteristic information according to the weight and the score of each second preset characteristic information.
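The weighted score sum described above can be illustrated as follows. This is a sketch under assumptions: the feature names and the numeric weights are hypothetical, since the text only states that weights are distributed according to website credit, security and value and gives no concrete numbers.

```python
def weighted_score_sum(scores: dict, weights: dict) -> float:
    """Sum each second preset feature score multiplied by its weight."""
    # A missing weight raises KeyError, which surfaces configuration mistakes.
    return sum(weights[name] * score for name, score in scores.items())


# Hypothetical scores and weights, for illustration only.
example_scores = {"whois": 2, "domain": 4, "threat": 0, "page": 6}
example_weights = {"whois": 0.5, "domain": 0.25, "threat": 1.0, "page": 0.25}
example_total = weighted_score_sum(example_scores, example_weights)  # 3.5
```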
With reference to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner, the determining, according to the score sum, whether the refer of the log data meeting the first preset condition meets a second preset condition includes:
if the score sum is equal to a preset threshold value, indicating that the refer of the log data is a phishing website with high possibility, and manually detecting the refer; and if the score sum is larger than a preset threshold value, indicating that the refer is not a phishing website.
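The threshold comparison above can be written down literally as a small decision function. This is a sketch, not the patented implementation: the names `Verdict` and `judge_second_condition` are hypothetical, and because the text does not state what happens when the score sum is below the threshold, that case is returned as `UNSPECIFIED` rather than guessed.

```python
from enum import Enum


class Verdict(Enum):
    MANUAL_REVIEW = "manual_review"   # highly likely phishing, check by hand
    NOT_PHISHING = "not_phishing"
    UNSPECIFIED = "unspecified"       # case not covered by the text


def judge_second_condition(score_sum: float, threshold: float) -> Verdict:
    """Apply the comparison exactly as worded in the text."""
    if score_sum == threshold:
        return Verdict.MANUAL_REVIEW
    if score_sum > threshold:
        return Verdict.NOT_PHISHING
    return Verdict.UNSPECIFIED
```

With weighted, non-integer score sums, an exact equality test is fragile, so a real system would need to define the comparison more carefully than this literal reading does.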
In a second aspect, an electronic device is provided, the device comprising:
a log data acquisition module, used for regularly acquiring all log data from a Hadoop (distributed system infrastructure) and a message system;
the first deleting module is used for deleting abnormal log data in the log, and keeping normal log data, wherein the abnormal log data is log data indicating that the log source is a non-phishing website;
the refer and domain name acquisition module is used for acquiring refer and domain names of the normal log data;
the second deleting module is used for deleting the log data in the refer white list in all the refer and deleting the log data in the domain name white list in all the domain names;
the score acquisition module of the first preset characteristic information acquires a score of at least one first preset characteristic information of the refer of the residual log data;
the first judging module is used for judging whether the refer of each residual log data meets a first preset condition or not according to the score of the at least one piece of first preset characteristic information;
the score sum acquisition module of the second preset feature information is used for acquiring the score sum of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the second judgment module is used for judging whether the refer of the log data meeting the first preset condition meets a second preset condition or not according to the score sum;
and the manual detection module is used for manually detecting the refer of the log data meeting the second preset condition.
With reference to the second aspect, in a first possible implementation manner, the first preset feature information includes IP information, domain name listing information, and domain name security information; the second preset feature information includes whois information, domain name detection information, threat intelligence data information and page detection information.
With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner, the score obtaining module of the first preset feature information is specifically configured to:
obtaining the score of the IP information specifically comprises the following steps:
according to the IP information, calculating the score of the IP information through the intranet and the third-party website;
the IP information comprises an IP risk value, an IP service provider, IP historical analysis data, an IP home location, IP website data and IP website downloading coefficients;
obtaining the score of the domain name listing information, specifically comprising:
obtaining the score of the domain name listing information according to the score of the authoritative website;
obtaining the score of the domain name security information, specifically comprising:
obtaining the security score of the domain name from each security interface by calling different security interfaces.
With reference to the second possible implementation manner of the second aspect, in a third possible implementation manner, the first preset condition includes that a score of each first preset feature information is 0;
the first preset condition is met, indicating that the refer of the log data has a possibility of being a phishing website.
With reference to any one of the first to third possible implementation manners of the second aspect, in a fourth possible implementation manner, the score sum acquisition module of the second preset feature information includes a score acquisition submodule for each second preset feature information and a score sum calculation submodule:
the score acquisition submodule of each second preset feature information is used for acquiring a score of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the score sum calculation sub-module is specifically configured to calculate, according to the obtained score of each second preset feature information, a score sum of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the score obtaining submodule of each second preset feature information is specifically configured to:
obtaining the score of the whois information, including:
calculating the score of the whois information according to the whois information, wherein the whois information comprises the registration time, the registration mailbox, the registrant, the update time, the domain name provider and the DNS server of the refer;
obtaining the score of the domain name detection, including:
the domain name of the refer is split, and the comprehensive score of the refer is obtained according to the split length, port, sub-domain name and suffix;
obtaining a score of the threat intelligence data information, comprising:
obtaining, by a threat intelligence system, a threat score of the URL of the detected refer;
obtaining the score of page detection, including:
obtaining a page score according to the page structure obtained by analyzing the page content.
With reference to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner, the score sum acquisition module of the second preset feature information further includes a weight acquisition submodule:
the weight obtaining submodule is used for distributing the score weight of the at least one second preset characteristic information according to the website credit, the security and the value;
the score sum calculation sub-module is further configured to calculate a score sum of the at least one second preset feature information according to the weight and the score of each second preset feature information.
With reference to the fifth possible implementation manner of the second aspect, in a sixth possible implementation manner, the second determining module is specifically configured to:
if the score sum is equal to a preset threshold value, indicating that the refer of the log data is a phishing website with high possibility, and manually detecting the refer; and if the score sum is larger than a preset threshold value, indicating that the refer is not a phishing website.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
All log data are acquired regularly from the big data of the Hadoop and message system, so that every log record is regularly checked for phishing websites and active detection of phishing websites is realized. Because log data of the whole network are examined, detection is more comprehensive and timely, phishing websites can be found over a wider range as soon as they appear, and network security is improved. In addition, abnormal log data are deleted from all log data, and from the remaining normal log data the log data in the refer white list and in the domain name white list are deleted, so that log data which are obviously not phishing websites are screened out first, the amount of log data to be detected is reduced, and detection efficiency is improved. For the remaining log data, whether the refer of each log data meets a first preset condition is judged according to the score of at least one first preset feature information, and whether the refer of the log data meeting the first preset condition meets a second preset condition is judged according to the score sum of at least one second preset feature information, so that the remaining log data are detected through two judging steps. Log data that do not meet the first preset condition are screened out, which improves the detection accuracy of phishing websites and further improves detection efficiency. The refer of the log data meeting the first preset condition is then further detected through the second preset condition, which again improves detection accuracy. The two judging steps are based on different preset feature information: the first judgment uses the more obvious preset feature information, and the second judgment is applied only to the refer of the log data meeting the first preset condition, so that the refer of the log data is comprehensively detected, detection accuracy and efficiency are greatly improved, and log data that may come from a phishing website are detected in time. Finally, the refer of the log data meeting the second preset condition is manually detected, so that misjudgments are avoided, detection efficiency and accuracy are further improved, phishing websites can be found in time, and network security is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for actively detecting phishing websites according to an embodiment of the invention;
fig. 2 is a schematic structural diagram of an electronic device for actively detecting a phishing website according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment of the invention provides a method for actively detecting a phishing website, which is shown in figure 1 and comprises the following steps:
101. Acquiring all log data from a Hadoop (distributed system infrastructure) and a message system regularly, deleting abnormal log data in the log data, and keeping normal log data.
The abnormal log data is log data indicating that the log source is a non-phishing website.
Specifically, the Hadoop and message system regularly pushes log data to the electronic equipment that processes the log data;
all log data that obviously do not come from a phishing website are deleted; such log data may be, for example, log data generated by attacks.
Because all log data are acquired regularly from the big data of the Hadoop and message system, every log record is regularly checked for phishing websites, which realizes active detection. Since log data of the whole network are examined, detection is more comprehensive and timely, phishing websites can be found over a wider range as soon as they appear, and network security is improved. In addition, abnormal log data are deleted first and only normal log data are retained; because abnormal log data indicate that the log source is a non-phishing website, log data that obviously do not belong to a phishing website are screened out first, which reduces the amount of log data to be detected and improves detection efficiency.
102. Obtaining the refer and the domain name of the normal log data, deleting the log data in the refer white list in all refer, and deleting the log data in the domain name white list in all domain names.
The log data deleted by refer are, for example, log data whose refer is an intranet or device domain name in the refer white list; the log data deleted by domain name are, for example, log data whose domain name is in the domain name white list, such as the domain names ranked highest by traffic and data, for example Baidu and Google;
and performing phishing website detection on the rest log data.
By acquiring the refer and the domain name of the normal log data, deleting the log data in the refer white list and deleting the log data in the domain name white list, the log data amount to be detected is further reduced, and the detection efficiency is improved.
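The white list filtering of step 102 can be sketched as follows. This is a minimal sketch under assumptions that are not in the patent: each log record is a dictionary with hypothetical `refer` and `domain` keys, and the refer white list is matched on the host name of the refer URL.

```python
from urllib.parse import urlsplit


def filter_by_whitelists(records, refer_host_whitelist, domain_whitelist):
    """Keep only records not covered by the refer or domain name white list."""
    kept = []
    for record in records:
        # Assumption: the refer white list holds host names, e.g. intranet hosts.
        refer_host = urlsplit(record["refer"]).hostname or ""
        if refer_host in refer_host_whitelist:
            continue
        if record["domain"] in domain_whitelist:
            continue
        kept.append(record)
    return kept
```

Using sets for both white lists keeps each membership check constant time, which matters when the log volume is large.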
103. Acquiring the score of at least one first preset feature information of the refer of the remaining log data.
The first preset feature information comprises IP information, domain name listing information and domain name security information;
specifically, the step of obtaining the score of at least one first preset feature information of the refer of the remaining log data is performed by at least one of the following manners:
a. the obtaining of the score of the IP information comprises:
according to the IP information, calculating the score of the IP information through the intranet and the third-party website;
the IP information comprises an IP risk value, an IP service provider, IP historical analysis data, an IP attribution, IP website data and IP website downloading coefficients.
b. The obtaining of the score of the domain name listing information includes:
obtaining the score of the domain name listing information according to the score of the authoritative website;
wherein the authoritative website can be a search engine, and the score of the domain name listing information can be obtained from the scores of a plurality of search engines;
specifically, a normal website generally has a certain amount of listing data in search engines: a typical product may have tens of millions of listed pages, whereas a newly established website may have only a few. A new domain name is one of the characteristics of a phishing website, and a search engine's detection of the phishing website and its listing of the new site are almost zero. Search engine listing is therefore used as one item of first preset feature information for judging phishing websites, and the refer of the log data is scored through the security and listing rules of the search engine itself to obtain the score; meanwhile, in order to improve accuracy, the scoring results of a plurality of search engines such as Bing, Yahoo, ***, etc. may be used at the same time, and if a plurality of search engines are used, the average of the search engines' scores may be used.
It should be noted that when the refer of the log data is domestic, the score of a domestic search engine is relatively high, and when the refer of the log data is foreign, the score of a foreign search engine is relatively high.
Illustratively, with Baidu as the search engine, the search syntax is: com; this search returns all the listing information corresponding to the xxx domain name, and Baidu gives the xxx domain name a score according to that listing information; this score is the obtained score of the listing information of the xxx domain name.
c. Obtaining the score of the domain name security information includes:
obtaining the security score of the domain name from each security interface by calling different security interfaces.
Specifically, the security score of the domain name is obtained from each security interface by calling different security interfaces, such as the security interface of Google and the security interfaces of security alliances at home and abroad;
in order to improve accuracy, the scoring results of multiple security interfaces may be obtained, and if scores are obtained through multiple security interfaces, the average of the security interfaces' scores may be used.
Optionally, the method further comprises:
distributing the weight of each security interface score according to the website credit, security and value;
and obtaining the score of the domain name security information according to the weight and the score of each security interface.
Optionally, the method further comprises:
distributing the score weight of at least one first preset feature information according to the credit, the safety and the value of the website;
and calculating the score of the at least one first preset characteristic information according to the weight.
By acquiring at least one preset characteristic information of the website to be detected, the at least one preset characteristic information can be used for detecting whether the website to be detected is possibly a phishing website, so that newly appearing websites in the log data can be preliminarily screened, the score of the log data to be detected is acquired by calculating the score of each first preset characteristic information, and the detection accuracy is improved.
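Step 103 states that when several search engines or several security interfaces are consulted, the average of their scores may be used. A minimal sketch of that averaging follows; the function name `average_score` and the source names in the example are hypothetical, and how each individual score is fetched is deliberately left out because the text does not specify the interfaces.

```python
from statistics import mean


def average_score(scores_by_source: dict) -> float:
    """Average the scores returned by several search engines or interfaces."""
    if not scores_by_source:
        raise ValueError("at least one score is required")
    return mean(list(scores_by_source.values()))


# Hypothetical scores from three sources, for illustration only.
example_average = average_score({"engine_a": 0, "engine_b": 3, "engine_c": 6})  # 3
```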
104. Judging whether the refer of each remaining log data meets a first preset condition according to the score of at least one first preset feature information, and if the refer meets the first preset condition, executing steps 105-107; and if the first preset condition is not met, deleting the log data which do not meet the first preset condition.
The first preset condition comprises that the score of each first preset feature information is 0; meeting the first preset condition indicates that the refer of the log data may be a phishing website, and further detection is needed to determine this.
Specifically, if the score of each first preset feature information of the refer of a given log data is 0, it is determined that the refer of the log data meets the first preset condition and further judgment is required, so steps 105-107 are executed; if the score of at least one first preset feature information of the refer of a given log data is not 0, it is determined that the refer of the log data does not meet the first preset condition, which indicates that the refer of the log data carries no risk of being a phishing website;
the log data which do not meet the first preset condition, being log data without phishing website risk, are deleted.
This judging step is performed for the refer of each remaining log data to screen out the log data meeting the first preset condition, which are the log data at risk of being a phishing website and are further judged by executing steps 105-107; the log data which do not meet the first preset condition carry no such risk and are deleted.
It should also be noted that, in practical applications, combining the three types of first preset feature information further improves the accuracy of phishing website detection and is the preferred scheme. Using only one or two of the three types to detect phishing websites is not excluded, however; all such combinations of the first preset feature information, and likewise of the second preset feature information, fall within the protection scope of the present invention, and the embodiment of the present invention does not limit them.
It should be noted that the first preset condition may also be that the score of each piece of first preset feature information equals a preset value, and the preset value may be greater than zero; or the first preset condition may be defined on the sum or mean of the scores of the first preset feature information, namely that the score sum or the mean value is greater than a preset value. The first preset condition may also be another condition, which is not limited in the embodiment of the present invention.
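The default form of the first preset condition (every first-feature score is 0) can be sketched as follows. This is only an illustrative sketch: the feature keys, the record layout (`refer`, `first_scores`) and the function names are hypothetical and do not come from the patent text.

```python
from typing import Dict, List

# Hypothetical keys for the three first preset features named in the text.
FIRST_FEATURES = ("ip_info", "domain_listing", "domain_security")


def meets_first_condition(scores: Dict[str, float]) -> bool:
    """Return True when every available first-feature score is 0.

    Per the text, such a refer may be a phishing website and must be
    passed on to steps 105-107.
    """
    used = [scores[name] for name in FIRST_FEATURES if name in scores]
    # "At least one" feature must have been scored (assumption: an
    # unscored refer is not treated as meeting the condition).
    return bool(used) and all(score == 0 for score in used)


def screen_first_stage(records: List[dict]) -> List[dict]:
    """Keep only records whose refer meets the first preset condition."""
    return [r for r in records if meets_first_condition(r["first_scores"])]
```

Records dropped here correspond to the log data that step 104 deletes; the alternative conditions mentioned above (preset value, sum or mean) would replace only the body of `meets_first_condition`.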
105. Acquiring the score sum of at least one piece of second preset feature information of the refer of the log data meeting the first preset condition.
The second preset feature information comprises whois information, domain name detection information, threat intelligence data information and page detection information.
Specifically, the method comprises the following steps:
obtaining a score of at least one second preset characteristic information of the refer of the log data meeting the first preset condition;
and calculating the sum of the scores of at least one second preset characteristic information of the refer of the log data meeting the first preset condition according to the obtained score of each second preset characteristic information.
The score of at least one piece of second preset feature information of the refer of the log data meeting the first preset condition is obtained in at least one of the following ways:
obtaining the score of whois information includes:
calculating the score of the whois information according to the whois information, wherein the whois information comprises the registration time, the registration email address, the registrant, the update time, the domain name provider and the DNS server of the refer;
specifically, the score of the whois information can be calculated through the intranet and third-party websites according to the whois information;
obtaining the score of the domain name detection comprises:
splitting the domain name of the refer, and obtaining a comprehensive score of the refer according to the length, port, sub-domain name and suffix obtained by the split;
specifically, the comprehensive score of the refer is calculated according to the length coefficient, the port coefficient, the sub-domain name coefficient and the suffix coefficient, which correspond respectively to the length, port, sub-domain name and suffix obtained by the split;
it should be noted that, in addition to splitting the domain name of the refer to obtain a score, a score of domain name detection may also be obtained in other manners, which is not limited in the embodiment of the present invention;
obtaining the score of the threat intelligence data information includes:
obtaining, by a threat intelligence system, a threat score of the URL of the detected refer;
specifically, threat data sources shared by other security organizations and the like, including malicious IPs, malicious URLs and the like, are used to perform threat scoring according to whether the URL in the log data is a malicious URL and/or corresponds to a malicious IP;
obtaining the score of the page detection includes:
analyzing the page content to obtain the page structure, and obtaining a page score according to the obtained page structure.
Specifically, the page content can be analyzed by a script to obtain page features such as the attributes of special tags and the keywords of the page; whether the page structure conforms to that of a conventional website is judged according to these page features, and a page score is obtained. The page score can also be obtained in other ways.
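The domain name detection described above (splitting the refer and scoring its length, port, sub-domain name and suffix) can be sketched as follows. The text names the four coefficients but gives no values or scoring rule, so the coefficient values, the limits, the suffix list and the rule "ordinary-looking parts earn points" are all assumptions made for illustration; the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed values: the patent names these coefficients but does not give them.
LENGTH_COEF = 1.0
PORT_COEF = 1.0
SUBDOMAIN_COEF = 1.0
SUFFIX_COEF = 1.0
COMMON_SUFFIXES = {"com", "cn", "net", "org"}  # assumed "ordinary" suffixes
STANDARD_PORTS = {None, 80, 443}               # None means no explicit port


def split_refer(refer: str) -> dict:
    """Split a refer URL into the four parts named in the text."""
    parts = urlsplit(refer)
    host = parts.hostname or ""
    labels = host.split(".") if host else []
    return {
        "length": len(host),
        "port": parts.port,
        # Naive split: everything left of the last two labels counts as
        # sub-domain names (does not handle suffixes such as "com.cn").
        "subdomains": labels[:-2],
        "suffix": labels[-1] if labels else "",
    }


def domain_score(refer: str, max_length: int = 30, max_subdomains: int = 2) -> float:
    """Sum one coefficient for each part that looks ordinary (assumed rule)."""
    parts = split_refer(refer)
    score = 0.0
    if 0 < parts["length"] <= max_length:
        score += LENGTH_COEF
    if parts["port"] in STANDARD_PORTS:
        score += PORT_COEF
    if len(parts["subdomains"]) <= max_subdomains:
        score += SUBDOMAIN_COEF
    if parts["suffix"] in COMMON_SUFFIXES:
        score += SUFFIX_COEF
    return score
```

Under this assumed rule a lower score marks a more unusual domain, which matches the direction used elsewhere in the text, where low scores indicate a higher phishing risk.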
Obtaining the score sum of the at least one piece of second preset feature information of the refer of the log data meeting the first preset condition may further include:
assigning a score weight to each piece of second preset feature information according to website reputation, security and value;
and calculating the score sum of the at least one piece of second preset feature information according to the weight and the score of each piece of second preset feature information.
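The weighted score sum can be sketched as below. The feature keys and weight values are hypothetical; the text states only that weights are assigned according to website reputation, security and value, not how.

```python
from typing import Dict


def weighted_score_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum each second-feature score multiplied by its weight.

    Features without an explicit weight default to 1.0 (an assumption),
    which reduces to the plain score sum when no weights are given.
    """
    return sum(weights.get(name, 1.0) * value for name, value in scores.items())


# Hypothetical scores and weights for the four second preset features.
example_scores = {"whois": 2.0, "domain_detection": 3.0, "threat_intel": 0.0, "page_detection": 1.0}
example_weights = {"whois": 0.5, "domain_detection": 1.0, "threat_intel": 2.0, "page_detection": 1.5}
example_total = weighted_score_sum(example_scores, example_weights)
```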
106. Judging, according to the score sum, whether the refer of the log data meeting the first preset condition meets a second preset condition; if the second preset condition is met, executing step 107; if it is not met, deleting the log data that does not meet the second preset condition.
Specifically, if the score sum is equal to a preset threshold, the refer of the log data is highly likely to be a phishing website; the refer is manually detected, and step 107 is executed. If the score sum is greater than the preset threshold, the refer is not a phishing website.
Meeting the second preset condition indicates that the refer of the log data carries a high risk of being a phishing website, and manual detection is performed to determine whether it actually is one.
The preset threshold may be zero, a preset value close to zero, or another preset value, set according to empirical values from actual detection; the embodiment of the present invention does not limit the specific preset threshold.
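The second judgment can be sketched as a small decision function. The text specifies only two cases (score sum equal to the threshold, and score sum greater than the threshold); treating a score sum below the threshold like the "equal" case is an assumption of this sketch, as are the function name and the return labels.

```python
def second_stage_decision(score_sum: float, threshold: float = 0.0) -> str:
    """Classify a refer by its second-feature score sum.

    Greater than the threshold: not a phishing website (log data deleted).
    Equal to the threshold: high likelihood, goes to manual detection.
    Below the threshold: not covered by the text; sent to manual
    detection here as an assumption.
    """
    if score_sum > threshold:
        return "not_phishing"
    return "manual_review"
```

With the default threshold of zero and non-negative scores, only the two cases described in the text can occur.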
107. Manually detecting the refer of the log data meeting the second preset condition.
By manually detecting the refer of the log data meeting the second preset condition, the occurrence of some misjudgment conditions is avoided, the detection efficiency and accuracy are further improved, the phishing websites can be found in time, and the network security is improved.
The embodiment of the invention provides a method for actively detecting a phishing website. All log data are periodically acquired from Hadoop and the message system, so the log data are periodically checked for phishing websites. Because detection is active and covers the log data of the whole network, it is more comprehensive and wider in range, phishing websites can be detected in time, and detection efficiency, strength, timeliness and network security are improved. In addition, abnormal log data are deleted from all the log data, and from the remaining normal log data the log data in the refer white list and the log data in the domain name white list are deleted; log data that are obviously not phishing websites are thus screened out, which reduces the amount of log data to be processed and improves detection efficiency. For the remaining log data, whether the refer of each log data meets the first preset condition is judged according to the score of at least one piece of first preset feature information, and whether the refer of the log data meeting the first preset condition meets the second preset condition is judged according to the score sum of at least one piece of second preset feature information, so the remaining log data are detected through two judging steps. Log data that do not meet the first preset condition are deleted, which further improves detection efficiency; the refer of the log data meeting the first preset condition is then further detected against the second preset condition, which improves the accuracy of phishing website detection. The two judging steps are based on different preset feature information: the first judgment uses obvious preset feature information, and the second judgment is applied only to the refer of the log data meeting the first preset condition. The refer of the log data can therefore be detected comprehensively, detection accuracy and efficiency are greatly improved, and log data that may belong to a phishing website can be detected in time. Finally, the refer of the log data meeting the second preset condition is manually detected, which avoids some misjudgments, further improves detection efficiency and accuracy, allows phishing websites to be found in time, and improves network security.
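The white list screening mentioned in this summary can be sketched as a simple filter applied before any scoring. The record layout (a `refer` field and a `domain` field per log record) and the function name are hypothetical; the text says only that log data whose refer is in the refer white list, or whose domain name is in the domain name white list, are deleted.

```python
from typing import Iterable, List, Set


def filter_whitelisted(
    records: Iterable[dict],
    refer_whitelist: Set[str],
    domain_whitelist: Set[str],
) -> List[dict]:
    """Drop log records whose refer or domain name is white-listed."""
    remaining = []
    for record in records:
        if record["refer"] in refer_whitelist:
            continue  # refer is known to be safe
        if record["domain"] in domain_whitelist:
            continue  # domain name is known to be safe
        remaining.append(record)
    return remaining
```

Only the records returned by this filter would go on to the first and second judging steps, which is how the white lists reduce the amount of log data to be scored.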
Example two
An embodiment of the present invention provides an electronic device for actively detecting a phishing website, and as shown in fig. 2, the electronic device 2 includes:
all log data acquisition module 21, which is used to periodically acquire all log data from Hadoop (distributed system infrastructure) and message system;
the first deleting module 22 is configured to delete abnormal log data therein, and retain normal log data, where the abnormal log data is log data indicating that a log source is a non-phishing website;
a refer and domain name obtaining module 23, configured to obtain refer and domain name of normal log data;
the second deleting module 24 is configured to delete log data in a refer white list in all refer, and delete log data in a domain white list in all domain names;
the score obtaining module 25 for the first preset feature information, configured to obtain a score of at least one first preset feature information of the refer of the remaining log data;
a first judging module 26, configured to judge whether refer of each remaining log data meets a first preset condition according to a score of at least one first preset feature information;
a score sum obtaining module 27 of the second preset feature information, configured to obtain a score sum of at least one second preset feature information of the refer of the log data that meets the first preset condition;
a second judging module 28, configured to judge whether the refer of the log data meeting the first preset condition meets a second preset condition according to the score sum;
and the manual detection module 29 is configured to manually detect the refer of the log data meeting the second preset condition.
Optionally, the first preset feature information includes IP information, domain name listing information, and domain name security information; the second preset feature information includes whois information, domain name detection information, threat intelligence data information and page detection information.
Optionally, the score obtaining module 25 of the first preset feature information is specifically configured to:
obtaining the score of the IP information, specifically comprising:
calculating the score of the IP information through the intranet and third-party websites according to the IP information;
wherein the IP information comprises an IP risk value, an IP service provider, IP historical analysis data, an IP home location, IP website data and IP website download coefficients;
obtaining the score of the domain name listing information, specifically comprising:
obtaining the score of the domain name listing information according to the score given by authoritative websites;
obtaining the score of the domain name security information, specifically comprising:
calling different security interfaces, and obtaining the security score of the domain name through those security interfaces.
Optionally, the first preset condition includes that the score of each first preset feature information is 0;
meeting the first preset condition indicates that the refer of the log data may be a phishing website.
Optionally, the score sum obtaining module 27 of the second preset feature information includes a score obtaining sub-module 271 for each second preset feature information and a score sum calculation sub-module 272:
the score obtaining sub-module 271 for each second preset feature information is configured to obtain a score of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the score sum calculation sub-module 272 is specifically configured to calculate, according to the obtained score of each piece of second preset feature information, a score sum of at least one piece of second preset feature information of the refer of the log data that meets the first preset condition;
the score obtaining sub-module 271 of each second preset feature information is specifically configured to:
obtaining the score of the whois information, comprising:
calculating the score of the whois information according to the whois information, wherein the whois information comprises the registration time, the registration email address, the registrant, the update time, the domain name provider and the DNS server of the refer;
obtaining the score of the domain name detection, comprising:
splitting the domain name of the refer, and obtaining a comprehensive score of the refer according to the length, port, sub-domain name and suffix obtained by the split;
obtaining the score of the threat intelligence data information, comprising:
obtaining, by a threat intelligence system, a threat score of the URL of the detected refer;
obtaining the score of the page detection, comprising:
analyzing the page content to obtain the page structure, and obtaining a page score according to the obtained page structure.
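The page analysis performed by sub-module 271 can be sketched with the Python standard library's `html.parser`. The method embodiment mentions the attributes of special tags and page keywords as page features but does not say which tags or keywords are used, so the choice of form and password-input tags and the keyword list below are assumptions; the sketch extracts features only and leaves the mapping from features to a page score open, as the text does.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; the patent does not enumerate keywords.
SUSPICIOUS_KEYWORDS = ("password", "verify your account")


class PageFeatureParser(HTMLParser):
    """Collect simple structural features from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.forms = 0
        self.password_inputs = 0
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "form":
            self.forms += 1
        elif tag == "input" and (attr_map.get("type") or "").lower() == "password":
            self.password_inputs += 1

    def handle_data(self, data):
        self.text_parts.append(data)


def page_features(html: str) -> dict:
    """Parse the page content and return the extracted page features."""
    parser = PageFeatureParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    return {
        "forms": parser.forms,
        "password_inputs": parser.password_inputs,
        "keyword_hits": sum(keyword in text for keyword in SUSPICIOUS_KEYWORDS),
    }
```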
Optionally, the score sum obtaining module 27 of the second preset feature information further includes a weight obtaining sub-module 273:
the weight obtaining sub-module 273 is configured to assign a score weight of at least one second preset feature information according to the website reputation, security, and value;
the score sum calculation sub-module 272 is further configured to calculate a score sum of at least one second preset feature information according to the weight and the score of each second preset feature information.
Optionally, the second determining module 28 is specifically configured to:
if the score sum is equal to a preset threshold value, indicating that the probability that the refer of the log data is a phishing website is high, and manually detecting the refer; if the score sum is greater than a preset threshold, it indicates that the refer is not a phishing website.
The embodiment of the invention provides electronic equipment for actively detecting a phishing website. All log data are periodically acquired from Hadoop and the message system, so the log data are periodically checked for phishing websites. Because detection is active and covers the log data of the whole network, it is more comprehensive and wider in range, phishing websites can be detected in time, and detection efficiency, strength, timeliness and network security are improved. In addition, abnormal log data are deleted from all the log data, and from the remaining normal log data the log data in the refer white list and the log data in the domain name white list are deleted; log data that are obviously not phishing websites are thus screened out, which reduces the amount of log data to be processed and improves detection efficiency. For the remaining log data, whether the refer of each log data meets the first preset condition is judged according to the score of at least one piece of first preset feature information, and whether the refer of the log data meeting the first preset condition meets the second preset condition is judged according to the score sum of at least one piece of second preset feature information, so the remaining log data are detected through two judging steps. Log data that do not meet the first preset condition are deleted, which further improves detection efficiency; the refer of the log data meeting the first preset condition is then further detected against the second preset condition, which improves the accuracy of phishing website detection. The two judging steps are based on different preset feature information: the first judgment uses obvious preset feature information, and the second judgment is applied only to the refer of the log data meeting the first preset condition. The refer of the log data can therefore be detected comprehensively, detection accuracy and efficiency are greatly improved, and log data that may belong to a phishing website can be detected in time. Finally, the refer of the log data meeting the second preset condition is manually detected, which avoids some misjudgments, further improves detection efficiency and accuracy, allows phishing websites to be found in time, and improves network security.
All the above-mentioned optional technical solutions can be combined arbitrarily to form the optional embodiments of the present invention, and are not described herein again.
It should be noted that the division into functional modules described for the electronic device of the above embodiment is only an example. In practical applications, the functions can be allocated to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the electronic device provided by the above embodiment and the method embodiment for actively detecting a phishing website belong to the same concept; the specific implementation process is described in the method embodiment and is not repeated here.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (12)

1. A method for actively detecting phishing websites, the method comprising:
acquiring all log data from a Hadoop and message system regularly, deleting abnormal log data in the log data, and keeping normal log data, wherein the abnormal log data is log data indicating that the log source is a non-phishing website;
obtaining the refer and the domain name of the normal log data, deleting the log data in a refer white list in all refer, and deleting the log data in a domain name white list in all domain names;
obtaining a score of at least one first preset feature information of the refer of the remaining log data, wherein the first preset feature information comprises IP information, domain name listing information and domain name security information; judging whether the refer of each remaining log data meets a first preset condition according to the score of the at least one first preset feature information, wherein the first preset condition comprises that the score of each first preset feature information is 0, and meeting the first preset condition indicates that the refer of the log data has the possibility of being a phishing website; obtaining the score sum of at least one second preset feature information of the refer of the log data meeting the first preset condition, wherein the second preset feature information comprises whois information, domain name detection information, threat intelligence data information and page detection information; and judging whether the refer of the log data meeting the first preset condition meets a second preset condition according to the score sum, wherein the second preset condition comprises that the score sum is equal to a preset threshold value, and meeting the second preset condition indicates that the refer of the log data is highly likely to be a phishing website;
and manually detecting the refer of the log data meeting the second preset condition.
2. The method according to claim 1, wherein the step of obtaining the score of at least one first preset characteristic information of the refer of the remaining log data is performed by at least one of:
the obtaining of the score of the IP information comprises:
according to the IP information, calculating the score of the IP information through the intranet and the third-party website;
the IP information comprises an IP risk value, an IP service provider, IP historical analysis data, an IP home location, IP website data and IP website downloading coefficients;
the obtaining of the score of the domain name listing information includes:
obtaining the score of the domain name listing information according to the score of the authoritative website;
obtaining the score of the domain name security information includes:
and acquiring the security score of the domain name through the security interface by calling different security interfaces.
3. The method according to any one of claims 1 to 2, wherein the obtaining of the score sum of at least one second preset characteristic information of the refer of the log data meeting the first preset condition comprises:
obtaining a score of at least one second preset characteristic information of the refer of the log data meeting the first preset condition;
calculating the sum of the scores of at least one second preset feature information of the refer of the log data meeting the first preset condition according to the obtained score of each second preset feature information;
wherein the step of obtaining a score of at least one second preset feature information of the refer of the log data meeting the first preset condition is performed by at least one of:
obtaining the score of the whois information comprises:
calculating the score of the whois information according to the whois information, wherein the whois information comprises the registration time, the registration mailbox, the registrant, the update time, the domain name provider and the DNS server of the refer;
obtaining the score of the domain name detection comprises:
the domain name of the refer is split, and the comprehensive score of the refer is obtained according to the split length, port, sub-domain name and suffix;
obtaining the score of the threat intelligence data information comprises:
obtaining, by a threat intelligence system, a threat score of the URL of the detected refer;
obtaining the score of the page detection includes:
analyzing the page content to obtain the page structure, and obtaining a page score according to the obtained page structure.
4. The method of claim 3, further comprising:
distributing the score weight of the at least one second preset feature information according to the website credit, the security and the value;
and calculating the sum of the scores of the at least one second preset characteristic information according to the weight and the score of each second preset characteristic information.
5. The method according to claim 4, wherein the determining whether the refer of the log data meeting the first preset condition meets a second preset condition according to the score sum comprises:
if the score sum is equal to a preset threshold value, indicating that the refer of the log data is a phishing website with high possibility, and manually detecting the refer; and if the score sum is larger than a preset threshold value, indicating that the refer is not a phishing website.
6. An electronic device, characterized in that the device comprises:
the all log data acquisition module is used for acquiring all log data from the Hadoop and message system periodically;
the first deleting module is used for deleting abnormal log data in the log, and keeping normal log data, wherein the abnormal log data is log data indicating that the log source is a non-phishing website;
the refer and domain name acquisition module is used for acquiring refer and domain names of the normal log data;
the second deleting module is used for deleting the log data in the refer white list in all the refer and deleting the log data in the domain name white list in all the domain names;
the score acquisition module of the first preset characteristic information acquires a score of at least one first preset characteristic information of the refer of the residual log data;
the first judging module is used for judging whether the refer of each residual log data meets a first preset condition or not according to the score of the at least one piece of first preset characteristic information;
the score sum acquisition module of the second preset feature information is used for acquiring the score sum of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the second judgment module is used for judging whether the refer of the log data meeting the first preset condition meets a second preset condition or not according to the score sum;
and the manual detection module is used for manually detecting the refer of the log data meeting the second preset condition.
7. The apparatus according to claim 6, wherein the first preset feature information includes IP information, domain name listing information, and domain name security information; the second preset characteristic information comprises whois information, domain name detection information, threat intelligence data information and page detection information.
8. The device according to claim 7, wherein the score obtaining module of the first preset feature information is specifically configured to:
obtaining the score of the IP information specifically comprises the following steps:
according to the IP information, calculating the score of the IP information through the intranet and the third-party website;
the IP information comprises an IP risk value, an IP service provider, IP historical analysis data, an IP home location, IP website data and IP website downloading coefficients;
obtaining the score of the domain name listing information, specifically comprising:
obtaining the score of the domain name listing information according to the score of the authoritative website;
obtaining the score of the domain name security information specifically comprises the following steps:
calling different security interfaces, and acquiring the security score of the domain name through the security interfaces.
9. The apparatus according to claim 8, wherein the first preset condition includes that a score of each of the first preset feature information is 0;
the first preset condition is met, indicating that the refer of the log data has a possibility of being a phishing website.
10. The device according to any one of claims 7 to 9, wherein the score sum acquisition module for the second preset feature information comprises a score acquisition submodule for each second preset feature information and a score sum calculation submodule;
the score acquisition submodule of each second preset feature information is used for acquiring a score of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the score sum calculation sub-module is specifically configured to calculate, according to the obtained score of each second preset feature information, a score sum of at least one second preset feature information of the refer of the log data meeting the first preset condition;
the score obtaining submodule of each second preset feature information is specifically configured to:
obtaining the score of the whois information, including:
calculating the score of the whois information according to the whois information, wherein the whois information comprises the registration time, the registration mailbox, the registrant, the update time, the domain name provider and the DNS server of the refer;
obtaining the score of the domain name detection, including:
the domain name of the refer is split, and the comprehensive score of the refer is obtained according to the split length, port, sub-domain name and suffix;
obtaining a score of the threat intelligence data information, comprising:
obtaining, by a threat intelligence system, a threat score of the URL of the detected refer;
obtaining the score of the page detection, including:
analyzing the page content to obtain the page structure, and obtaining a page score according to the obtained page structure.
11. The apparatus according to claim 10, wherein the score sum acquisition module of the second preset feature information further comprises a weight acquisition sub-module:
the weight obtaining submodule is used for distributing the score weight of the at least one second preset characteristic information according to the website credit, the security and the value;
the score sum calculation sub-module is further configured to calculate a score sum of the at least one second preset feature information according to the weight and the score of each second preset feature information.
12. The apparatus according to claim 11, wherein the second determining module is specifically configured to:
if the score sum is equal to a preset threshold value, indicating that the refer of the log data is a phishing website with high possibility, and manually detecting the refer; and if the score sum is larger than a preset threshold value, indicating that the refer is not a phishing website.
CN201710834120.5A 2017-09-15 2017-09-15 Method for actively detecting phishing website and electronic equipment Active CN107659564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710834120.5A CN107659564B (en) 2017-09-15 2017-09-15 Method for actively detecting phishing website and electronic equipment

Publications (2)

Publication Number Publication Date
CN107659564A CN107659564A (en) 2018-02-02
CN107659564B true CN107659564B (en) 2020-07-31

Family

ID=61130147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710834120.5A Active CN107659564B (en) 2017-09-15 2017-09-15 Method for actively detecting phishing website and electronic equipment

Country Status (1)

Country Link
CN (1) CN107659564B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102523210A (en) * 2011-12-06 2012-06-27 中国科学院计算机网络信息中心 Phishing website detection method and device
CN102546618A (en) * 2011-12-29 2012-07-04 北京神州绿盟信息安全科技股份有限公司 Method, device, system and website for detecting fishing website
CN102685145A (en) * 2012-05-28 2012-09-19 西安交通大学 Domain name server (DNS) data packet-based bot-net domain name discovery method
CN102833262A (en) * 2012-09-04 2012-12-19 珠海市君天电子科技有限公司 Whois information-based phishing website gathering, identification method and system
CN102957693A (en) * 2012-10-25 2013-03-06 北京奇虎科技有限公司 Method and device for judging phishing websites
CN103067387A (en) * 2012-12-27 2013-04-24 中国建设银行股份有限公司 Monitoring system and monitoring method for anti phishing
CN103428186A (en) * 2012-05-24 2013-12-04 ***通信集团公司 Method and device for detecting phishing website
CN103607385A (en) * 2013-11-14 2014-02-26 北京奇虎科技有限公司 Method and apparatus for security detection based on browser
CN104899508A (en) * 2015-06-17 2015-09-09 中国互联网络信息中心 Multistage phishing website detecting method and system
CN106302440A (en) * 2016-08-11 2017-01-04 国家计算机网络与信息安全管理中心 Method for obtaining suspicious phishing websites through multiple channels
CN106888220A (en) * 2017-04-12 2017-06-23 恒安嘉新(北京)科技股份公司 Phishing website detection method and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2877956B1 (en) * 2012-07-24 2019-07-17 Webroot Inc. System and method to provide automatic classification of phishing sites

Also Published As

Publication number Publication date
CN107659564A (en) 2018-02-02

Similar Documents

Publication Publication Date Title
US11343269B2 (en) Techniques for detecting domain threats
Ding et al. A keyword-based combination approach for detecting phishing webpages
US10778702B1 (en) Predictive modeling of domain names using web-linking characteristics
US10178121B2 (en) Domain reputation evaluation process and method
US9027134B2 (en) Social threat scoring
KR101702614B1 (en) Online fraud detection dynamic scoring aggregation systems and methods
US8996669B2 (en) Internet improvement platform with learning module
JP5941163B2 (en) Spam detection system and method using frequency spectrum of character string
CN109905288B (en) Application service classification method and device
Aldwairi et al. Malurls: A lightweight malicious website classification based on url features
CN102957664A (en) Method and device for identifying phishing websites
CN102833262A (en) Whois information-based phishing website gathering, identification method and system
CN113810395B (en) Threat information detection method and device and electronic equipment
Zhao et al. Malicious domain names detection algorithm based on lexical analysis and feature quantification
Hu et al. Multi-country study of third party trackers from real browser histories
US11888873B2 (en) Attack surface identification
Fang et al. A proactive discovery and filtering solution on phishing websites
Khade et al. Detection of phishing websites using data mining techniques
Mishsky et al. A topology based flow model for computing domain reputation
TW201701182A (en) Method of detecting domain name of relay station of suspicious botnet for determining whether a domain name is a relay station of a suspicious botnet according to the number of search results returned by a search engine
US20090228438A1 (en) Method and Apparatus for Identifying if Two Websites are Co-Owned
CN107659564B (en) Method for actively detecting phishing website and electronic equipment
Jo et al. You're not who you claim to be: Website identity check for phishing detection
Korczynski et al. Statistical Analysis of DNS Abuse in gTLDs Final Report
CN114765599A (en) Sub-domain name acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221228

Address after: 510123 building 6, No. 20, Huahai street, Fangcun, Liwan District, Guangzhou, Guangdong (office only)

Patentee after: GUANGZHOU PINWEI SOFTWARE Co.,Ltd.

Address before: 510000 room 01, No.314, Fangcun Avenue middle, Liwan District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU WEIPINHUI RESEARCH INSTITUTE CO.,LTD.
