WO2017167208A1 - Method, apparatus and computer storage medium for identifying malicious websites - Google Patents


Info

Publication number
WO2017167208A1
Authority
WO
WIPO (PCT)
Prior art keywords
website
information
hyperlink
malicious
link address
Prior art date
Application number
PCT/CN2017/078650
Other languages
English (en)
French (fr)
Inventor
刘健 (Liu Jian)
Original Assignee
Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to KR1020187014910A (KR102090982B1)
Publication of WO2017167208A1
Priority to US15/967,232 (US10834105B2)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
              • G06F21/55 Detecting local intrusion or implementing counter-measures
                • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
                • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
                  • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/951 Indexing; Web crawling techniques
                • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
                  • G06F16/9558 Details of hyperlinks; Management of linked annotations
                  • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
          • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
              • G06F2221/034 Test or assess a computer or a system
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L63/00 Network architectures or network communication protocols for network security
            • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
              • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
                • H04L63/1416 Event detection, e.g. attack signature detection
              • H04L63/1441 Countermeasures against malicious traffic
                • H04L63/1466 Active attacks involving interception, injection, modification, spoofing of data unit addresses, e.g. hijacking, packet injection or TCP sequence number attacks

Definitions

  • The present application relates to the field of the Internet and, in particular, to a method, an apparatus, and a computer storage medium for identifying malicious websites.
  • In the related art, a cloud security server identifies a malicious website according to the website's URL information. Consequently, when a malicious party finds that a particular piece of URL information is being intercepted, it can simply continue its malicious activity under new URL information.
  • Because the information database contains no record of the new URL information, the cloud security server does not recognize the website as a malicious website.
  • To recognize it, the cloud security server must download the page content of the website and analyze it. If the analysis result indicates that the website is malicious, a record of the new URL information is stored in the information database, so that the website corresponding to the new URL information can be identified as malicious the next time.
  • Performing these download and analysis operations wastes considerable bandwidth and processing resources and takes a long time, which hinders fast and effective identification of malicious websites.
  • The present application provides a method and an apparatus for identifying a malicious website.
  • The technical solution is as follows:
  • A method for identifying a malicious website, comprising:
  • when a malicious website query request is received, identifying a first website based on first website information carried in the malicious website query request to obtain a recognition result, wherein the first website information is the URL information of the first website;
  • An apparatus for identifying a malicious website, comprising:
  • a first identification module, configured to: when a malicious website query request is received, identify the first website according to the first website information carried in the malicious website query request to obtain a recognition result, wherein the first website information is the URL information of the first website;
  • an obtaining module, configured to obtain at least one piece of hyperlink information from the page content of the first website based on the recognition result of the first website; and
  • a second identification module, configured to identify at least one second website based on the at least one piece of hyperlink information, wherein the at least one second website is the website to which each piece of hyperlink information respectively links.
  • The present application further provides a computer storage medium storing a set of instructions that, when executed, cause at least one processor to perform operations comprising:
  • when a malicious website query request is received, identifying a first website based on first website information carried in the malicious website query request to obtain a recognition result, wherein the first website information is the URL information of the first website;
  • The technical solution provided by the present application has the following beneficial effects: when a malicious website query request is received, the first website is identified based on the first website information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then identified based on it. That is, while the first website is being identified, the second websites linked by the hyperlink information in its page content can be identified as well, which improves the efficiency of identifying malicious websites.
  • FIG. 1 is a flowchart of a method for identifying a malicious website provided by the present application.
  • FIG. 2 is a flow chart of another method for identifying a malicious website provided by the present application.
  • FIG. 3 is a schematic structural diagram of a device for identifying a malicious website provided by the present application.
  • FIG. 4 is a schematic structural diagram of another device for identifying a malicious website provided by the present application.
  • FIG. 5 is a schematic structural diagram of a hardware component of the present application.
  • The application scenario of the present application is introduced first. While users download various materials and shop online through the Internet, Trojan viruses and phishing websites may steal user information such as account names and passwords, endangering the security of user information.
  • In the related art, the cloud security server needs to download the page content of the website corresponding to the URL information and analyze that content, which reduces the efficiency of identifying malicious websites. Accordingly, the present application provides a method for identifying malicious websites that saves bandwidth and processing resources and improves the efficiency of identifying malicious websites.
  • FIG. 1 is a flowchart of a method for identifying a malicious website according to the present application. Referring to FIG. 1, the method includes the following steps:
  • Step 101: When a malicious website query request is received, identify the first website based on the first website information carried in the malicious website query request to obtain a recognition result, wherein the first website information is the URL information of the first website.
  • Step 102: Obtain at least one piece of hyperlink information from the page content of the first website based on the recognition result of the first website.
  • Step 103: Identify at least one second website based on the at least one piece of hyperlink information, wherein the at least one second website is the website to which each piece of hyperlink information respectively links.
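Steps 101 to 103 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the application's implementation: the function name identify_with_links, the in-memory set standing in for the malicious information database, and the injected fetch_page and extract_links callables are all hypothetical, and Step 101 is reduced to a plain lookup.

```python
from typing import Callable, Dict, List, Set


def identify_with_links(
    first_url: str,
    malicious_urls: Set[str],
    fetch_page: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
) -> Dict[str, bool]:
    """Hypothetical sketch of Steps 101-103.

    Returns a mapping from URL to True (malicious) / False (not malicious)
    for the queried URL and for every URL its page links to.
    """
    results: Dict[str, bool] = {}
    # Step 101: identify the first website from its URL information.
    results[first_url] = first_url in malicious_urls
    # Step 102: obtain hyperlink information from the first website's page.
    page = fetch_page(first_url)
    # Step 103: identify each linked (second) website by lookup only;
    # the second websites' pages are never downloaded.
    for link in extract_links(page):
        results.setdefault(link, link in malicious_urls)
    return results
```

The design point the sketch illustrates is that only the first website's page is fetched; every linked website is classified from stored data alone.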
  • In summary, when a malicious website query request is received, the first website is identified based on the first website information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are identified based on it. In other words, the second websites linked by the hyperlink information in the first website's page content are identified along with the first website itself, which improves the efficiency of identifying malicious websites.
  • Obtaining at least one piece of hyperlink information from the page content of the first website includes:
  • the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag is the end tag corresponding to the target hyperlink start tag.
  • Identifying the at least one second website based on the at least one piece of hyperlink information includes:
  • The method further includes:
  • identifying, according to a preset condition, the second website linked by the hyperlink information.
  • Identifying the second website linked by the hyperlink information according to a preset condition includes:
  • the third website information is the URL information of a third website, and the third website references the link address information;
  • Identifying the second website linked by the hyperlink information according to a preset condition includes:
  • determining the second website to be a malicious website.
  • The method further includes:
  • storing the link address information corresponding to the second website in the malicious information database.
  • The method further includes:
  • storing the link address information corresponding to the second website, the first website information, and second context information in the malicious link index database, wherein the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
  • FIG. 2 is a flowchart of another method for identifying a malicious website according to the present application.
  • The method for identifying a malicious website is applied to a server and includes the following steps:
  • Step 201: When a malicious website query request is received, identify the first website based on the first website information, wherein the malicious website query request carries the first website information and the first website information is the URL information of the first website.
  • Upon receiving a malicious website query request sent by a terminal, the server may identify the first website based on the first website information.
  • The terminal may be any device that can access the Internet, such as a mobile phone or a computer. The terminal may send the malicious website query request to the server when accessing the first website or when the user enters the first website information. Of course, in practical applications, the terminal may also send the malicious website query request to the server on other occasions, which is not specifically limited in this application.
  • The first URL information may be an Internet Protocol (IP) address or a domain name address. Of course, the first URL information may also be another type of address, which is not specifically limited in this application.
  • When identifying the first website based on the first website information, the server may determine whether the first website information exists in the malicious information database stored on the server. If it does, the server identifies the first website as a malicious website; if it does not, the server downloads the page content of the first website and identifies the first website according to that page content.
  • The server may store the malicious information database before receiving the malicious website query request, and the database may contain multiple pieces of malicious URL information. As shown in Table 1 below, URL information 1, URL information 2, URL information 3, and so on are all malicious URL information.
  • The present application uses the malicious URL information contained in the malicious information database in Table 1 above only as an example; Table 1 does not limit the present application.
  • When identifying the first website according to its page content, the server may determine whether the page content contains a specific word and, if the specific word appears, identify the first website as a malicious website.
  • The server may also identify the first website according to its page content using methods from the prior art, which are not repeated in this application.
  • For example, when the server receives malicious website query request 1 carrying URL information 1, it determines that URL information 1 exists in the malicious information database shown in Table 1 and therefore identifies first website 1, corresponding to URL information 1, as a malicious website. When the server receives malicious website query request 2 carrying URL information 4, it determines that URL information 4 does not exist in the malicious information database shown in Table 1, so it downloads the page content of first website 2, corresponding to URL information 4, and then identifies first website 2 according to that page content.
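The Step 201 decision can be sketched as follows, assuming a set of strings stands in for the malicious information database and a caller-supplied fetch_page function downloads page content; both, and the function name identify_first_website, are hypothetical. The specific-word check is the simple content test described above; any prior-art page analysis could take its place.

```python
from typing import Callable, Iterable, Set


def identify_first_website(
    url_info: str,
    malicious_db: Set[str],
    fetch_page: Callable[[str], str],
    specific_words: Iterable[str],
) -> bool:
    """Hypothetical sketch of Step 201; returns True if judged malicious."""
    # Known malicious URL information: decide from the database alone,
    # without downloading any page content.
    if url_info in malicious_db:
        return True
    # Unknown URL information: download the page and check whether any
    # of the specific words appears in its content.
    page = fetch_page(url_info)
    return any(word in page for word in specific_words)
```

Because the database check comes first, a hit on a known malicious URL costs no bandwidth at all; only unknown URLs trigger a download.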
  • Step 202: After the first website is identified, obtain at least one piece of hyperlink information from the page content of the first website.
  • That is, after identifying the first website, the server may obtain at least one piece of hyperlink information from the page content of the first website.
  • When the first URL information exists in the malicious information database, the server can identify the first website as a malicious website without downloading its page content. Therefore, after identifying the first website as malicious based on the malicious information database, the server may still download the page content of the first website and then identify the websites linked by the hyperlink information contained in that page content.
  • Obtaining the at least one piece of hyperlink information from the page content of the first website may be performed as follows: obtain at least one hyperlink start tag and at least one hyperlink end tag from the page content of the first website; then, in the page content of the first website, determine a target hyperlink start tag, the target hyperlink end tag, and the information between them as one piece of hyperlink information, where the target hyperlink start tag is any one of the at least one hyperlink start tag and the target hyperlink end tag is the end tag corresponding to the target hyperlink start tag.
  • A hyperlink start tag and its corresponding hyperlink end tag delimit a hyperlink: the hyperlink start tag, the corresponding hyperlink end tag, and the content between them together constitute the hyperlink information.
  • For example, the hyperlink start tag may be <a [tag attributes]> and the hyperlink end tag may be </a>. Of course, the hyperlink start tag and the hyperlink end tag may also take other forms, which is not specifically limited in this application.
  • The hyperlink start tag may include multiple tag attributes. For example, the target attribute describes how the website linked by the hyperlink information is opened, and the hypertext reference (href) attribute describes the link address information of the website to which the hyperlink information links.
  • The tag attributes may also include other attributes, which is not specifically limited in this application.
  • The operation of obtaining at least one piece of hyperlink information from the page content of the first website may also follow the prior art, which is not repeated in this application.
  • The hyperlink end tag (for example, </a>) carries no link address, whereas the hyperlink start tag generally contains the link address information of the website linked by the hyperlink information. Therefore, to improve the efficiency of obtaining hyperlink information, the server may obtain only the hyperlink start tags and determine each obtained hyperlink start tag as a piece of hyperlink information.
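A minimal sketch of the tag-based extraction, assuming HTML anchors of the form <a ...>...</a>. The regular expressions and the function name extract_hyperlink_info are illustrative assumptions. Regular expressions are a simplification of HTML parsing (an attribute value containing ">" would confuse them), so a production crawler would more likely use a full HTML parser.

```python
import re
from typing import List

# Hypothetical patterns. A start tag "<a ...>", its matching end tag, and
# everything in between. Anchors cannot be nested in HTML, so the
# non-greedy ".*?" pairs each start tag with the nearest following end tag.
_FULL_ANCHOR = re.compile(r"<a\b[^>]*>.*?</a\s*>", re.IGNORECASE | re.DOTALL)
# Start tag only ("\b" keeps tags such as <abbr> from matching).
_START_TAG_ONLY = re.compile(r"<a\b[^>]*>", re.IGNORECASE)


def extract_hyperlink_info(page: str, start_tags_only: bool = False) -> List[str]:
    """Return each piece of hyperlink information found in the page content.

    With start_tags_only=True, only the start tags are returned, because
    they normally carry the href link address (the faster variant).
    """
    pattern = _START_TAG_ONLY if start_tags_only else _FULL_ANCHOR
    return pattern.findall(page)
```

The start-tag-only variant scans less text per match and still yields the href attribute, which is all the later identification step needs.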
  • Step 203: Identify at least one second website based on the at least one piece of hyperlink information, wherein the at least one second website is the website to which each piece of hyperlink information respectively links.
  • A user can access the website linked by a piece of hyperlink information through that hyperlink, and if the linked website is malicious, it may endanger the security of the user's information. Therefore, to improve the efficiency of identifying malicious websites, after identifying the first website the server may further identify the at least one second website based on the at least one piece of hyperlink information.
  • Identifying the at least one second website based on the at least one piece of hyperlink information may be performed as follows: for each piece of hyperlink information, obtain the link address information from the hyperlink information and determine whether the link address information exists in the stored malicious information database. If it does, determine that the second website linked by the hyperlink information is a malicious website.
  • In addition to the link address information, the hyperlink information may include other information, such as a description of the link address. The link address information is the URL information corresponding to the second website, and the other information explains the link address information.
  • The link address information may be an IP address or a domain name address. Of course, the link address information may also be another type of address, which is not specifically limited in this application.
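For Step 203, the link address can be read from the href attribute of the hyperlink start tag and looked up in the malicious information database. The sketch below uses Python's standard html.parser.HTMLParser; the class name _HrefReader, the function names, and the set-based database are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional, Set


class _HrefReader(HTMLParser):
    """Records the href attribute of the first <a> start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def link_address(hyperlink_info: str) -> Optional[str]:
    """Extract the link address information from one piece of hyperlink information."""
    reader = _HrefReader()
    reader.feed(hyperlink_info)
    reader.close()
    return reader.href


def is_second_website_malicious(hyperlink_info: str, malicious_db: Set[str]) -> bool:
    """Hypothetical sketch of Step 203: a pure lookup, no page download."""
    address = link_address(hyperlink_info)
    return address is not None and address in malicious_db
```

The same function accepts either a full anchor or a bare start tag, so it works with both extraction variants described above.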
  • Further, the second website linked by the hyperlink information may also be identified according to a preset condition.
  • Identifying the second website linked by the hyperlink information according to a preset condition may be performed in either of the following two ways.
  • First way: obtain, from the stored malicious link index library, the number of pieces of third URL information, where the third URL information is the URL information of a third website and the third website references the link address information. When the number of pieces of third URL information is greater than a first preset value, determine the second website to be a malicious website.
  • The third website is a malicious website. The third URL information may be an IP address or a domain name address; of course, in practical applications the third URL information may also be another type of address, which is not specifically limited in this application.
  • The malicious link index library contains at least the third URL information and the link address information referenced by the third website.
  • Of course, the malicious link index library may also contain other content.
  • For example, the malicious link index library may further include the type of the third website, which is not specifically limited in this application.
  • When a third website references link address information, it often references the link address information of websites of the same type as itself.
  • Therefore, when the link address information is referenced by malicious third websites, the second website may also be a malicious website.
  • Accordingly, the server may obtain, from the stored malicious link index library, the number of pieces of third URL information that reference the link address information, and when that number is greater than the first preset value, determine the second website to be a malicious website.
  • Before obtaining the third URL information, the server may set the first preset value according to the needs of the actual application.
  • For example, the first preset value may be set to 2.
  • Of course, the first preset value may also be another value, which is not specifically limited in this application.
  • For example, the server obtains link address information 1 from the hyperlink information.
  • From the malicious link index library shown in Table 2 below, the server obtains the number of pieces of third URL information that reference link address information 1.
  • That number is 3, and since 3 is greater than the first preset value 2, the server identifies second website 1, corresponding to link address information 1, as a malicious website.
  • Table 2
    Link address information   | Third URL information
    Link address information 1 | Third URL information 1
    Link address information 1 | Third URL information 2
    Link address information 1 | Third URL information 3
    Link address information 2 | Third URL information 4
    Link address information 3 | Third URL information 2
    Link address information 3 | Third URL information 5
    …                          | …
  • Before obtaining the number of pieces of third URL information, the server may store, in the malicious link index library, the third URL information of third websites already identified as malicious together with the link address information referenced by those third websites.
  • The link address information may also be stored in the malicious information database, so that the next time the server needs to identify the second website, it can identify the second website as malicious directly from the link address information in the malicious information database, thereby improving the efficiency of identifying malicious websites.
  • Similarly, the number of pieces of third URL information that reference the first URL information may be obtained from the malicious link index library in the manner described above, and when that number is greater than the first preset value, the first website is identified as a malicious website.
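The first preset condition amounts to counting rows in the malicious link index library. The sketch below loads the rows of Table 2 (with shortened labels) into a hypothetical in-memory index and applies the first preset value 2 from the example; the data layout and the function names build_index and malicious_by_references are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

# Rows of Table 2 as (link address information, third URL information).
TABLE_2 = [
    ("link-1", "third-1"), ("link-1", "third-2"), ("link-1", "third-3"),
    ("link-2", "third-4"),
    ("link-3", "third-2"), ("link-3", "third-5"),
]


def build_index(rows: Iterable[Tuple[str, str]]) -> DefaultDict[str, Set[str]]:
    """Map each link address to the set of third (malicious) sites referencing it."""
    index: DefaultDict[str, Set[str]] = defaultdict(set)
    for link, third_url in rows:
        index[link].add(third_url)
    return index


def malicious_by_references(link: str, index, first_preset: int = 2) -> bool:
    # Count the distinct third URL information entries referencing the link
    # and compare with the first preset value (strictly greater than).
    return len(index.get(link, ())) > first_preset
```

Only the count is compared, so the same call also works for the first URL information, as the preceding paragraph describes.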
  • Second way: obtain, from the stored malicious link index library, the number of specific words contained in the first context information corresponding to the link address information; when the number of specific words is greater than a second preset value, determine the second website to be a malicious website.
  • The malicious link index library may further include first context information, which is the text information in the page content of the third website that is located in the same display area as the link address information.
  • Therefore, the server can obtain, from the stored malicious link index library, the number of specific words contained in the first context information, and when that number is greater than the second preset value, determine the second website to be a malicious website.
  • The server may set the second preset value according to actual application requirements.
  • For example, the second preset value may be 3; of course, the second preset value may also be another value, which is not specifically limited in this application.
  • For example, suppose the second preset value is 3 and the specific word is "gaming". When the server obtains link address information 1 from the hyperlink information, it obtains from the malicious link index library shown in Table 3 below that the first context information contains the specific word 4 times.
  • Since 4 is greater than the second preset value 3, the server identifies the second website as a malicious website.
  • Before obtaining the number of specific words contained in the first context information, the server may store, in the malicious link index library, the third URL information of third websites already identified as malicious, the link address information referenced by those third websites, and the first context information.
  • The server may also identify the second website linked by the hyperlink information according to a preset condition in other ways. For example, the server may obtain from the stored malicious link index library both the number of pieces of third URL information and the number of specific words contained in the first context information, and identify the second website as a malicious website when the number of pieces of third URL information is greater than the first preset value and the number of specific words is greater than the second preset value. This is not specifically limited in this application.
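The second preset condition, and the combined variant just described, can be sketched as follows. Table 3 is not reproduced in this text, so no example data is assumed here. The row layout IndexRow, the case-insensitive substring count, and the function names are all hypothetical choices for illustration.

```python
from typing import Iterable, NamedTuple


class IndexRow(NamedTuple):
    """One malicious link index library row (hypothetical layout)."""
    link: str       # link address information referenced by the third website
    third_url: str  # third URL information (a known malicious website)
    context: str    # first context information shown next to the link


def count_specific_word(rows: Iterable[IndexRow], link: str, word: str) -> int:
    # Sum the occurrences of the specific word over all first context
    # information recorded for this link address (case-insensitive here).
    w = word.lower()
    return sum(r.context.lower().count(w) for r in rows if r.link == link)


def malicious_by_context(rows, link, word, second_preset=3) -> bool:
    """Second way: specific-word count must exceed the second preset value."""
    return count_specific_word(rows, link, word) > second_preset


def malicious_by_both(rows, link, word, first_preset=2, second_preset=3) -> bool:
    """Combined variant: both counts must exceed their preset values."""
    rows = list(rows)  # iterate twice safely
    thirds = {r.third_url for r in rows if r.link == link}
    return len(thirds) > first_preset and malicious_by_context(
        rows, link, word, second_preset
    )
```

Requiring both conditions trades some recall for fewer false positives; which variant to use is left to the deployment, as the text notes.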
  • Step 204: For any second website among the at least one second website, when the second website is not a malicious website and the first website is a malicious website, store the link address information corresponding to the second website, the first URL information, and the second context information in the malicious link index library.
  • That is, the server may store the link address information corresponding to the second website, the first URL information, and the second context information in the malicious link index library.
  • Then, when the server again identifies a website that references the link address information corresponding to the second website, it can further identify the second website to determine whether it is a malicious website.
  • The second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
  • After the server identifies the first website and any one of the at least one second website, four results are possible: (1) both the first website and the second website are identified as malicious websites; (2) the first website is not identified as a malicious website, but the second website is; (3) the first website is identified as a malicious website, but the second website is not; and (4) neither website is identified as a malicious website.
  • The server should also store any first website or second website identified as malicious in the malicious information database. For the first result, the server may store both the first URL information and the link address information corresponding to the second website in the malicious information database; for the second result, the server may store the link address information corresponding to the second website in the malicious information database; for the fourth result, the server may store neither the first URL information nor the link address information corresponding to the second website.
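The storage rules of Step 204 and the four results can be sketched as a single function. The two in-memory stores and all names (Stores, record_results) are hypothetical stand-ins for the server's malicious information database and malicious link index library.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Stores:
    """Hypothetical in-memory stand-ins for the two server-side stores."""
    malicious_db: Set[str] = field(default_factory=set)
    # Rows of (link address info, first URL info, second context info).
    link_index: List[Tuple[str, str, str]] = field(default_factory=list)


def record_results(stores: Stores, first_url: str, first_bad: bool,
                   link: str, second_bad: bool, second_context: str) -> None:
    """Persist the outcome for one (first website, second website) pair."""
    # Any website identified as malicious goes into the malicious
    # information database (results 1 to 3); result 4 stores nothing.
    if first_bad:
        stores.malicious_db.add(first_url)
    if second_bad:
        stores.malicious_db.add(link)
    # Step 204: a link that looks clean but appears on a malicious page is
    # indexed, so later references to it can be counted.
    if first_bad and not second_bad:
        stores.link_index.append((link, first_url, second_context))
```

Indexing only the "clean link on a malicious page" case keeps the index focused on the links the preset conditions are meant to re-examine later.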
  • In the present application, when a malicious website query request is received, the first website is identified based on the first website information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are identified based on it. That is, while the first website is being identified, the second websites linked by the hyperlink information in its page content can be identified as well, thereby improving the efficiency of identifying malicious websites.
  • Moreover, when identifying the second website, the server does not need to download the page content of the second website; instead, it identifies the second website based on the data in the stored malicious information database and malicious link index library, selecting different preset conditions according to the different needs of the actual application.
  • This improves the efficiency of identifying malicious websites while also making the identification more flexible.
  • FIG. 3 is a schematic diagram of a device for identifying a malicious website according to the present application.
  • the device includes a first identification module 301, an obtaining module 302, and a second identifying module 303.
  • the first identification module 301 is configured to, when a malicious website query request is received, identify the first website according to first URL information, where the malicious website query request carries the first URL information and the first URL information is the URL information of the first website;
  • the obtaining module 302 is configured to obtain at least one hyperlink information from the page content of the first website after the first website is identified;
  • the second identification module 303 is configured to respectively identify at least one second website based on the at least one hyperlink information, where the at least one second website is a website to which the at least one hyperlink information is respectively linked.
  • the obtaining module includes:
  • a first obtaining unit configured to obtain at least one hyperlink start tag and at least one hyperlink end tag from the page content of the first website;
  • a first determining unit configured to determine, from the page content of the first website, a target hyperlink start tag, a target hyperlink end tag, and the information between them as hyperlink information, where the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
  • the second identification module includes:
  • a second obtaining unit configured to obtain link address information from the hyperlink information for each hyperlink information in the at least one hyperlink information
  • a determining unit configured to determine whether the link address information exists in the stored malicious information database
  • the second determining unit is configured to determine, when the link address information exists in the malicious information database, that the second website linked by the hyperlink information is a malicious website.
  • the second identification module further includes:
  • an identification unit configured to, when the link address information does not exist in the malicious information database, identify the second website linked by the hyperlink information according to a preset condition.
  • the identification unit comprises:
  • a first obtaining subunit configured to obtain, from the stored malicious link index library, the number of pieces of third URL information, where third URL information is the URL information of a third website that references the link address information;
  • a first determining subunit configured to determine the second website as a malicious website when the number of pieces of third URL information is greater than a first preset value.
  • the identification unit comprises:
  • a second obtaining subunit configured to obtain, from the stored malicious link index library, the number of specific words included in first context information, where the first context information is the text information in the page content of a third website that is located in the same display area as the link address information;
  • a second determining subunit configured to determine the second website as a malicious website when the number of the specific words is greater than a second preset value.
  • the device further includes:
  • a first storage module configured to, for any second website of the at least one second website, when the second website is a malicious website, store the link address information corresponding to the second website in the malicious information database.
  • the device further includes:
  • a second storage module configured to, for any second website of the at least one second website, when the second website is not a malicious website and the first website is a malicious website, store the link address information corresponding to the second website, the first URL information, and second context information in the malicious link index library, where the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
  • when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from its page content, and the second websites linked by that hyperlink information are then identified respectively; that is, while identifying the first website, the server can also identify the second websites linked by hyperlink information in its page content, which improves the efficiency of identifying malicious websites.
  • FIG. 4 is a schematic structural diagram of another apparatus for identifying a malicious website according to an embodiment of the present application.
  • the device can be a server, which can be a server in a cluster of background servers. Referring to Figure 4, specifically:
  • the server 400 includes a central processing unit (CPU) 401, a system memory 404 including a random access memory (RAM) 402 and a read-only memory (ROM) 403, and a system bus 405 connecting the system memory 404 and the central processing unit 401.
  • the server 400 also includes a basic input/output system (I/O system) 406 that helps transfer information between components within the computer, and a mass storage device 407 for storing an operating system 413, application programs 414, and other program modules 415.
  • I/O system: basic input/output system
  • the basic input/output system 406 includes a display 408 for displaying information and an input device 409, such as a mouse or keyboard, for the user to input information. Both the display 408 and the input device 409 are connected to the central processing unit 401 through an input/output controller 410 connected to the system bus 405.
  • the basic input/output system 406 may further include the input/output controller 410 for receiving and processing input from multiple other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 410 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 407 is connected to the central processing unit 401 by a mass storage controller (not shown) connected to the system bus 405.
  • the mass storage device 407 and its associated computer readable medium provide non-volatile storage for the server 400. That is, the mass storage device 407 can include a computer readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • Computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • RAM: random access memory
  • ROM: read-only memory
  • EPROM: erasable programmable read-only memory
  • EEPROM: electrically erasable programmable read-only memory
  • the server 400 may also operate by connecting, through a network such as the Internet, to a remote computer on that network. That is, the server 400 can be connected to the network 412 through a network interface unit 411 connected to the system bus 405, or the network interface unit 411 can be used to connect to other types of networks or remote computer systems (not shown).
  • the above memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.
  • the one or more programs contain instructions for performing the method for identifying a malicious website provided by the present application, as described below, including:
  • when a malicious website query request is received, the first website is identified based on the first URL information, where the malicious website query request carries the first URL information and the first URL information is the URL information of the first website.
  • based on the at least one piece of hyperlink information, at least one second website is identified respectively, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
  • obtaining at least one hyperlink information from the page content of the first website including:
  • the target hyperlink start tag is any hyperlink start tag in the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
  • the at least one second website is separately identified based on the at least one hyperlink information, including:
  • the method further includes:
  • the second website linked by the hyperlink information is identified according to a preset condition.
  • the second website linked by the hyperlink information is identified according to a preset condition, including:
  • the third URL information is the URL information of a third website, and the link address information is referenced by the third website;
  • the second website is determined to be a malicious website.
  • the second website linked by the hyperlink information is identified according to a preset condition, including:
  • the second website is determined to be a malicious website.
  • the method further includes:
  • the link address information corresponding to the second website is stored in the malicious information database.
  • the method further includes:
  • the link address information corresponding to the second website, the first URL information, and second context information are stored in the malicious link index library, where the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
  • when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then identified respectively; that is, the server can identify the first website and also the second websites linked by hyperlink information in its page content, which improves the efficiency of identifying malicious websites.
  • when the device for identifying a malicious website provided in the foregoing embodiment identifies a malicious website, the division into the above functional modules is used only as an example.
  • in practice, the foregoing functions may be allocated to different functional modules as required;
  • that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
  • the device for identifying a malicious website provided by the foregoing embodiment and the method embodiment for identifying a malicious website belong to the same concept. For the implementation process, refer to the method embodiment; details are not repeated here.
  • a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be instructed by a program to execute related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.
  • as an example of a hardware entity, the apparatus provided by this embodiment is shown in FIG. 5 and includes a processor, a storage medium, and at least one external communication interface; the processor, the storage medium, and the external communication interface are all connected via a bus.
  • the processor of the apparatus of the present application performs the following processing:
  • when a malicious website query request is received, the first website is identified based on the first URL information carried in the request to obtain an identification result, where the first URL information is the URL information of the first website;
  • a person skilled in the art can understand that all or part of the processes of the above embodiments can be implemented by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium. In the present application, for example,
  • the program may be stored in a storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the method embodiments described above.
  • the storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

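The present application discloses a method and apparatus for identifying a malicious website, belonging to the field of the Internet. The method includes: when a malicious website query request is received, identifying a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website; based on the identification result of the first website, obtaining at least one piece of hyperlink information from the page content of the first website; and based on the at least one piece of hyperlink information, respectively identifying at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links. While identifying the first website, the present application can also identify the second websites linked by hyperlink information in the page content of the first website, which improves the efficiency of identifying malicious websites.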

Description

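Method and apparatus for identifying malicious website, and computer storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese Patent Application No. 201610186975.7, filed on March 29, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of the Internet, and in particular to a method, apparatus, and computer storage medium for identifying malicious websites.
Background
The rapid development of Internet technology has brought increasing convenience to people's lives; for example, people can download all kinds of materials and shop online through the Internet. At the same time, malicious behaviors have emerged, such as disguising Trojan viruses as normal files to spread them freely, and phishing websites imitating normal websites to steal users' account names and passwords. Methods for identifying malicious websites have therefore attracted wide attention.
In the related art, a cloud security server identifies malicious websites according to the URL information of the websites. Therefore, when a malicious actor finds that certain URL information has been blocked, the actor can continue the malicious behavior using new URL information. Since the information database contains no record of the new URL information, the cloud security server will not identify the website as malicious. To solve this problem, the cloud security server needs to download and analyze the page content of the website; if the analysis result indicates that the website is malicious, a record of the new URL information is stored in the information database, so that the website corresponding to the new URL information can be identified as malicious next time.
However, for every new URL that does not exist in the information database, the cloud security server has to perform the above download and analysis operations, which wastes a large amount of bandwidth and processing resources and takes a long time, hindering fast and effective identification of malicious websites.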
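Summary
To solve the problems of the prior art, the present application provides a method and apparatus for identifying a malicious website. The technical solutions are as follows:
In one aspect, a method for identifying a malicious website is provided, the method including:
when a malicious website query request is received, identifying a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website;
based on the identification result of the first website, obtaining at least one piece of hyperlink information from the page content of the first website;
based on the at least one piece of hyperlink information, respectively identifying at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
In another aspect, an apparatus for identifying a malicious website is provided, the apparatus including:
a first identification module, configured to, when a malicious website query request is received, identify a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website;
an obtaining module, configured to obtain, based on the identification result of the first website, at least one piece of hyperlink information from the page content of the first website;
a second identification module, configured to respectively identify at least one second website based on the at least one piece of hyperlink information, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
In yet another aspect, the present application further provides a computer storage medium including a set of instructions which, when executed, cause at least one processor to perform operations including:
when a malicious website query request is received, identifying a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website;
based on the identification result of the first website, obtaining at least one piece of hyperlink information from the page content of the first website;
based on the at least one piece of hyperlink information, respectively identifying at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
The technical solutions provided by the present application have the following beneficial effects: when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then respectively identified. That is, while identifying the first website, the second websites linked by hyperlink information in the page content of the first website can also be identified, which improves the efficiency of identifying malicious websites.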
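Brief Description of the Drawings
To describe the technical solutions in the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for identifying a malicious website provided by the present application;
FIG. 2 is a flowchart of another method for identifying a malicious website provided by the present application;
FIG. 3 is a schematic structural diagram of an apparatus for identifying a malicious website provided by the present application;
FIG. 4 is a schematic structural diagram of another apparatus for identifying a malicious website provided by the present application;
FIG. 5 is a schematic diagram of a hardware composition of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the drawings.
Before the present application is explained in detail, its application scenario is introduced. While users download all kinds of materials and shop online through the Internet, Trojan viruses and phishing websites may steal user information such as account names and passwords, endangering the security of user information. In the related art, for every URL that does not exist in the information database, the cloud security server has to download and analyze the page content of the corresponding website, which reduces the efficiency of identifying malicious websites. The present application therefore provides a method for identifying a malicious website that saves bandwidth and processing resources and improves the efficiency of identifying malicious websites.
FIG. 1 is a flowchart of a method for identifying a malicious website provided by the present application. Referring to FIG. 1, the method includes:
Step 101: When a malicious website query request is received, identify a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website.
Step 102: Based on the identification result of the first website, obtain at least one piece of hyperlink information from the page content of the first website.
Step 103: Based on the at least one piece of hyperlink information, respectively identify at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.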
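Steps 101 to 103 can be pictured as a single request handler. The following is a minimal Python sketch under stated assumptions: `handle_query` and its three callables are hypothetical names that stand in for whatever identification and extraction logic an implementation actually uses.

```python
from typing import Callable, Dict, Iterable


def handle_query(
    first_url: str,
    identify_first: Callable[[str], bool],          # step 101 (hypothetical callable)
    extract_links: Callable[[str], Iterable[str]],  # step 102: link addresses found on the first website
    identify_second: Callable[[str], bool],         # step 103 (hypothetical callable)
) -> Dict[str, bool]:
    """Answer one malicious website query and return a verdict for the first
    website plus every second website it links to."""
    verdicts = {first_url: identify_first(first_url)}
    for link_address in extract_links(first_url):
        # Each second website is judged from its link address; a link back to
        # the first website keeps the first website's own verdict.
        verdicts.setdefault(link_address, identify_second(link_address))
    return verdicts
```

One query therefore yields verdicts for several websites, which is the source of the efficiency gain the text describes.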
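In the present application, when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then respectively identified. That is, while identifying the first website, the second websites linked by hyperlink information in the page content of the first website can also be identified, which improves the efficiency of identifying malicious websites.
Optionally, obtaining at least one piece of hyperlink information from the page content of the first website includes:
obtaining, from the page content of the first website, at least one hyperlink start tag and at least one hyperlink end tag;
determining, from the page content of the first website, a target hyperlink start tag, a target hyperlink end tag, and the information between the target hyperlink start tag and the target hyperlink end tag as hyperlink information, wherein the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
Optionally, respectively identifying at least one second website based on the at least one piece of hyperlink information includes:
for each piece of hyperlink information in the at least one piece of hyperlink information, obtaining link address information from the hyperlink information;
determining whether the link address information exists in a stored malicious information database;
when the link address information exists in the malicious information database, determining that the second website linked by the hyperlink information is a malicious website.
Optionally, after determining whether the link address information exists in the stored malicious information database, the method further includes:
when the link address information does not exist in the malicious information database, identifying, according to a preset condition, the second website linked by the hyperlink information.
Optionally, identifying, according to a preset condition, the second website linked by the hyperlink information includes:
obtaining, from a stored malicious link index library, the number of pieces of third URL information, wherein the third URL information is the URL information of a third website, and the link address information is referenced by the third website;
when the number of pieces of third URL information is greater than a first preset value, determining the second website as a malicious website.
Optionally, identifying, according to a preset condition, the second website linked by the hyperlink information includes:
obtaining, from the stored malicious link index library, the number of specific words included in first context information, wherein the first context information is the text information in the page content of a third website that is located in the same display area as the link address information;
when the number of specific words is greater than a second preset value, determining the second website as a malicious website.
Optionally, after respectively identifying at least one second website based on the at least one piece of hyperlink information, the method further includes:
for any second website of the at least one second website, when the second website is a malicious website, storing the link address information corresponding to the second website in the malicious information database.
Optionally, after respectively identifying at least one second website based on the at least one piece of hyperlink information, the method further includes:
for any second website of the at least one second website, when the second website is not a malicious website and the first website is a malicious website, storing the link address information corresponding to the second website, the first URL information, and second context information in the malicious link index library, wherein the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present application, which are not described one by one here.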
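FIG. 2 is a flowchart of another method for identifying a malicious website provided by the present application. Referring to FIG. 2, the method is applied in a server and includes:
Step 201: When a malicious website query request is received, identify a first website based on first URL information, wherein the malicious website query request carries the first URL information, and the first URL information is the URL information of the first website.
While users download all kinds of materials and shop online through the Internet, Trojan viruses and phishing websites may steal user information such as account names and passwords, endangering the security of user information. Therefore, to protect user information, the server may, upon receiving a malicious website query request sent by a terminal, identify the first website based on the first URL information.
The terminal may be a mobile phone, a computer, or another device that can access the Internet. The terminal may send the malicious website query request to the server when accessing the first website or when the user enters the first URL information. Of course, in practice the terminal may also send the request at other times, which is not specifically limited in the present application.
It should be noted that the first URL information may be an Internet Protocol (IP) address or a domain name address; of course, in practice the first URL information may also be another kind of address, which is not specifically limited in the present application.
Further, when identifying the first website based on the first URL information, the server may determine whether the first URL information exists in the malicious information database stored by the server. If it does, the first website is identified as a malicious website; if it does not, the page content of the first website is downloaded and the first website is identified according to that page content.
The malicious information database may be stored by the server before the malicious website query request is received, and may include multiple pieces of malicious URL information. As shown in Table 1 below, URL information 1, URL information 2, URL information 3, and so on are all malicious URL information.
Table 1
URL information 1
URL information 2
URL information 3
…
It should be noted that the present application uses the malicious URL information in the malicious information database shown in Table 1 only as an example; Table 1 does not limit the present application.
It should further be noted that, when identifying the first website according to its page content, the server may determine whether the page content includes specific words, and when a specific word appears, identify the first website as a malicious website. For other methods by which the server may identify the first website according to its page content, reference may be made to the prior art; details are not repeated here.
For example, when the server receives malicious website query request 1 carrying URL information 1, it determines that URL information 1 exists in the malicious information database shown in Table 1, and therefore identifies first website 1 corresponding to URL information 1 as a malicious website. When the server receives malicious website query request 2 carrying URL information 4, it determines that URL information 4 does not exist in the malicious information database shown in Table 1, and therefore downloads the page content of first website 2 corresponding to URL information 4 and identifies first website 2 according to that page content.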
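Step 201 amounts to a database lookup with a page-content fallback. A minimal Python sketch, assuming the malicious information database is an in-memory set, page download is an injected `fetch_page` callable, and the specific-word check is a plain substring test. The function name, `fetch_page`, and the word list are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def identify_first_website(
    url_info: str,
    malicious_db: Set[str],            # stand-in for the malicious information database (Table 1)
    fetch_page: Callable[[str], str],  # hypothetical page downloader
    specific_words: Iterable[str],     # hypothetical specific words
) -> Tuple[bool, Optional[str]]:
    """Return (is_malicious, page_content); page_content is None when no download was needed."""
    if url_info in malicious_db:
        # Known malicious URL information: identified without downloading the page.
        return True, None
    page = fetch_page(url_info)
    # Fallback: the page is treated as malicious if any specific word appears in it.
    return any(word in page for word in specific_words), page
```

As step 202 notes, a server that still wants to scan the outgoing links of a known-malicious first website would download the page on the first branch too; the sketch returns `None` there only to show that the verdict itself needs no download.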
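Step 202: After identifying the first website, obtain at least one piece of hyperlink information from the page content of the first website.
Besides pictures and text, the page content of the first website often includes hyperlink information, through which a user can access the linked website. If that website is malicious, it likewise endangers the security of user information. Therefore, to identify the websites linked by hyperlink information and further ensure user information security, after the first website is identified, at least one piece of hyperlink information may be obtained from its page content.
It should be noted that, as described above, when the first URL information exists in the malicious information database, the server can identify the first website as malicious without downloading its page content. Therefore, after identifying the first website as malicious according to the malicious information database, the server may still download the page content of the first website and then identify the websites linked by the hyperlink information included in that page content.
Further, the operation of obtaining at least one piece of hyperlink information from the page content of the first website may be: obtaining at least one hyperlink start tag and at least one hyperlink end tag from the page content of the first website; and determining, from the page content of the first website, a target hyperlink start tag, a target hyperlink end tag, and the information between them as hyperlink information, wherein the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
A hyperlink start tag and its corresponding hyperlink end tag indicate that the start tag, the end tag, and the content between them constitute hyperlink information. For example, the hyperlink start tag may be <a multiple tag attributes>, and the hyperlink end tag may be </a>. Of course, in practice the hyperlink start and end tags may take other forms, which is not specifically limited in the present application.
It should be noted that, to describe the hyperlink information accurately, the hyperlink start tag may include multiple tag attributes. For example, the target attribute specifies how the website linked by the hyperlink information is opened, and the Hypertext Reference (href) attribute specifies the link address information of the linked website. Of course, in practice the tag attributes may also include other attributes, which is not specifically limited in the present application.
It should further be noted that, besides the above method, the operation of obtaining at least one piece of hyperlink information from the page content of the first website may refer to the prior art; details are not repeated here.
For example, when the hyperlink start tag is <a target="target attribute value" href="href attribute value">, the hyperlink end tag is </a>, and the page content of first website 1 is "<li><a target="_blank" href="www.123.com">This is the 123 site</a></li>", the server obtains from that page content the start tag <a target="_blank" href="www.123.com">, the end tag </a>, and the content between them, and determines "<a target="_blank" href="www.123.com">This is the 123 site</a>" as hyperlink information 1.
Further, as noted above, a hyperlink start tag usually includes the link address information of the linked website. Therefore, to obtain hyperlink information more efficiently, the server may obtain only the hyperlink start tags and determine them as the hyperlink information.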
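The tag-pairing extraction and its start-tag-only variant can be sketched in Python with the standard `re` module. This is a minimal illustration, assuming `<a ...>` and `</a>` are the start and end tags and that anchors are not nested; the function names are hypothetical, and real-world markup may warrant a proper HTML parser such as the standard `html.parser`.

```python
import re
from typing import List

# Hyperlink start tag, the content up to the first end tag, and the end tag.
# Assumes anchors are not nested (nesting is invalid in HTML anyway).
_ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a\s*>", re.IGNORECASE | re.DOTALL)
# Hyperlink start tag alone, for the cheaper start-tag-only variant.
_START_TAG_RE = re.compile(r"<a\b[^>]*>", re.IGNORECASE)


def extract_hyperlinks(page_content: str) -> List[str]:
    """Return each start-tag + content + end-tag span as one piece of hyperlink information."""
    return _ANCHOR_RE.findall(page_content)


def extract_start_tags(page_content: str) -> List[str]:
    """Return only the hyperlink start tags, which already carry the href attribute."""
    return _START_TAG_RE.findall(page_content)
```

On the example page above, `extract_hyperlinks` yields hyperlink information 1, while `extract_start_tags` yields just `<a target="_blank" href="www.123.com">`. The `\b` after `a` keeps tags such as `<abbr>` from matching.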
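Step 203: Based on the at least one piece of hyperlink information, respectively identify at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
As described above, a user can access the linked website through the hyperlink information, and when the linked website is malicious, it may endanger the security of user information. Therefore, to improve the efficiency of identifying malicious websites, after the first website is identified, at least one second website may also be identified based on the at least one piece of hyperlink information.
The operation of respectively identifying at least one second website based on the at least one piece of hyperlink information may be: for each piece of hyperlink information, obtaining link address information from it; determining whether the link address information exists in the stored malicious information database; and if it does, determining that the second website linked by the hyperlink information is a malicious website.
It should be noted that, in addition to the link address information, hyperlink information may include other information, such as a description of the link address. The link address information is the URL information corresponding to the second website, and the other information describes the link address information.
It should further be noted that the link address information may be an IP address or a domain name address; of course, in practice it may also be another kind of address, which is not specifically limited in the present application.
For example, in hyperlink information 1 "<a href="www.123.com">This is the 123 site</a>", "www.123.com" is the link address information corresponding to the second website, and "This is the 123 site" is the link address description, which describes the link address information.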
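The lookup in step 203 can be sketched as "read the href value, then test membership in the malicious information database". A minimal Python sketch, assuming the database is an in-memory set and that link addresses are compared as exact strings with no URL normalization; `link_address` and `linked_site_is_malicious` are hypothetical names.

```python
import re
from typing import Optional, Set

# href attribute value: double-quoted, single-quoted, or unquoted.
# The lookbehind keeps attributes such as data-href from matching.
_HREF_RE = re.compile(
    r"""(?<![\w-])href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def link_address(hyperlink_info: str) -> Optional[str]:
    """Return the link address information (href value), or None if absent."""
    match = _HREF_RE.search(hyperlink_info)
    if match is None:
        return None
    # Exactly one of the three alternatives matched.
    return next(group for group in match.groups() if group is not None)


def linked_site_is_malicious(hyperlink_info: str, malicious_db: Set[str]) -> bool:
    """The second website is malicious if its link address is already in the database."""
    address = link_address(hyperlink_info)
    return address is not None and address in malicious_db
```

Because only the start tag carries `href`, the same functions work unchanged on the start-tag-only hyperlink information described in step 202.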
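Further, after determining whether the link address information exists in the stored malicious information database, if it does not, the second website linked by the hyperlink information may also be identified according to a preset condition.
The operation of identifying, according to a preset condition, the second website linked by the hyperlink information may be performed in either of the following two manners.
In the first manner, the number of pieces of third URL information is obtained from the stored malicious link index library, wherein the third URL information is the URL information of a third website and the link address information is referenced by the third website; when the number of pieces of third URL information is greater than a first preset value, the second website is determined to be a malicious website.
Here, a third website is a malicious website, and the third URL information may be an IP address or a domain name address; of course, in practice it may also be another kind of address, which is not specifically limited in the present application.
It should be noted that the malicious link index library includes at least the third URL information and the link address information referenced by the third websites. Of course, in practice, to improve the accuracy of identifying whether the second website is malicious, the malicious link index library may include other content; for example, to indicate the category of a third website, it may also include the third website type, which is not specifically limited in the present application.
It should further be noted that a third website often references the link address information of websites of the same type as itself. When multiple malicious websites all reference the link address information, the second website may also be malicious. For example, where *** websites are malicious and multiple *** websites all reference the link address information, the second website may also be a *** website and hence malicious. Therefore, to improve the accuracy of identifying whether the second website is malicious, the server may obtain from the stored malicious link index library the number of pieces of third URL information that reference the link address information, and determine the second website to be malicious when that number is greater than the first preset value.
Further, before obtaining the number of pieces of third URL information, the server may set the first preset value according to the needs of the actual application; for example, the first preset value may be set to 2. Of course, the first preset value may also be another value, which is not specifically limited in the present application.
For example, when the first preset value is 2 and the server obtains link address information 1 from the hyperlink information, the server finds from the malicious link index library shown in Table 2 below that the number of pieces of third URL information referencing link address information 1 is 3. Since 3 is greater than the first preset value 2, second website 1 corresponding to link address information 1 is identified as a malicious website.
Table 2
Link address information | Third URL information
Link address information 1 | Third URL information 1
Link address information 1 | Third URL information 2
Link address information 1 | Third URL information 3
Link address information 2 | Third URL information 4
Link address information 3 | Third URL information 2
Link address information 3 | Third URL information 5
… | …
It should be noted that the present application uses the link address information and third URL information in the malicious link index library shown in Table 2 only as an example; Table 2 does not limit the present application.
Further, before obtaining the number of pieces of third URL information, the server may store, in the malicious link index library, the third URL information of third websites already identified as malicious and the link address information referenced by those third websites.
Further, after the number of pieces of third URL information is found to be greater than the first preset value and the second website is determined to be malicious, the link address information may also be stored in the malicious information database. When the server needs to identify the second website again, it can identify the second website as malicious directly from the link address information in the malicious information database, which improves the efficiency of identifying malicious websites.
In addition, when identifying the first website in step 201, the server may also use the above manner to obtain from the malicious link index library the number of pieces of third URL information that reference the first URL information, and identify the first website as malicious when that number is greater than the first preset value.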
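The first manner can be sketched in Python with the malicious link index library held in memory as (link address, third URL) rows, as in Table 2. This is a minimal sketch: the function names and the row labels are hypothetical, and counting *distinct* third URLs with a set is an assumption the text does not spell out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Table 2 with hypothetical short labels.
TABLE_2 = [
    ("link-1", "third-1"), ("link-1", "third-2"), ("link-1", "third-3"),
    ("link-2", "third-4"), ("link-3", "third-2"), ("link-3", "third-5"),
]


def build_referrer_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (link address, third URL) rows by link address."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for address, third_url in rows:
        index[address].add(third_url)
    return index


def malicious_by_referrers(address: str, index: Dict[str, Set[str]], first_preset: int = 2) -> bool:
    """First manner: malicious if more than first_preset known-malicious websites reference it."""
    return len(index.get(address, ())) > first_preset
```

With the default first preset value of 2, link address 1 (three referrers) is judged malicious while link addresses 2 and 3 are not, matching the worked example. Calling `malicious_by_referrers` with the first URL information instead gives the step 201 variant described in the last paragraph.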
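In the second manner, the number of specific words included in first context information is obtained from the stored malicious link index library, and when the number of specific words is greater than a second preset value, the second website is determined to be a malicious website.
When a third website references the link address information, it usually also describes the link address information, or the type of the second website, with text located in the same display area as the link address information. Therefore, to increase the accuracy of identifying whether the second website is malicious, the malicious link index library may also include first context information, namely the text information in the page content of the third website that is located in the same display area as the link address information. The server can thus obtain from the stored malicious link index library the number of specific words included in the first context information, and determine the second website to be malicious when that number is greater than the second preset value.
It should be noted that, before obtaining the first context information, the server may set the second preset value according to the needs of the actual application; for example, the second preset value may be 3. Of course, the second preset value may also be another value, which is not specifically limited in the present application.
For example, when the second preset value is 3, the specific word is "***", and the server obtains link address information 1 from the hyperlink information, the server finds from the malicious link index library shown in Table 3 below that the number of specific words included in the first context information is 4. Since 4 is greater than the second preset value 3, the second website is identified as a malicious website.
Table 3
Link address information | Third URL information | First context information
Link address information 1 | Third URL information 1 | *** geography
Link address information 1 | Third URL information 2 | ***
Link address information 1 | Third URL information 3 | ******
Link address information 2 | Third URL information 4 | Education science
Link address information 3 | Third URL information 2 | Astronomy news
Link address information 3 | Third URL information 5 | Life encyclopedia
… | … | …
It should be noted that the present application uses the link address information, third URL information, and first context information in the malicious link index library shown in Table 3 only as an example; Table 3 does not limit the present application.
Further, before obtaining the number of specific words included in the first context information, the server may store, in the malicious link index library, the third URL information of third websites already identified as malicious, the link address information referenced by those third websites, and the first context information.
In addition, the server may identify the second website linked by the hyperlink information according to preset conditions in other ways. For example, the server may obtain from the stored malicious link index library both the number of pieces of third URL information and the number of specific words included in the first context information, and identify the second website as malicious when the former is greater than the first preset value and the latter is greater than the second preset value, which is not specifically limited in the present application.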
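The second manner and the combined condition can be sketched in Python over rows shaped like Table 3. This is a minimal sketch under stated assumptions: the specific word is masked as "***" in the source, so the sketch uses the hypothetical placeholder "WORD", counts non-overlapping substring occurrences with `str.count`, and counts distinct referrers for the combined rule. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# One malicious link index library row: (link address, third URL, first context).
Row = Tuple[str, str, str]

# Table 3 with the masked specific word replaced by the placeholder "WORD".
TABLE_3: List[Row] = [
    ("link-1", "third-1", "WORD geography"),
    ("link-1", "third-2", "WORD"),
    ("link-1", "third-3", "WORDWORD"),
    ("link-2", "third-4", "Education science"),
    ("link-3", "third-2", "Astronomy news"),
    ("link-3", "third-5", "Life encyclopedia"),
]


def specific_word_count(address: str, rows: Sequence[Row], specific_words: Iterable[str]) -> int:
    """Total occurrences of the specific words in every first context recorded for address."""
    words = list(specific_words)
    return sum(
        context.count(word)
        for link, _third_url, context in rows
        if link == address
        for word in words
    )


def malicious_by_context(address: str, rows: Sequence[Row],
                         specific_words: Iterable[str], second_preset: int = 3) -> bool:
    """Second manner: malicious when the specific-word count exceeds second_preset."""
    return specific_word_count(address, rows, specific_words) > second_preset


def malicious_by_both(address: str, rows: Sequence[Row], specific_words: Iterable[str],
                      first_preset: int = 2, second_preset: int = 3) -> bool:
    """Combined condition: referrer count > first_preset AND word count > second_preset."""
    referrers = {third for link, third, _context in rows if link == address}
    return len(referrers) > first_preset and malicious_by_context(
        address, rows, specific_words, second_preset)
```

For link address 1 the word count is 1 + 1 + 2 = 4, which exceeds 3, reproducing the worked example. `rows` must be a re-iterable sequence, because the combined rule scans it twice.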
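Step 204: For any second website of the at least one second website, when the second website is not a malicious website and the first website is a malicious website, store the link address information corresponding to the second website, the first URL information, and second context information in the malicious link index library.
When the first website is malicious, the second websites it references are also likely to be malicious. Therefore, to further improve the accuracy of identifying whether a second website is malicious, when the server has identified the first website as malicious through steps 201 to 203 but has not identified the second website as malicious, the server may store the link address information corresponding to the second website, the first URL information, and the second context information in the malicious link index library. Afterwards, when the server again identifies a website that references the link address information corresponding to the second website, the second website can be further identified to determine whether it is malicious.
It should be noted that the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
In addition, when the server identifies the first website and any second website of the at least one second website, besides the above result in which the first website is identified as malicious but the second website is not, three other results are possible: in the first result, both the first website and the second website are identified as malicious; in the second result, the second website is identified as malicious and the first website as non-malicious; in the third result, both are identified as non-malicious.
It should be noted that, after identifying the first website and any second website of the at least one second website, the server should also store whichever of them is identified as malicious in the malicious information database. Specifically, for the first result, the server may store both the first URL information and the link address information corresponding to the second website in the malicious information database; for the second result, the server may store the link address information corresponding to the second website in the malicious information database; for the third result, the server may store neither.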
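The storage rules of step 204 and the three other results fit in one update routine. A minimal Python sketch, assuming the malicious information database is a set of URL strings and the malicious link index library is a list of (link address, first URL, second context) rows; `update_stores` and its parameter names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def update_stores(
    first_url: str,
    first_malicious: bool,
    second_results: Iterable[Tuple[str, bool, str]],  # (link address, is_malicious, second context)
    malicious_db: Set[str],                           # malicious information database
    link_index: List[Tuple[str, str, str]],           # malicious link index library rows
) -> None:
    """Record the verdicts of one query in the two stores."""
    if first_malicious:
        malicious_db.add(first_url)
    for address, second_malicious, context in second_results:
        if second_malicious:
            # First and second results: a malicious second website goes into the database.
            malicious_db.add(address)
        elif first_malicious:
            # Step 204: an unconfirmed link on a malicious page is kept as evidence
            # for later referrer-count and context checks.
            link_index.append((address, first_url, context))
        # Third result (neither malicious): nothing is stored.
```

Rows appended here are exactly the kind of evidence that the referrer-count and context-word rules later read back when another website references the same link address.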
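In the present application, when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then respectively identified. That is, while identifying the first website, the second websites linked by hyperlink information in the page content of the first website can also be identified, which improves the efficiency of identifying malicious websites. In addition, when identifying a second website, the server does not need to download its page content; instead, based on the data in the stored malicious information database and malicious link index library, it selects different preset conditions according to the needs of the actual application, which improves both the efficiency and the flexibility of identifying malicious websites.
FIG. 3 is a schematic diagram of an apparatus for identifying a malicious website provided by the present application. Referring to FIG. 3, the apparatus includes a first identification module 301, an obtaining module 302, and a second identification module 303.
The first identification module 301 is configured to, when a malicious website query request is received, identify a first website based on first URL information, wherein the malicious website query request carries the first URL information, and the first URL information is the URL information of the first website;
the obtaining module 302 is configured to obtain, after the first website is identified, at least one piece of hyperlink information from the page content of the first website;
the second identification module 303 is configured to respectively identify at least one second website based on the at least one piece of hyperlink information, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
Optionally, the obtaining module includes:
a first obtaining unit, configured to obtain at least one hyperlink start tag and at least one hyperlink end tag from the page content of the first website;
a first determining unit, configured to determine, from the page content of the first website, a target hyperlink start tag, a target hyperlink end tag, and the information between the target hyperlink start tag and the target hyperlink end tag as hyperlink information, wherein the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
Optionally, the second identification module includes:
a second obtaining unit, configured to, for each piece of hyperlink information in the at least one piece of hyperlink information, obtain link address information from the hyperlink information;
a judging unit, configured to determine whether the link address information exists in the stored malicious information database;
a second determining unit, configured to, when the link address information exists in the malicious information database, determine that the second website linked by the hyperlink information is a malicious website.
Optionally, the second identification module further includes:
an identification unit, configured to, when the link address information does not exist in the malicious information database, identify, according to a preset condition, the second website linked by the hyperlink information.
Optionally, the identification unit includes:
a first obtaining subunit, configured to obtain, from the stored malicious link index library, the number of pieces of third URL information, wherein the third URL information is the URL information of a third website, and the link address information is referenced by the third website;
a first determining subunit, configured to determine the second website as a malicious website when the number of pieces of third URL information is greater than a first preset value.
Optionally, the identification unit includes:
a second obtaining subunit, configured to obtain, from the stored malicious link index library, the number of specific words included in first context information, wherein the first context information is the text information in the page content of a third website that is located in the same display area as the link address information;
a second determining subunit, configured to determine the second website as a malicious website when the number of specific words is greater than a second preset value.
Optionally, the apparatus further includes:
a first storage module, configured to, for any second website of the at least one second website, when the second website is a malicious website, store the link address information corresponding to the second website in the malicious information database.
Optionally, the apparatus further includes:
a second storage module, configured to, for any second website of the at least one second website, when the second website is not a malicious website and the first website is a malicious website, store the link address information corresponding to the second website, the first URL information, and second context information in the malicious link index library, wherein the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
In summary, in the present application, when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then respectively identified. That is, while identifying the first website, the second websites linked by hyperlink information in the page content of the first website can also be identified, which improves the efficiency of identifying malicious websites.
For the apparatus in the above embodiment, the specific manner in which each module performs operations has been described in detail in the method embodiments and is not elaborated here.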
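FIG. 4 is a schematic structural diagram of another apparatus for identifying a malicious website provided by an embodiment of the present application. The apparatus may be a server, which may be a server in a back-end server cluster. Referring to FIG. 4, specifically:
The server 400 includes a central processing unit (CPU) 401, a system memory 404 including a random access memory (RAM) 402 and a read-only memory (ROM) 403, and a system bus 405 connecting the system memory 404 and the central processing unit 401. The server 400 further includes a basic input/output system (I/O system) 406 that helps transfer information between components within the computer, and a mass storage device 407 for storing an operating system 413, application programs 414, and other program modules 415.
The basic input/output system 406 includes a display 408 for displaying information and an input device 409, such as a mouse or keyboard, for the user to input information. Both the display 408 and the input device 409 are connected to the central processing unit 401 through an input/output controller 410 connected to the system bus 405. The basic input/output system 406 may further include the input/output controller 410 for receiving and processing input from multiple other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 410 also provides output to a display screen, printer, or other type of output device.
The mass storage device 407 is connected to the central processing unit 401 through a mass storage controller (not shown) connected to the system bus 405. The mass storage device 407 and its associated computer-readable media provide non-volatile storage for the server 400. That is, the mass storage device 407 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media are not limited to these. The system memory 404 and the mass storage device 407 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 400 may also operate by connecting, through a network such as the Internet, to a remote computer on that network. That is, the server 400 may be connected to a network 412 through a network interface unit 411 connected to the system bus 405, or the network interface unit 411 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, which are stored in the memory and configured to be executed by the CPU. The one or more programs contain instructions for performing the method for identifying a malicious website provided by the present application, as described below, including: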
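When a malicious website query request is received, identify a first website based on first URL information, wherein the malicious website query request carries the first URL information, and the first URL information is the URL information of the first website.
After the first website is identified, obtain at least one piece of hyperlink information from the page content of the first website.
Based on the at least one piece of hyperlink information, respectively identify at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
Optionally, obtaining at least one piece of hyperlink information from the page content of the first website includes:
obtaining, from the page content of the first website, at least one hyperlink start tag and at least one hyperlink end tag;
determining, from the page content of the first website, a target hyperlink start tag, a target hyperlink end tag, and the information between the target hyperlink start tag and the target hyperlink end tag as hyperlink information, wherein the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
Optionally, respectively identifying at least one second website based on the at least one piece of hyperlink information includes:
for each piece of hyperlink information in the at least one piece of hyperlink information, obtaining link address information from the hyperlink information;
determining whether the link address information exists in a stored malicious information database;
when the link address information exists in the malicious information database, determining that the second website linked by the hyperlink information is a malicious website.
Optionally, after determining whether the link address information exists in the stored malicious information database, the method further includes:
when the link address information does not exist in the malicious information database, identifying, according to a preset condition, the second website linked by the hyperlink information.
Optionally, identifying, according to a preset condition, the second website linked by the hyperlink information includes:
obtaining, from a stored malicious link index library, the number of pieces of third URL information, wherein the third URL information is the URL information of a third website, and the link address information is referenced by the third website;
when the number of pieces of third URL information is greater than a first preset value, determining the second website as a malicious website.
Optionally, identifying, according to a preset condition, the second website linked by the hyperlink information includes:
obtaining, from the stored malicious link index library, the number of specific words included in first context information, wherein the first context information is the text information in the page content of a third website that is located in the same display area as the link address information;
when the number of specific words is greater than a second preset value, determining the second website as a malicious website.
Optionally, after respectively identifying at least one second website based on the at least one piece of hyperlink information, the method further includes:
for any second website of the at least one second website, when the second website is a malicious website, storing the link address information corresponding to the second website in the malicious information database.
Optionally, after respectively identifying at least one second website based on the at least one piece of hyperlink information, the method further includes:
for any second website of the at least one second website, when the second website is not a malicious website and the first website is a malicious website, storing the link address information corresponding to the second website, the first URL information, and second context information in the malicious link index library, wherein the second context information is the text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.
In the present application, when a malicious website query request is received, the first website is identified based on the first URL information, at least one piece of hyperlink information is obtained from the page content of the first website, and the second websites linked by that hyperlink information are then respectively identified. That is, while identifying the first website, the second websites linked by hyperlink information in the page content of the first website can also be identified, which improves the efficiency of identifying malicious websites.
It should be noted that, when the apparatus for identifying a malicious website provided in the above embodiments identifies a malicious website, the division into the above functional modules is used only as an example. In practice, the above functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for identifying a malicious website provided in the above embodiments and the method embodiments for identifying a malicious website belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
A person of ordinary skill in the art may understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.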
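On the basis of the foregoing scenario, an example of the apparatus provided by this embodiment as a hardware entity is shown in FIG. 5. It includes a processor, a storage medium, and at least one external communication interface; the processor, the storage medium, and the external communication interface are all connected through a bus.
The processor of the apparatus of the present application performs the following processing:
when a malicious website query request is received, identifying a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website;
based on the identification result of the first website, obtaining at least one piece of hyperlink information from the page content of the first website;
based on the at least one piece of hyperlink information, respectively identifying at least one second website, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
A person of ordinary skill in the art may understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium; in the present application, for example, the program may be stored in the storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a ROM, a RAM, or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Although their description is specific and detailed, they should not be understood as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within its protection scope. Therefore, the protection scope of this patent shall be subject to the appended claims.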

Claims (17)

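  1. A method for identifying a malicious website, the method comprising:
    when a malicious website query request is received, identifying a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website;
    obtaining, based on the identification result of the first website, at least one piece of hyperlink information from page content of the first website;
    respectively identifying at least one second website based on the at least one piece of hyperlink information, the at least one second website being the websites to which the at least one piece of hyperlink information respectively links.
  2. The method according to claim 1, wherein obtaining the at least one piece of hyperlink information from the page content of the first website comprises:
    obtaining at least one hyperlink start tag and at least one hyperlink end tag from the page content of the first website;
    determining, from the page content of the first website, a target hyperlink start tag, a target hyperlink end tag, and information between the target hyperlink start tag and the target hyperlink end tag as hyperlink information, wherein the target hyperlink start tag is any one of the at least one hyperlink start tag, and the target hyperlink end tag corresponds to the target hyperlink start tag.
  3. The method according to claim 1, wherein respectively identifying the at least one second website based on the at least one piece of hyperlink information comprises:
    for each piece of hyperlink information in the at least one piece of hyperlink information, obtaining link address information from the hyperlink information;
    determining whether the link address information exists in a stored malicious information database;
    when the link address information exists in the malicious information database, determining that the second website linked by the hyperlink information is a malicious website.
  4. The method according to claim 3, wherein after determining whether the link address information exists in the stored malicious information database, the method further comprises:
    when the link address information does not exist in the malicious information database, identifying, according to a preset condition, the second website linked by the hyperlink information.
  5. The method according to claim 4, wherein identifying, according to the preset condition, the second website linked by the hyperlink information comprises:
    obtaining, from a stored malicious link index library, a number of pieces of third URL information, wherein the third URL information is URL information of a third website, and the link address information is referenced by the third website;
    when the number of pieces of third URL information is greater than a first preset value, determining the second website as a malicious website.
  6. The method according to claim 4, wherein identifying, according to the preset condition, the second website linked by the hyperlink information comprises:
    obtaining, from a stored malicious link index library, a number of specific words included in first context information, wherein the first context information is text information in page content of a third website that is located in the same display area as the link address information;
    when the number of specific words is greater than a second preset value, determining the second website as a malicious website.
  7. The method according to any one of claims 3 to 6, wherein after respectively identifying the at least one second website based on the at least one piece of hyperlink information, the method further comprises:
    for any second website of the at least one second website, when the second website is a malicious website, storing link address information corresponding to the second website in the malicious information database.
  8. The method according to any one of claims 3 to 6, wherein after respectively identifying the at least one second website based on the at least one piece of hyperlink information, the method further comprises:
    for any second website of the at least one second website, when the second website is not a malicious website and the first website is a malicious website, storing link address information corresponding to the second website, the first URL information, and second context information in the malicious link index library, wherein the second context information is text information in the page content of the first website that is located in the same display area as the link address information corresponding to the second website.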
  9. 一种识别恶意网站的装置,所述装置包括:
    第一识别模块,配置为当接收到恶意网站查询请求时,基于所述恶意网站查询请求中携带的第一网址信息,对第一网站进行识别得到识别结果;其中,所述第一网址信息为该第一网站的网址信息;
    获取模块,配置为基于对该第一网站的识别结果,从该第一网站的页面内容中获取至少一个超链接信息;
    第二识别模块,配置为基于所述至少一个超链接信息,分别对至少一个第二网站进行识别,所述至少一个第二网站为所述至少一个超链接信息分别所链接的网站。
  10. 如权利要求9所述的装置,其中,所述获取模块包括:
    第一获取单元,配置为从所述第一网站的页面内容中,获取至少一个超链接开始标签和至少一个超链接结束标签;
    第一确定单元,配置为从所述第一网站的页面内容中,将目标超链接开始标签、目标超链接结束标签、以及所述目标超链接开始标签与所述目标超链接结束标签之间的信息确定为超链接信息,所述目标超链接开始标签为所述至少一个超链接开始标签中的任一超链接开始标签,所述目标超链接结束标签与所述目标超链接开始标签对应。
  11. 如权利要求9所述的装置,其中,所述第二识别模块包括:
    第二获取单元,配置为对于所述至少一个超链接信息中的每个超链接信息,从所述超链接信息中获取链接地址信息;
    判断单元,配置为判断存储的恶意信息数据库中是否存在所述链接地 址信息;
    第二确定单元,配置为当所述恶意信息数据库中存在所述链接地址信息时,确定所述超链接信息所链接的第二网站为恶意网站。
  12. 如权利要求11所述的装置,其中,所述第二识别模块还包括:
    识别单元,配置为当所述信息数据库中不存在所述链接地址信息时,根据预设条件,对所述超链接信息所链接的第二网站进行识别。
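  13. The apparatus according to claim 12, wherein the identification unit comprises:
    a first acquisition subunit, configured to acquire, from a stored malicious link index library, a number of pieces of third URL information, wherein third URL information is URL information of a third website, and the link address information is referenced by the third website; and
    a first determination subunit, configured to, when the number of pieces of third URL information is greater than a first preset value, determine the second website to be a malicious website.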
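  14. The apparatus according to claim 12, wherein the identification unit comprises:
    a second acquisition subunit, configured to acquire, from a stored malicious link index library, a number of specific words included in first context information, wherein the first context information is text information, in the page content of a third website, that is located in the same display area as the link address information; and
    a second determination subunit, configured to, when the number of specific words is greater than a second preset value, determine the second website to be a malicious website.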
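  15. The apparatus according to any one of claims 11 to 14, wherein the apparatus further comprises:
    a first storage module, configured to, for any second website in the at least one second website, when the second website is a malicious website, store the link address information corresponding to the second website into the malicious information database.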
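  16. The apparatus according to any one of claims 11 to 14, wherein the apparatus further comprises:
    a second storage module, configured to, for any second website in the at least one second website, when the second website is not a malicious website and the first website is a malicious website, store the link address information corresponding to the second website, the first URL information and second context information into the malicious link index library, wherein the second context information is text information, in the page content of the first website, that is located in the same display area as the link address information corresponding to the second website.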
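  17. A computer storage medium comprising a set of instructions which, when executed, cause at least one processor to perform operations comprising:
    when a malicious website query request is received, identifying a first website based on first URL information carried in the malicious website query request to obtain an identification result, wherein the first URL information is the URL information of the first website;
    acquiring at least one piece of hyperlink information from page content of the first website based on the identification result of the first website; and
    respectively identifying at least one second website based on the at least one piece of hyperlink information, wherein the at least one second website is the website respectively linked by each of the at least one piece of hyperlink information.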
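The identification flow of claims 2 to 8 can be illustrated with a short Python sketch. This is a hypothetical illustration under stated assumptions and does not represent the applicant's implementation. The sketch makes the following assumptions:

* `<a>` tags are matched with a regular expression.
* The malicious information database and the malicious link index library are modelled as an in-memory `set` and `dict`.
* All class, function and field names and all threshold values are invented.
* The claim 5 rule (number of distinct referring third websites) and the claim 6 rule (specific words in stored context) are applied one after the other.
* The "same display area" context is approximated by the anchor's own visible text.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical patterns: a hyperlink start tag, the content, and the matching end tag.
ANCHOR_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def extract_hyperlinks(page: str) -> list[str]:
    """Claim 2: start tag + text between the tags + end tag = one piece of hyperlink information."""
    return ANCHOR_RE.findall(page)


def link_address(hyperlink: str) -> str | None:
    """Claim 3: pull the link address (href value) out of the hyperlink information."""
    match = HREF_RE.search(hyperlink)
    return match.group(1) if match else None


@dataclass
class IndexEntry:
    referrer_url: str  # URL of the malicious site that referenced the link
    context: str       # stand-in for the text in the same display area as the link


@dataclass
class MaliciousSiteIdentifier:
    malicious_db: set[str] = field(default_factory=set)  # "malicious information database"
    link_index: dict[str, list[IndexEntry]] = field(default_factory=dict)  # "malicious link index library"
    specific_words: tuple[str, ...] = ()
    first_threshold: int = 3   # hypothetical "first preset value"
    second_threshold: int = 5  # hypothetical "second preset value"

    def is_malicious(self, address: str) -> bool:
        if address in self.malicious_db:  # claim 3: direct database hit
            return True
        entries = self.link_index.get(address, [])
        referrers = {e.referrer_url for e in entries}
        if len(referrers) > self.first_threshold:  # claim 5: referenced by many third sites
            return True
        word_count = sum(e.context.count(w) for e in entries for w in self.specific_words)
        return word_count > self.second_threshold  # claim 6: suspicious words in context

    def scan(self, first_url: str, page: str) -> dict[str, bool]:
        # Assumption: the first website is identified with the same lookup as its links.
        first_is_malicious = self.is_malicious(first_url)
        results: dict[str, bool] = {}
        for hyperlink in extract_hyperlinks(page):
            address = link_address(hyperlink)
            if address is None:
                continue
            malicious = self.is_malicious(address)
            results[address] = malicious
            if malicious:
                self.malicious_db.add(address)  # claim 7
            elif first_is_malicious:
                context = TAG_RE.sub("", hyperlink).strip()  # anchor text only (simplification)
                self.link_index.setdefault(address, []).append(IndexEntry(first_url, context))  # claim 8
        return results
```

Consider a page from a site that is already in `malicious_db`. Scanning it reports any link already in the database as malicious. It records the remaining links, together with their anchor text, in `link_index`. A later lookup of such a link can then exceed `second_threshold` on the context rule alone. The claims themselves leave the parsing method, the storage backend and the thresholds open.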
PCT/CN2017/078650 2016-03-29 2017-03-29 Method and apparatus for identifying malicious website, and computer storage medium WO2017167208A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020187014910A KR102090982B1 (ko) 2016-03-29 2017-03-29 Malicious website identification method, apparatus and computer storage medium
US15/967,232 US10834105B2 (en) 2016-03-29 2018-04-30 Method and apparatus for identifying malicious website, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610186975.7A CN107239701B (zh) 2016-03-29 2016-03-29 Method and apparatus for identifying malicious website
CN201610186975.7 2016-03-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/967,232 Continuation US10834105B2 (en) 2016-03-29 2018-04-30 Method and apparatus for identifying malicious website, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2017167208A1 (zh) 2017-10-05

Family

ID=59963514

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/078650 WO2017167208A1 (zh) 2016-03-29 2017-03-29 Method and apparatus for identifying malicious website, and computer storage medium

Country Status (4)

Country Link
US (1) US10834105B2 (zh)
KR (1) KR102090982B1 (zh)
CN (1) CN107239701B (zh)
WO (1) WO2017167208A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680700A (zh) * 2023-05-18 2023-09-01 北京天融信网络安全技术有限公司 Risk detection method, apparatus, device and storage medium

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
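CN108737327B (zh) 2017-04-14 2021-11-16 阿里巴巴集团控股有限公司 Method, apparatus, system and memory for intercepting malicious websites
US10880330B2 (en) * 2017-05-19 2020-12-29 Indiana University Research & Technology Corporation Systems and methods for detection of infected websites
CN112153043A (zh) * 2020-09-22 2020-12-29 杭州安恒信息技术股份有限公司 Website security detection method and apparatus, electronic device and storage medium
CN113051876B (zh) * 2021-04-02 2024-04-23 杭州网易智企科技有限公司 Malicious URL identification method and apparatus, storage medium and electronic device
TWI777766B (zh) * 2021-09-10 2022-09-11 中華電信股份有限公司 System and method for detecting malicious domain query behavior
CN114065092A (zh) * 2021-11-10 2022-02-18 奇安信科技集团股份有限公司 Website identification method and apparatus, computer device and storage medium
CN115459946A (zh) * 2022-08-02 2022-12-09 广州市玄武无线科技股份有限公司 Method, apparatus, device and computer storage medium for identifying abnormal webpages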

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102647417A (zh) * 2012-03-31 2012-08-22 奇智软件(北京)有限公司 Method, apparatus and system for implementing network access, and network system
CN102833258A (zh) * 2012-08-31 2012-12-19 北京奇虎科技有限公司 Website access method and system
US8474048B2 (en) * 2008-07-21 2013-06-25 F-Secure Oyj Website content regulation
CN103428183A (zh) * 2012-05-23 2013-12-04 北京新媒传信科技有限公司 Method and apparatus for identifying malicious URLs
CN103701779A (zh) * 2013-12-13 2014-04-02 北京神州绿盟信息安全科技股份有限公司 Method and apparatus for secondary website access, and firewall device

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
KR100704000B1 (ko) * 2006-04-18 2007-04-05 주식회사 소프트런 Phishing prevention method through analysis of Internet access sites, and recording medium storing computer program source for the method
US7865953B1 (en) * 2007-05-31 2011-01-04 Trend Micro Inc. Methods and arrangement for active malicious web pages discovery
US9298824B1 (en) * 2010-07-07 2016-03-29 Symantec Corporation Focused crawling to identify potentially malicious sites using Bayesian URL classification and adaptive priority calculation
KR101430175B1 (ko) * 2011-09-23 2014-08-14 한전케이디엔주식회사 System and method for searching for personal information leakage
CN102332028B (zh) * 2011-10-15 2013-08-28 西安交通大学 Webpage-oriented method for identifying harmful Web content
CN102571768B (zh) * 2011-12-26 2014-11-26 北京大学 Phishing website detection method
CN103685174B (zh) * 2012-09-07 2016-12-21 中国科学院计算机网络信息中心 Sample-independent phishing website detection method
US8943588B1 (en) * 2012-09-20 2015-01-27 Amazon Technologies, Inc. Detecting unauthorized websites
CN103856442B (zh) * 2012-11-30 2016-08-17 腾讯科技(深圳)有限公司 Black link detection method, apparatus and system
CN103902889A (zh) * 2012-12-26 2014-07-02 腾讯科技(深圳)有限公司 Cloud detection method for malicious messages, and server
CN103530562A (zh) * 2013-10-23 2014-01-22 腾讯科技(深圳)有限公司 Method and apparatus for identifying malicious websites
CN104125209B (zh) 2014-01-03 2015-09-09 腾讯科技(深圳)有限公司 Malicious URL prompting method and router
CN104811418B (zh) * 2014-01-23 2019-04-12 腾讯科技(深圳)有限公司 Virus detection method and apparatus
CN104766014B (zh) * 2015-04-30 2017-12-01 安一恒通(北京)科技有限公司 Method and system for detecting malicious URLs

Also Published As

Publication number Publication date
KR20180074774A (ko) 2018-07-03
CN107239701A (zh) 2017-10-10
US10834105B2 (en) 2020-11-10
KR102090982B1 (ko) 2020-03-19
CN107239701B (zh) 2020-06-26
US20180248898A1 (en) 2018-08-30

Similar Documents

Publication Publication Date Title
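WO2017167208A1 (zh) Method and apparatus for identifying malicious website, and computer storage medium
WO2019140828A1 (zh) Electronic apparatus, distributed system log query method, and storage medium
US9491182B2 (en) Methods and systems for secure internet access and services
JP6599906B2 (ja) Login account prompt
WO2021143497A1 (zh) Infringement evidence preservation method, apparatus and device based on an evidence-storage blockchain
CN104125209B (zh) Malicious URL prompting method and router
US20150074289A1 (en) Detecting error pages by analyzing server redirects
US9304979B2 (en) Authorized syndicated descriptions of linked web content displayed with links in user-generated content
CN109768992B (zh) Webpage malicious scanning processing method and apparatus, terminal device, and readable storage medium
WO2016201819A1 (zh) Method and apparatus for detecting malicious files
US20150143215A1 (en) Method and system for accessing audio/video community virtual rooms
WO2018001078A1 (zh) URL matching method and apparatus, and storage medium
CN106302595B (zh) Method and device for performing a health check on a server
WO2013143403A1 (zh) Method and system for accessing a website
CN110943961A (zh) Data processing method, device and storage medium
US9954880B2 (en) Protection via webpage manipulation
US20210311927A1 (en) Systems and methods for locating application specific data
WO2019076014A1 (zh) Webpage generation method and apparatus, terminal device, and medium
CN108667840B (zh) Injection vulnerability detection method and apparatus
WO2015081848A1 (zh) Socialized extended search method and corresponding apparatus and system
WO2014154095A1 (zh) Method for displaying website authentication information, and browser
CN108900554B (zh) HTTP protocol asset detection method, system, device and computer medium
WO2020168757A1 (zh) Network system access method and apparatus, computer device, and readable storage medium
CN110929185B (zh) Website directory detection method and apparatus, computer device, and computer storage medium
US8910281B1 (en) Identifying malware sources using phishing kit templates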

Legal Events

Date Code Title Description
ENP Entry into the national phase
Ref document number: 20187014910
Country of ref document: KR
Kind code of ref document: A

NENP Non-entry into the national phase
Ref country code: DE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 17773238
Country of ref document: EP
Kind code of ref document: A1

122 EP: PCT application non-entry in European phase
Ref document number: 17773238
Country of ref document: EP
Kind code of ref document: A1