CN111488621A - Method and system for detecting falsified webpage, electronic equipment and storage medium

Info

Publication number
CN111488621A
Authority
CN
China
Prior art keywords
detected
link
webpage
web page
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910074357.7A
Other languages
Chinese (zh)
Inventor
杨荣海
黄志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN201910074357.7A priority Critical patent/CN111488621A/en
Publication of CN111488621A publication Critical patent/CN111488621A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Bioethics (AREA)
  • Storage Device Security (AREA)

Abstract

The application discloses a method and a system for detecting a tampered webpage, an electronic device and a computer readable storage medium, wherein the method comprises the following steps: extracting a link to be detected from a webpage to be detected; scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score; judging whether the link score is larger than a preset threshold value or not; if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not; and if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage. According to the webpage tampering detection method, the links to be detected are filtered by using the link scores of the links to be detected, so that only the webpages corresponding to the links to be detected with the link scores larger than the preset threshold value are required to be detected, the workload of webpage detection is reduced, and the webpage tampering detection efficiency is improved.

Description

Method and system for detecting falsified webpage, electronic equipment and storage medium
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method and a system for detecting a tampered web page, an electronic device, and a computer-readable storage medium.
Background
Black Hat SEO (Black Hat Search Engine Optimization) deceives search engines through illicit techniques so that the ranking of a target website is improperly raised. It is a common and effective way for attackers to raise the ranking of malicious websites: they compromise high-weight websites in batches and insert malicious text and malicious links into their web pages, which raises the ranking of the malicious links in search engines and earns the attackers a profit.
As detection technology has improved, attackers have adopted more covert Black Hat SEO techniques. One such technique reduces the number of sensitive words in the tampered web page, or even inserts only malicious links. To deal with inserted malicious links, existing detection technology mainly builds a blacklist: a large number of pre-collected malicious domain names are matched against the web page to be detected to determine whether the page contains a blacklisted domain name. If the match succeeds, the web page to be detected has been tampered with. In such a method, each web page needs to be detected separately, so the efficiency of tampered web page detection is low.
How to solve the above problem is therefore a matter of great concern to those skilled in the art.
Disclosure of Invention
The application aims to provide a method and a system for detecting a tampered webpage, an electronic device and a computer readable storage medium, so that the detection efficiency of the tampered webpage is improved.
In order to achieve the above object, the present application provides a method for detecting a tampered web page, including:
extracting a link to be detected from a webpage to be detected;
scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score;
judging whether the link score is larger than a preset threshold value or not;
if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not;
and if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage.
Optionally, the feature information includes:
at least one of the lexical characteristics of the link to be detected, the text content corresponding to the link to be detected, and the position of the link to be detected.
Optionally, when the feature information is the lexical feature, scoring the link to be detected according to the feature information of the link to be detected includes:
scoring the link to be detected according to the number of irregular letter combinations in the link to be detected.
Optionally, when the feature information is the text content, scoring the link to be detected according to the feature information of the link to be detected includes:
scoring the link to be detected according to the number of sensitive words in the text content.
Optionally, when the feature information is the position, scoring the link to be detected according to the feature information of the link to be detected includes:
acquiring the position of the link to be detected, and scoring the link to be detected according to the type of the position.
Optionally, when multiple types of feature information are used to score the link to be detected, scoring the link to be detected according to the feature information of the link to be detected includes:
acquiring a weight coefficient preset for each type of the feature information, and acquiring a basic score corresponding to each type of the feature information;
and scoring the link to be detected by a weighted calculation using the weight coefficients and the basic scores.
Optionally, the method further includes:
identifying the importance degree of the webpage to be detected;
and if the importance degree is higher than the preset importance degree, lowering the preset threshold value.
Optionally, before the link to be detected is extracted from the web page to be detected, the method further includes:
acquiring a target webpage and extracting external link information of the target webpage;
matching the external link information by using a preset blacklist;
and if any external link information is successfully matched with the preset blacklist, determining the target webpage as the webpage to be detected.
Optionally, after the web page to be detected is determined as a tampered web page, the method further includes:
generating and displaying alarm information;
or recording the basic information of the tampered webpage into a log so as to obtain the tampered webpage by inquiring the log.
In order to achieve the above object, the present application provides a tampered web page detection system, including:
the link extraction module is used for extracting the link to be detected from the webpage to be detected;
the link scoring module is used for scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score;
the score judging module is used for judging whether the link score is larger than a preset threshold value or not;
the webpage detection module is used for detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not if the link score is larger than the preset threshold value;
and the webpage judging module is used for judging the webpage to be detected as a tampered webpage if the webpage corresponding to the link to be detected is detected to be a malicious webpage.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
a processor for implementing the steps of the tampered web page detection method according to any of the foregoing disclosures when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the tampered web page detection method according to any one of the foregoing disclosures.
According to the scheme, the method for detecting the tampered webpage comprises the following steps: extracting a link to be detected from a webpage to be detected; scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score; judging whether the link score is larger than a preset threshold value or not; if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not; and if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage. According to the webpage tampering detection method, the links to be detected are filtered by using the link scores of the links to be detected, so that only the webpages corresponding to the links to be detected with the link scores larger than the preset threshold value are required to be detected, the workload of webpage detection is reduced, and the webpage tampering detection efficiency is improved.
The application also discloses a tampered web page detection system, an electronic device and a computer-readable storage medium, which achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for detecting a tampered web page disclosed in an embodiment of the present application;
Fig. 2 is a flowchart of another method for detecting a tampered web page disclosed in an embodiment of the present application;
Fig. 3 is a flowchart of another method for detecting a tampered web page disclosed in an embodiment of the present application;
Fig. 4 is a flowchart of another method for detecting a tampered web page disclosed in an embodiment of the present application;
Fig. 5 is a flowchart of a specific method for detecting a tampered web page disclosed in an embodiment of the present application;
Fig. 6 is a structural diagram of a tampered web page detection system disclosed in an embodiment of the present application;
Fig. 7 is a block diagram of an electronic device disclosed in an embodiment of the present application;
Fig. 8 is a block diagram of another electronic device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the prior art, inserted malicious links are mainly detected by building a blacklist: a large number of pre-collected malicious domain names are matched against the web page to be detected to determine whether the page contains a blacklisted domain name. If the match succeeds, the web page to be detected has been tampered with. In such a method, each web page needs to be detected separately, so the efficiency of tampered web page detection is low.
Therefore, the embodiment of the application discloses a method for detecting a tampered webpage, and the detection efficiency of the tampered webpage is improved.
Referring to Fig. 1, a flowchart of a method for detecting a tampered web page disclosed in an embodiment of the present application, the method includes:
S101: extracting a link to be detected from a webpage to be detected;
S102: scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score;
In this embodiment, the link to be detected is extracted from the web page to be detected and scored according to its characteristic information to obtain a corresponding link score.
S103: judging whether the link score is larger than a preset threshold value or not;
S104: if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not;
Further, the link score is compared with the preset threshold to obtain a comparison result. In this embodiment, if the link score is greater than the preset threshold, the web page corresponding to the link to be detected needs to be further detected to determine whether it is a malicious web page.
It can be understood that by scoring the links to be detected, only the web pages corresponding to the links to be detected with the link score larger than the preset threshold can be detected, a large number of harmless normal web pages can be filtered, and the detection workload is reduced.
It should be noted that the link score is used as a judgment basis for judging whether to detect the web page corresponding to the link to be detected, and the link score is used in this embodiment to represent the possibility that the web page corresponding to the link to be detected is a malicious web page. In another specific implementation manner, the link score may be used to represent the possibility that the web page corresponding to the link to be detected is a normal web page, that is, if the link score is higher, the web page is less likely to be a malicious web page, and at this time, the web page corresponding to the link to be detected whose link score is smaller than a preset threshold needs to be detected.
S105: and if the webpage corresponding to the link to be detected is detected to be the malicious webpage, judging the webpage to be detected to be the tampered webpage.
In this step, if the web page corresponding to the link to be detected is detected to be a malicious web page, the web page to be detected is judged to be a tampered web page; if the web page corresponding to the link to be detected is a non-malicious web page, the web page to be detected is determined to be a normal web page.
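The flow of S101 to S105 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name is_tampered, the injected callables score_link and is_malicious, and the toy stand-ins are all assumptions introduced here. The point it shows is that the expensive malicious-page check runs only for links whose score exceeds the threshold.

```python
from typing import Callable, Iterable, List


def is_tampered(
    links: Iterable[str],
    score_link: Callable[[str], float],
    is_malicious: Callable[[str], bool],
    threshold: float,
) -> bool:
    """Return True if any link scoring above `threshold` leads to a malicious page."""
    for link in links:                # S101: links already extracted from the page
        score = score_link(link)      # S102: score from feature information
        if score <= threshold:        # S103: low-scoring links are filtered out
            continue
        if is_malicious(link):        # S104: costly check, only for suspicious links
            return True               # S105: the page is judged tampered
    return False


# Toy stand-ins (hypothetical), used only to exercise the flow.
checked: List[str] = []


def toy_score(link: str) -> float:
    return 9.0 if "xqzvk" in link else 1.0


def toy_detector(link: str) -> bool:
    checked.append(link)              # record which links reached the costly check
    return True


result = is_tampered(
    ["http://example.com/about", "http://xqzvk.example/"],
    toy_score,
    toy_detector,
    threshold=5.0,
)
```

In this toy run only the second link reaches the detector, which is the workload reduction the embodiment describes.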
According to the scheme, the method for detecting the tampered webpage comprises the following steps: extracting a link to be detected from a webpage to be detected; scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score; judging whether the link score is larger than a preset threshold value or not; if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not; and if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage. According to the webpage tampering detection method, the links to be detected are filtered by using the link scores of the links to be detected, so that only the webpages corresponding to the links to be detected with the link scores larger than the preset threshold value are required to be detected, the workload of webpage detection is reduced, and the webpage tampering detection efficiency is improved.
Further, the feature information involved in the above embodiments may include: at least one of the lexical characteristics of the link to be detected, the text content corresponding to the link to be detected, and the position of the link to be detected. Correspondingly, the embodiment of the application discloses another method for detecting a tampered webpage, and compared with the previous embodiment, the method for detecting a tampered webpage is further explained and optimized for determining the link score. Specifically, the method comprises the following steps:
Referring to Fig. 2, a flowchart of another method for detecting a tampered web page according to an embodiment of the present application, the method includes:
S201: extracting a link to be detected from a webpage to be detected;
S202: acquiring characteristic information of the link to be detected;
S203: when the characteristic information is the lexical characteristic of the link to be detected, scoring the link to be detected according to the number of irregular letter combinations in the link to be detected to obtain a link score;
It can be understood that a link composed of many irregular letter combinations is often highly suspicious. In this embodiment, when the feature information is a lexical feature, that is, when the score is determined from the lexical feature of the link to be detected, the number of irregular letter combinations in the link may be obtained and the link score determined from that number. Specifically, the number of irregular letter combinations may be determined using deep learning.
S204: when the characteristic information is the text content corresponding to the link to be detected, scoring the link to be detected according to the number of sensitive words in the text content to obtain a link score;
When the text content corresponding to the link to be detected is used to score the link, the sensitive words in the text content can be obtained and the score determined according to their number.
S205: when the characteristic information is the position of the link to be detected, acquiring the position, and scoring the link to be detected according to the type of the position to obtain a link score;
It can be understood that, in this embodiment, the type of position at which the link to be detected appears may be used to score it. In a specific implementation, after the type of tag position where the link appears is obtained, the link is scored according to the importance or suspiciousness of that type of tag position.
S206: judging whether the link score is larger than a preset threshold value or not;
S207: if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not;
S208: if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage.
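The three single-feature scores of S203 to S205 might look like the sketch below. Everything concrete here is an assumption made for illustration: the text mentions deep learning for counting irregular letter combinations, whereas this sketch substitutes a crude regular-expression heuristic (runs of four or more consonants in the host name), and the word list SENSITIVE_WORDS and the table POSITION_SCORES are hypothetical values that the source does not specify.

```python
import re
from urllib.parse import urlsplit

# Hypothetical word list and position scores; the source gives no concrete values.
SENSITIVE_WORDS = ("casino", "lottery")
POSITION_SCORES = {"a": 1.0, "title": 3.0, "meta": 3.0, "hidden": 4.0}


def lexical_score(link: str) -> int:
    """S203 stand-in: count runs of four or more consonants in the host name."""
    host = urlsplit(link).hostname or ""   # hostname is already lower-cased
    return len(re.findall(r"[bcdfghjklmnpqrstvwxyz]{4,}", host))


def text_score(text: str) -> int:
    """S204: count occurrences of sensitive words in the link's text content."""
    lowered = text.lower()
    return sum(lowered.count(word) for word in SENSITIVE_WORDS)


def position_score(position_type: str) -> float:
    """S205: score by the type of position in which the link appears."""
    return POSITION_SCORES.get(position_type, 1.0)
```

For instance, lexical_score("http://xqzvk.example/") counts the single run "xqzvk", while an ordinary host such as example.com contains no run of four consonants.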
On the basis of the above embodiment, as a preferred implementation, multiple types of feature information are considered together when scoring the link to be detected, so as to obtain a more comprehensive link score. Specifically:
Referring to Fig. 3, a flowchart of another method for detecting a tampered web page according to an embodiment of the present application, the method includes:
S301: extracting a link to be detected from a webpage to be detected;
S302: acquiring characteristic information of the link to be detected;
S303: acquiring a weight coefficient preset for each type of feature information, and acquiring a basic score corresponding to each type of feature information;
S304: scoring the link to be detected by a weighted calculation using the weight coefficients and the basic scores to obtain the link score;
In this embodiment, when the link score of the link to be detected is determined based on the preset scoring rule, the weight coefficient preset for each type of feature information is obtained, the basic score corresponding to each type of feature information is determined, and a weighted calculation over the weight coefficients and basic scores yields the comprehensive link score.
In this embodiment, when weights are set in advance for each type of feature information, they may be determined according to the order of importance, the order of suspiciousness, or a combination of both.
In a specific example, weights are determined separately for the lexical features of the link to be detected, the text content corresponding to the link, and the position of the link. For example, for link http: nf, the link is more suspicious and should be weighted more heavily. If the link to be detected is in an <a> tag, the corresponding anchor text is checked for sensitive content; if it is in a tag other than <a>, the surrounding text is checked for sensitive content; if sensitive content exists, the suspiciousness is higher and a larger weight is obtained. If the link to be detected appears in a more important or more abnormal tag position, such as a <title> tag, a <meta> tag, or an element hidden with display: none, the link at that position should obtain a larger weight.
S305: judging whether the link score is larger than a preset threshold value or not;
S306: if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not;
S307: if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage.
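The weighted calculation of S303 and S304 amounts to a weighted sum of the basic scores. The sketch below assumes a plain linear combination; the feature names and the numeric weights and scores are hypothetical, since the source does not give concrete values.

```python
from typing import Mapping


def weighted_link_score(
    base_scores: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Combine per-feature basic scores using preset weight coefficients."""
    return sum(weights[name] * score for name, score in base_scores.items())


# Hypothetical preset weights and basic scores for one link.
weights = {"lexical": 0.5, "text": 0.25, "position": 0.25}
base_scores = {"lexical": 4.0, "text": 2.0, "position": 2.0}

link_score = weighted_link_score(base_scores, weights)  # 0.5*4 + 0.25*2 + 0.25*2
```

A linear combination keeps the contribution of each feature easy to inspect when weights are tuned by importance or suspiciousness.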
In addition to scoring the links to be detected based on their feature information, this embodiment may further optimize the scoring mechanism based on the importance degree of the web page to be detected. Specifically, the importance degree of the web page to be detected is identified and compared with a preset importance degree; if it is higher than the preset importance degree, the preset threshold is lowered. For example, if the importance degree of the web page to be detected is 5, the preset importance degree is 3, and the preset threshold is 10, the importance degree is higher than the preset importance degree and the preset threshold is lowered to 8. The correspondence between the importance degree and the amount by which the preset threshold is lowered may be preset and is not limited here.
It can be understood that, in this embodiment, the importance of the web page to be detected is identified and the preset threshold for more important web pages (for example, the home page) is lowered, so that link scores on important web pages exceed the preset threshold more easily. The links on such pages are therefore more likely to be checked, which prevents the adverse consequences of missed detection on important web pages.
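The threshold adjustment can be sketched with the numbers from the example above (importance 5, preset importance 3, preset threshold 10, lowered to 8). The function name adjust_threshold and the lookup table REDUCTION_BY_IMPORTANCE are assumptions; the text only says that a correspondence between importance and reduction amount may be preset.

```python
from typing import Mapping

# Hypothetical preset correspondence between importance degree and reduction amount.
REDUCTION_BY_IMPORTANCE = {4: 1, 5: 2}


def adjust_threshold(
    importance: int,
    preset_importance: int,
    preset_threshold: float,
    reductions: Mapping[int, float] = REDUCTION_BY_IMPORTANCE,
) -> float:
    """Lower the threshold for pages more important than the preset importance."""
    if importance > preset_importance:
        return preset_threshold - reductions.get(importance, 0)
    return preset_threshold


important_page_threshold = adjust_threshold(5, 3, 10)  # the example from the text
ordinary_page_threshold = adjust_threshold(2, 3, 10)   # unchanged
```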
The process of detecting whether the web page corresponding to the link to be detected is a malicious web page is further described below:
In a specific embodiment, whether the web page corresponding to the link to be detected is a malicious web page may be determined according to whether the number of matches is greater than a preset maximum matching value. Specifically, the content of the high-weight external links among the links to be detected is crawled, and the crawled web page information is matched against a preset information base to detect whether the amount of malicious content in the web page exceeds the preset maximum matching value. The preset information base comprises preset keyword information and/or preset character string information, both of which are pre-collected black words or sensitive content.
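A minimal sketch of this matching step is given below, assuming that the information base is a flat collection of literal strings and that a match is a case-insensitive substring occurrence. The names count_matches and is_malicious_page and the sample entries are hypothetical.

```python
from typing import Iterable


def count_matches(page_text: str, info_base: Iterable[str]) -> int:
    """Count occurrences of information-base entries in the crawled page text."""
    lowered = page_text.lower()
    return sum(lowered.count(entry.lower()) for entry in info_base)


def is_malicious_page(page_text: str, info_base: Iterable[str], max_matches: int) -> bool:
    """Malicious if the number of matches exceeds the preset maximum matching value."""
    return count_matches(page_text, info_base) > max_matches


# Hypothetical information base and page contents.
info_base = ["casino", "jackpot"]
spam_page = "Casino bonus! casino jackpot, free bonus"
news_page = "Welcome to our campus news"

spam_is_malicious = is_malicious_page(spam_page, info_base, max_matches=2)
news_is_malicious = is_malicious_page(news_page, info_base, max_matches=2)
```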
For efficiency, as a preferred embodiment, the web crawler is designed to be multi-threaded so that web pages are crawled in parallel.
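One way to crawl in parallel is a thread pool, sketched below with the standard library's concurrent.futures. The function crawl_parallel and the injected fetch callable are assumptions; a stub replaces real network access so that the example is self-contained, and a real fetch would need timeouts and error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def crawl_parallel(
    links: List[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch every link on a pool of worker threads and map link -> page content."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch, links))  # map() returns results in input order
    return dict(zip(links, pages))


def fake_fetch(url: str) -> str:
    # Stub standing in for an HTTP request.
    return "content of " + url


pages = crawl_parallel(["http://a.example/", "http://b.example/"], fake_fetch)
```

Threads suit this task because crawling is dominated by waiting on network I/O rather than by computation.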
It can be understood that, in this embodiment, the web page content is captured by the web crawler and then used to determine whether the crawled web page is a malicious web page, which improves the detection rate of web page tampering and also covers tampering that inserts only links.
In another specific implementation, after the high-weight external links among the links to be detected are crawled, a pre-trained neural network can be used to identify the crawled web page and obtain an identification result.
Specifically, in this embodiment, a large number of training samples are collected in advance and a neural network is trained on them. The neural network then identifies the web page corresponding to the high-weight external link to judge whether it is a malicious web page, and thereby to determine whether the web page to be detected, into which the external link was inserted, has been tampered with. In addition, after the recognition result for the current web page is obtained, the neural network can be further updated with the information and recognition result of the current web page, so that its detection accuracy keeps improving.
On the basis of any of the above embodiments, in the embodiments of the present application, before the link to be detected is extracted from the web page to be detected, the target web page is first filtered, so that the web page meeting the preset condition in the target web page is determined as the web page to be detected, and the web page to be detected is analyzed and detected, thereby achieving the purpose of primary screening. Therefore, the embodiment of the present application discloses another method for detecting a tampered web page, and compared with the previous embodiment, the embodiment further describes and optimizes the technical scheme. Specifically, the method comprises the following steps:
Referring to Fig. 4, a flowchart of another method for detecting a tampered web page according to an embodiment of the present application, the method includes:
S401: acquiring a target webpage and extracting external link information of the target webpage;
S402: performing a matching operation on the external link information by using a preset blacklist;
S403: if any external link information is successfully matched with the preset blacklist, determining the target webpage as the webpage to be detected;
The process of filtering the target web page is as follows: the external link information of the target web page is acquired and matched against the preset blacklist. An external link is a link in a web page that points to a website other than the local one.
It can be understood that the preset blacklist can be constructed by manual collection. The preset blacklist is matched against the external link information; if any item of the external link information of the current target web page matches the preset blacklist, the current target web page may contain a link to a malicious web page, and the target web page is determined as the web page to be detected.
S404: extracting a link to be detected from the webpage to be detected;
S405: scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score;
S406: judging whether the link score is larger than a preset threshold value or not;
S407: if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not;
S408: if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage, and adding the external link information corresponding to the webpage to the preset blacklist.
In this embodiment, if the web page to be detected is found to be a tampered web page, the external link information corresponding to the tampered web page is used to update the preset blacklist. It can be understood that, as web pages continue to be detected, the blacklist is continuously updated, so that screening target web pages with the blacklist becomes more and more accurate.
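The pre-screening of S401 to S403 and the blacklist update of S408 can be sketched with the standard library's html.parser and urllib.parse. The sketch assumes that external link information means the host names of href and src attributes that differ from the page's own host; the helper names and the sample hosts are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from every start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def external_hosts(html: str, page_host: str) -> Set[str]:
    """S401: hosts of links that point away from the page's own site."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for link in collector.links:
        host = urlsplit(link).hostname   # None for relative links
        if host and host != page_host:
            hosts.add(host)
    return hosts


def needs_detection(hosts: Set[str], blacklist: Set[str]) -> bool:
    """S402/S403: any blacklisted host makes the page a page to be detected."""
    return bool(hosts & blacklist)


html = (
    '<a href="/news">News</a>'
    '<a href="http://bad.example/win">Win</a>'
    '<img src="http://cdn.example/logo.png">'
)
blacklist = {"bad.example"}
hosts = external_hosts(html, "school.example")
flagged = needs_detection(hosts, blacklist)

# S408 (assumed form): once a linked page is confirmed malicious, remember its host.
blacklist.add("new-bad.example")
```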
Further, after the web page to be detected is determined as a tampered web page, the embodiment may further generate and display corresponding warning information, so that an administrator can take corresponding action. In addition, basic information of the tampered web page, for example its detection time and the tampering information, can be recorded in a log, so that an administrator can retrieve information about detected tampered web pages by querying the history log.
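The alarm and log record described here might look like the sketch below. The record fields, the logger name, and the use of Python's standard `logging` module are assumptions made for illustration; the embodiment only requires that the detection time and tampering information be recorded and that a warning be raised.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; handlers (file, console, alerting) are deployment choices.
logger = logging.getLogger("tamper_detection")


def build_tamper_record(page_url, malicious_links, detected_at=None):
    """Assemble the basic information to be stored for a tampered page."""
    detected_at = detected_at or datetime.now(timezone.utc)
    return {
        "page": page_url,
        "detected_at": detected_at.isoformat(),
        "malicious_links": sorted(malicious_links),
    }


def report_tampered(page_url, malicious_links, detected_at=None):
    """Emit a warning (the alarm) and return the record that was logged."""
    record = build_tamper_record(page_url, malicious_links, detected_at)
    logger.warning("tampered page detected: %s", json.dumps(record))
    return record
```

Writing each record as one JSON line keeps the history log easy to query later by page or by time range.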
In a specific implementation manner, before the link to be detected is extracted from the web page to be detected, the embodiment may further match the external link information by using a preset white list, so as to identify whether the target web page is a normal web page, remove the normal web page, and improve the detection rate of the tampered web page.
Specifically, the preset white list may be constructed by obtaining the domain names with the highest traffic rankings in Alexa, or by collecting sTLDs (sponsored top-level domain names).
In this embodiment, if all the external link information of the target webpage is successfully matched with the preset white list, the target webpage is characterized as a normal webpage, further detection on the normal webpage is not needed, and the remaining target webpages except the normal webpage in all the target webpages are determined as the webpages to be detected, so that the webpages to be detected are detected.
Further, if the target webpage is a normal webpage, the preset white list is updated with the external link information corresponding to the webpage, so that the white list becomes increasingly accurate.
In specific implementation, the preset blacklist and the preset white list can each be matched against the external link information of the target webpage, as shown in fig. 5. If any item of the external link information is successfully matched with the preset blacklist, the target webpage is determined as the webpage to be detected; if every item of the external link information is successfully matched with the preset white list, the target webpage is determined as a normal webpage; in addition, if the external link information fails to match both the preset blacklist and the preset white list, the target webpage is regarded as an unknown webpage, and whether it is a normal webpage cannot be accurately determined.
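The three-way decision of fig. 5 can be illustrated with a short sketch. The names `PageClass` and `classify_page` are hypothetical, external link information is again reduced to host names, and treating a page with no external links as unknown is an assumption, since the embodiment does not cover that case.

```python
from enum import Enum


class PageClass(Enum):
    TO_BE_DETECTED = "to_be_detected"
    NORMAL = "normal"
    UNKNOWN = "unknown"


def classify_page(ext_hosts, blacklist, whitelist):
    """Classify a target page from its external hosts and the two lists."""
    hosts = set(ext_hosts)
    # Any single blacklist hit marks the page for detection.
    if hosts & set(blacklist):
        return PageClass.TO_BE_DETECTED
    # Every external host must be whitelisted for the page to count as normal.
    # Assumption: a page without external links is left as unknown.
    if hosts and hosts <= set(whitelist):
        return PageClass.NORMAL
    return PageClass.UNKNOWN
```

The blacklist check runs first, so a page that mixes whitelisted and blacklisted hosts is still sent for detection.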
On the basis of any of the above embodiments, as a preferred implementation manner, before the link to be detected is extracted from the web page to be detected, the embodiment further determines whether the link to be detected conforms to a preset link format.
It is understood that, in the specific implementation, the preset link format may be set according to a general link format specification. Specifically, if the link to be detected conforms to the preset link format, the link to be detected is an effective link, and the link to be detected is extracted from the web page to be detected; and if the link to be detected does not conform to the preset link format, the link to be detected is an invalid link, the webpage to be detected is directly judged as an invalid webpage, and the invalid webpage does not need to be analyzed. In the embodiment, before the links to be detected are extracted from the web pages to be detected, the links to be detected are screened in advance by using the preset link format, so that invalid links can be eliminated, and the link detection efficiency is improved.
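A format check of this kind could be approximated as below. The embodiment leaves the preset link format open, so the rule used here (an `http` or `https` scheme plus a non-empty host) and the name `is_valid_link` are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit

# Assumed "preset link format": absolute http(s) URLs with a host part.
ALLOWED_SCHEMES = {"http", "https"}


def is_valid_link(link):
    """Return True if the link parses as an absolute http(s) URL with a host."""
    try:
        parts = urlsplit(link.strip())
        host = parts.hostname
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. unbalanced IPv6 brackets.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(host)
```

Links that fail the check can be dropped before scoring, so the scoring and malicious-page detection steps only see well-formed candidates.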
A tampered web page detection system provided in an embodiment of the present application is introduced below; the system described below and the tampered web page detection method described above may be referred to in conjunction with each other.
Referring to fig. 6, which is a structure diagram of a tampered webpage detection system provided in an embodiment of the present application, the system includes:
the link extraction module 100 is used for extracting the link to be detected from the webpage to be detected;
the link scoring module 200 is configured to score the link to be detected according to the characteristic information of the link to be detected, so as to obtain a link score;
a score judging module 300, configured to judge whether the link score is greater than a preset threshold;
the web page detection module 400 is configured to detect whether a web page corresponding to the link to be detected is a malicious web page if the link score is greater than the preset threshold;
the web page determining module 500 is configured to determine that the web page to be detected is a tampered web page if it is detected that the web page corresponding to the link to be detected is a malicious web page.
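The cooperation of modules 100 to 500 can be sketched as one small pipeline. This is an illustrative sketch only: the scoring function and the malicious-page detector are passed in as callables because their internals are not fixed here, link extraction is limited to `<a href>` attributes, and all function names are hypothetical.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects href values of anchor tags while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Link extraction module 100."""
    collector = _LinkCollector()
    collector.feed(html)
    return collector.links


def is_tampered(html, score_link, is_malicious, threshold):
    """Chain modules 100-500 for one page; returns True for a tampered page."""
    for link in extract_links(html):      # module 100
        score = score_link(link)          # module 200
        if score > threshold:             # module 300
            if is_malicious(link):        # module 400
                return True               # module 500
    return False
```

The costly `is_malicious` call is reached only for links whose score exceeds the threshold, which is where the workload reduction comes from.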
The present application further provides an electronic device. Referring to fig. 7, which is a structure diagram of an electronic device provided in an embodiment of the present application, the electronic device includes:
a memory 11 for storing a computer program;
the processor 12, when executing the computer program, may implement the steps provided by the above embodiments.
Specifically, the memory 11 includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for the operating system and the computer-readable instructions in the non-volatile storage medium to run. Processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, provides computing and control capabilities for the electronic device.
On the basis of the above embodiment, as a preferred implementation, referring to fig. 8, the electronic device further includes:
An input interface 13, connected to the processor 12, is used for acquiring computer programs, parameters and instructions imported from outside and storing them in the memory 11 under the control of the processor 12. The input interface 13 may be connected to an input device for receiving parameters or instructions manually input by a user. The input device may be a touch layer covering a display screen, a button, track ball or touch pad arranged on the terminal housing, or a keyboard, touch pad or mouse.
A display unit 14, connected to the processor 12, is used for displaying data processed by the processor 12 and for displaying a visual user interface. The display unit 14 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, and the like.
A network port 15, connected to the processor 12, is used for communication connection with external terminal equipment. The communication technology adopted may be a wired or a wireless communication technology, such as Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High-Definition Multimedia Interface (HDMI), wireless fidelity (WiFi), Bluetooth, Bluetooth Low Energy, IEEE 802.11s-based communication technology, and the like.
Fig. 8 shows only an electronic device with components 11-15, and those skilled in the art will appreciate that the structure shown in fig. 8 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
The present application also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. The storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the tampered webpage detection method disclosed above.
According to the webpage tampering detection method, the links to be detected are filtered by using the link scores of the links to be detected, so that only the webpages corresponding to the links to be detected with the link scores larger than the preset threshold value are required to be detected, the workload of webpage detection is reduced, and the webpage tampering detection efficiency is improved.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (12)

1. A method for detecting a tampered webpage is characterized by comprising the following steps:
extracting a link to be detected from a webpage to be detected;
scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score;
judging whether the link score is larger than a preset threshold value or not;
if the link score is larger than the preset threshold value, detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not;
and if the webpage corresponding to the link to be detected is detected to be a malicious webpage, judging the webpage to be detected to be a tampered webpage.
2. The method according to claim 1, wherein the characteristic information includes:
at least one of the lexical characteristics of the link to be detected, the text content corresponding to the link to be detected, and the position of the link to be detected.
3. The method for detecting falsification of the web page according to claim 2, wherein when the feature information is the lexical feature, scoring the link to be detected according to the feature information of the link to be detected includes:
and scoring the links to be detected according to the irregular letter combination number in the links to be detected.
4. The method for detecting falsification of the web page according to claim 2, wherein when the feature information is the text content, scoring the link to be detected according to the feature information of the link to be detected includes:
and scoring the link to be detected according to the sensitive vocabulary number in the text content.
5. The method for detecting falsification of the web page according to claim 2, wherein when the feature information is the location, scoring the link to be detected according to the feature information of the link to be detected includes:
and acquiring the position of the link to be detected, and grading the link to be detected according to the type of the position.
6. The method for detecting falsification of web pages according to claim 2, wherein when the number of the feature information used for scoring the links to be detected is plural, scoring the links to be detected according to the feature information of the links to be detected includes:
acquiring a weight coefficient preset for each type of the characteristic information, and acquiring a basic score corresponding to each type of the characteristic information;
and scoring the links to be detected according to a weighting calculation method by using the weight coefficients and the basic scores.
7. The method for detecting falsified web page according to claim 1, further comprising:
identifying the importance degree of the webpage to be detected;
and if the importance degree is higher than the preset importance degree, reducing the size of the preset threshold value.
8. The method for detecting falsified web page according to any one of claims 1 to 7, before the step of extracting the link to be detected from the web page to be detected, further comprising:
acquiring a target webpage and extracting external link information of the target webpage;
matching the external link information by using a preset blacklist;
and if any external link information is successfully matched with the preset blacklist, determining the target webpage as the webpage to be detected.
9. The method for detecting the falsified web page according to claim 8, after the web page to be detected is determined as the falsified web page, further comprising:
generating and displaying alarm information;
or recording the basic information of the tampered webpage into a log so as to obtain the tampered webpage by inquiring the log.
10. A tamper webpage detection system, comprising:
the link extraction module is used for extracting the link to be detected from the webpage to be detected;
the link scoring module is used for scoring the link to be detected according to the characteristic information of the link to be detected to obtain a link score;
the score judging module is used for judging whether the link score is larger than a preset threshold value or not;
the webpage detection module is used for detecting whether the webpage corresponding to the link to be detected is a malicious webpage or not if the link score is larger than the preset threshold value;
and the webpage judging module is used for judging the webpage to be detected as a tampered webpage if the webpage corresponding to the link to be detected is detected to be a malicious webpage.
11. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the tamper webpage detection method according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for detecting a falsified web page according to any one of claims 1 to 9.
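For illustration, the scoring of claims 3 to 6 and the threshold adjustment of claim 7 can be sketched as follows. Every constant (word list, position types, weights, reduction step) and the consonant-run heuristic for "irregular letter combinations" are hypothetical placeholders; the claims do not fix any of them.

```python
import re
from urllib.parse import urlsplit

# All constants below are hypothetical placeholders, not values from the claims.
SENSITIVE_WORDS = {"casino", "lottery", "loan"}
POSITION_SCORES = {"hidden": 1.0, "footer": 0.6, "body": 0.2}
WEIGHTS = {"lexical": 0.4, "text": 0.4, "position": 0.2}


def lexical_score(link):
    """Claim 3: base score from the number of irregular letter combinations.

    Assumption: a run of four or more consonants in the host name counts
    as one irregular combination; two or more runs give the maximum score.
    """
    host = urlsplit(link).hostname or ""
    runs = re.findall(r"[bcdfghjklmnpqrstvwxz]{4,}", host)
    return min(len(runs), 2) / 2


def text_score(anchor_text):
    """Claim 4: base score from the number of sensitive words in the link text."""
    words = re.findall(r"[a-z]+", anchor_text.lower())
    hits = sum(1 for word in words if word in SENSITIVE_WORDS)
    return min(hits, 2) / 2


def position_score(position_type):
    """Claim 5: base score from the type of position the link occupies."""
    return POSITION_SCORES.get(position_type, 0.0)


def link_score(link, anchor_text, position_type):
    """Claim 6: weighted sum of the per-feature base scores."""
    base = {
        "lexical": lexical_score(link),
        "text": text_score(anchor_text),
        "position": position_score(position_type),
    }
    return sum(WEIGHTS[name] * base[name] for name in WEIGHTS)


def adjusted_threshold(threshold, importance, preset_importance, reduction=0.1):
    """Claim 7: lower the threshold for pages above the preset importance."""
    if importance > preset_importance:
        return max(threshold - reduction, 0.0)
    return threshold
```

Each base score is kept in the range 0 to 1 so that the weights alone decide how much a feature contributes; lowering the threshold for important pages sends more of their links on to malicious-page detection.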
CN201910074357.7A 2019-01-25 2019-01-25 Method and system for detecting falsified webpage, electronic equipment and storage medium Pending CN111488621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910074357.7A CN111488621A (en) 2019-01-25 2019-01-25 Method and system for detecting falsified webpage, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910074357.7A CN111488621A (en) 2019-01-25 2019-01-25 Method and system for detecting falsified webpage, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111488621A true CN111488621A (en) 2020-08-04

Family

ID=71793954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910074357.7A Pending CN111488621A (en) 2019-01-25 2019-01-25 Method and system for detecting falsified webpage, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111488621A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436563A (en) * 2011-12-30 2012-05-02 奇智软件(北京)有限公司 Method and device for detecting page tampering
CN102622435A (en) * 2012-02-29 2012-08-01 百度在线网络技术(北京)有限公司 Method and device for detecting black chain
CN103856442A (en) * 2012-11-30 2014-06-11 腾讯科技(深圳)有限公司 Black chain detection method, apparatus and system
CN104036190A (en) * 2014-05-16 2014-09-10 北京奇虎科技有限公司 Method and device for detecting page tampering
CN104537303A (en) * 2014-12-30 2015-04-22 中国科学院深圳先进技术研究院 Distinguishing system and method for phishing website
CN104598595A (en) * 2015-01-23 2015-05-06 安一恒通(北京)科技有限公司 Fraud webpage detection method and corresponding device
CN105306462A (en) * 2015-10-13 2016-02-03 郑州悉知信息科技股份有限公司 Web page link detecting method and device
CN111538929A (en) * 2020-07-08 2020-08-14 腾讯科技(深圳)有限公司 Network link identification method and device, storage medium and electronic equipment

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656671A (en) * 2021-06-16 2021-11-16 北京百度网讯科技有限公司 Model training method, link scoring method, device, equipment, medium and product
CN113656671B (en) * 2021-06-16 2024-05-24 北京百度网讯科技有限公司 Model training method, link scoring method, device, equipment, medium and product
CN115130104A (en) * 2022-07-15 2022-09-30 深圳安巽科技有限公司 Comprehensive judgment method, system and storage medium for malicious website

Similar Documents

Publication Publication Date Title
CN107547555B (en) Website security monitoring method and device
CN107566391B (en) Method for detecting webpage dark chain by constructing machine learning model through domain identification and theme identification
US9003537B2 (en) CVSS information update by analyzing vulnerability information
CN102436563B (en) Method and device for detecting page tampering
CN108566399B (en) Phishing website identification method and system
CN106685936B (en) Webpage tampering detection method and device
US20150324478A1 (en) Detection method and scanning engine of web pages
CN109922065B (en) Quick identification method for malicious website
CN102446255B (en) Method and device for detecting page tamper
CN105184159A (en) Web page falsification identification method and apparatus
CN111488623A (en) Webpage tampering detection method and related device
CN102467633A (en) Method and system for safely browsing webpage
CN104980404B (en) Method and system for protecting account information security
WO2021017318A1 (en) Cross-site scripting attack protection method and apparatus, device and storage medium
US20200336498A1 (en) Method and apparatus for detecting hidden link in website
EP3745292A1 (en) Hidden link detection method and apparatus for website
CN104036190A (en) Method and device for detecting page tampering
CN112637194A (en) Security event detection method and device, electronic equipment and storage medium
JP4881718B2 (en) Web page alteration detection device, program, and recording medium
CN110889113A (en) Log analysis method, server, electronic device and storage medium
CN107784107B (en) Dark chain detection method and device based on escape behavior analysis
CN111488621A (en) Method and system for detecting falsified webpage, electronic equipment and storage medium
CN112532624A (en) Black chain detection method and device, electronic equipment and readable storage medium
CN104036189A (en) Page distortion detecting method and black link database generating method
CN103475673A (en) Phishing website recognizing method and device and client side

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination