CN109522494B - Dark chain detection method, device, equipment and computer readable storage medium - Google Patents

Dark chain detection method, device, equipment and computer readable storage medium

Info

Publication number
CN109522494B
Authority
CN
China
Prior art keywords
website
weight
detected
attribution
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811325782.0A
Other languages
Chinese (zh)
Other versions
CN109522494A (en)
Inventor
胡冰
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dbappsecurity Technology Co Ltd
Original Assignee
Hangzhou Dbappsecurity Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dbappsecurity Technology Co Ltd filed Critical Hangzhou Dbappsecurity Technology Co Ltd
Priority to CN201811325782.0A priority Critical patent/CN109522494B/en
Publication of CN109522494A publication Critical patent/CN109522494A/en
Application granted granted Critical
Publication of CN109522494B publication Critical patent/CN109522494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a dark chain detection method, a device, equipment and a computer readable storage medium, wherein the method comprises the following steps: acquiring a website to be detected, and determining a weight value of the website to be detected in a specified search engine; crawling the website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in the specified search engine; and calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link. The problem that professional knowledge requirements for managers are high and manpower is consumed in the technical scheme for realizing the dark chain detection in the prior art is solved.

Description

Dark chain detection method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of dark chain detection technologies, and in particular, to a dark chain detection method, apparatus, device, and computer-readable storage medium.
Background
Dark chains (hidden links) in a website are deliberately concealed and are not easily noticed in a short time. At present there are many ways to implant a dark chain. For example, the dark chain may be made visually invisible through CSS, specifically by setting the font color of the dark chain to match the background color of the website so that the code block is hidden; or a jump may be performed through a JS script, specifically by implanting a JS file into a web page of the website, so that the JS file is executed when the web page is loaded and jumps to the dark chain website.
At present, a commonly used technical scheme for realizing the detection of the dark chain is that whether the dark chain exists is determined by a manager based on a webpage source code of a website, but the mode has higher requirement on professional knowledge of the manager and consumes manpower.
In summary, the technical solutions for realizing dark chain detection in the prior art have the problems of high requirements on professional knowledge of managers and labor consumption.
Disclosure of Invention
The invention aims to provide a dark chain detection method, a device, equipment and a computer readable storage medium, which can solve the problems of high requirement on professional knowledge of managers and labor consumption in the technical scheme of realizing dark chain detection in the prior art.
In order to achieve the above purpose, the invention provides the following technical scheme:
a dark chain detection method, comprising:
acquiring a website to be detected, and determining a weight value of the website to be detected in a specified search engine;
crawling the website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in the specified search engine;
and calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link.
Preferably, after the website to be detected is obtained, the method further includes: acquiring the attribution of the website to be detected;
after obtaining all external chains, the method further comprises the following steps: determining a home location for each of the outer chains;
before judging whether the weight difference is greater than the weight threshold, the method further comprises: judging whether the attribution of any external link in each external link is the same as that of the website to be detected or not;
correspondingly, the determining whether the weight difference is greater than the weight threshold includes: if the attribution of any external link is the same as the attribution of the website to be detected, judging whether the weight difference value corresponding to any external link is larger than a first threshold value, otherwise, judging whether the weight difference value corresponding to any external link is larger than a second threshold value, wherein the weight threshold value comprises the first threshold value and the second threshold value smaller than the first threshold value.
Preferably, the determining whether the attribution of any one of the external links is the same as the attribution of the website to be detected includes:
determining that the attribution of any one external link in each external link is a first attribution, and the attribution of the website to be detected is a second attribution;
judging whether the first attribution and the second attribution belong to the same country; if not, determining that the first attribution and the second attribution are different; if so, judging whether the first attribution and the second attribution both belong to China; if the first attribution and the second attribution do not both belong to China, determining that the first attribution and the second attribution are the same; if the first attribution and the second attribution both belong to China, judging whether the first attribution and the second attribution both belong to the mainland; and if the first attribution and the second attribution do not both belong to the mainland, determining that the first attribution and the second attribution are different.
Preferably, after determining whether each of all outer chains is a dark chain, the method further includes:
and counting the total number of the dark chains, and finally determining that the website to be detected is implanted with the dark chains if the total number is larger than a preset number threshold, or else, finally determining that the website to be detected is not implanted with the dark chains.
Preferably, after finally determining that the website to be detected is implanted with the dark chain, the method further includes:
and packaging the links of the website to be detected and all the determined dark chains into a scanning result object, and returning and writing the scanning result object into a corresponding database.
Preferably, determining the weight value of the website to be detected or any dark chain in a specified search engine includes:
if the number of the appointed search engines is one, determining that the weight value of the to-be-detected website or any dark chain in the appointed search engines is the weight value required to be obtained, and if the number of the appointed search engines is multiple, determining that the average value of the weight values of the to-be-detected website or any dark chain in the multiple appointed search engines is the weight value required to be obtained.
Preferably, calculating a weight difference between a weight value of any one of the external links and a weight value of the website to be detected according to a preset algorithm, includes:
determining a weight value of any one external link in each external link and a weight level to which the weight value of the to-be-detected website belongs respectively, wherein the weight level is a corresponding level obtained by dividing all weight values in advance;
the weight difference is calculated according to the following formula:
W=Q1*L1-Q2*L2;
wherein W represents the weight difference, Q1 and L1 represent the weight value and the weight level of the website to be detected, respectively, and Q2 and L2 represent the weight value and the weight level of any one of the external links, respectively.
A dark chain detection apparatus comprising:
an acquisition module to: acquiring a website to be detected, and determining a weight value of the website to be detected in a specified search engine;
a crawling module to: crawling the website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in the specified search engine;
a determination module configured to: and calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link.
Dark chain detection equipment, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the dark chain detection method as claimed in any one of the preceding claims when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the dark chain detection method according to any one of the preceding claims.
The invention provides a dark chain detection method, a device, equipment and a computer readable storage medium, wherein the method comprises the following steps: acquiring a website to be detected, and determining a weight value of the website to be detected in a specified search engine; crawling the website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in the specified search engine; and calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link. 
In the technical solution disclosed in the present application, the weight value of the website to be detected in the specified search engine is obtained first, and then the weight value of each external link of the website to be detected in the same specified search engine is obtained. Whether an external link is a dark chain is then determined according to whether the difference between its weight value and the weight value of the website to be detected is too large. Because the weight values of a website and of its normal links in the same search engine generally do not differ greatly, this difference makes it possible to determine effectively whether an external link is an illegally implanted dark chain. Dark chain detection is thus performed automatically without manual intervention, which solves the prior-art problems of high requirements on the professional knowledge of managers and of labor consumption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a dark chain detection method according to an embodiment of the present invention;
fig. 2 is a block diagram of a specific implementation of a dark chain detection method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a dark chain detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a dark chain detection method according to an embodiment of the present invention is shown, which may include:
S11: acquiring the website to be detected, and determining the weight value of the website to be detected in the specified search engine.
It should be noted that the dark chain detection method provided by the embodiment of the present invention may be executed by a corresponding dark chain detection device. Any website requiring dark chain detection can serve as the website to be detected, and acquiring the website to be detected may consist of receiving an externally input domain name of that website. The weight value of a website or link is an authority value that a search engine assigns to the corresponding website (including its web pages), that is, the search engine's evaluation of its authority. In brief, the higher the weight value of a website, the greater its weight and the better its ranking in that search engine; correspondingly, raising the weight value of a website or link moves the corresponding website (including its web pages) higher in the search engine's ranking, which increases the traffic of the whole website and the trust placed in it. Because a weight value is assigned per search engine, once the website and the search engine are both fixed, the weight value of that website in that search engine can be determined. The specified search engine may be chosen according to actual needs, for example 360, Baidu, ***, etc., and is not limited herein. The weight values can be stored in a map object in which each key is the name of a specified search engine and the corresponding value is the weight value under that search engine.
S12: crawling a website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected in the analyzed links to obtain all external links, and determining the weight value of each external link in a specified search engine.
It should be noted that a website may have many sub-pages (all of which are web pages), and each level of sub-pages is a layer. When the externally input domain name of the website to be detected is obtained, an externally set number of web page layers can also be obtained; this number is the number of layers to be crawled when the website to be detected is crawled, and is generally set according to the needs of the customer. In addition, to make dark chain detection effective, request parameters such as the Referer, User-Agent and Cookie used for the website to be detected can be set. Specifically, the website to be detected can be crawled with a web crawler whose crawl depth is the set number of web page layers, which yields the html pages of the website to be detected (that is, the crawled pages). Following the implementation principle of corresponding prior-art solutions, the url links in each html page are obtained through regular-expression matching and hyperlink tags (that is, the crawled page is analyzed to obtain the links therein), the links with the same domain name as the website to be detected are removed, and all external links of the html page are thereby obtained.
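The link analysis and domain filtering described above can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: it parses hyperlink tags with Python's standard `html.parser` rather than regular expressions, the names `LinkCollector` and `external_links` are hypothetical, and it assumes that a subdomain of the website to be detected counts as the same domain name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags of one crawled html page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html, page_url, site_domain):
    """Return absolute http(s) links whose host is outside site_domain."""
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        host = parsed.hostname
        if parsed.scheme not in ("http", "https") or host is None:
            continue  # skip mailto:, javascript:, etc.
        # Assumption: the domain itself and its subdomains are internal.
        if host == site_domain or host.endswith("." + site_domain):
            continue
        if absolute not in result:  # keep each external link once
            result.append(absolute)
    return result
```

For a page at `http://example.com/` that links to `/about`, to `http://news.example.com/x` and to `http://other.test/p`, only the last link is returned as an external link.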
S13: calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is larger than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link.
It should be noted that the preset algorithm may be determined according to actual needs, and regardless of how the preset algorithm is specifically set, the obtained weight difference value can represent a difference between the weight value of any external link and the weight value of the website to be detected, and since the difference between the weight value of a normal link in a website and the weight value of the website to be detected is not too large in general cases, if the difference between the weight value of any external link and the weight value of the website to be detected is too large (the weight difference value is greater than a weight threshold value set according to actual needs), the corresponding external link may be preliminarily considered as an illegally-implanted dark link, otherwise, the corresponding external link may be preliminarily considered as a normal link. In addition, in the present application, the step related to determining whether any outer chain in each outer chain is a dark chain is a step that needs to be implemented for each outer chain.
In the technical solution disclosed in the present application, the weight value of the website to be detected in the specified search engine is obtained first, and then the weight value of each external link of the website to be detected in the same specified search engine is obtained. Whether an external link is a dark chain is then determined according to whether the difference between its weight value and the weight value of the website to be detected is too large. Because the weight values of a website and of its normal links in the same search engine generally do not differ greatly, this difference makes it possible to determine effectively whether an external link is an illegally implanted dark chain. Dark chain detection is thus performed automatically without manual intervention, which solves the prior-art problems of high requirements on the professional knowledge of managers and of labor consumption.
The method for detecting the dark chain provided by the embodiment of the invention can further comprise the following steps after the website to be detected is obtained: acquiring the attribution of a website to be detected;
after obtaining all the outer chains, the method can further comprise: determining a home location for each outer chain;
before determining whether the weight difference is greater than the weight threshold, the method may further include: judging whether the attribution of any external link in each external link is the same as that of the website to be detected or not;
correspondingly, determining whether the weight difference is greater than the weight threshold may include: if the attribution of any external link is the same as that of the website to be detected, judging whether the weight difference value corresponding to any external link is larger than a first threshold value, otherwise, judging whether the weight difference value corresponding to any external link is larger than a second threshold value, wherein the weight threshold value comprises the first threshold value and the second threshold value smaller than the first threshold value.
It should be noted that both the first threshold and the second threshold can be set according to actual needs. In general, a website and the links it contains have the same attribution, so comparing the attribution of each external link with the attribution of the website to be detected adds a further layer to the dark chain detection mechanism. The first threshold is larger than the second threshold for the following reason. If the attribution of the website to be detected and that of an external link are the same, the attribution suggests that the external link is a normal link; in this case the external link is judged to be a normal external link unless the difference between the two weight values is large enough, and only a sufficiently large difference leads to the overall judgment that the external link is an illegally implanted dark chain. If the attributions differ, the attribution itself suggests that the external link is an illegally implanted dark chain, so a moderate difference between the two weight values is enough to reach that judgment. By adding attribution-based detection to the weight-based detection, this combined judgment improves the accuracy of dark chain detection.
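The two-threshold judgment can be illustrated with the short sketch below. The function name `is_preliminary_dark_chain` is hypothetical; the default thresholds of 10 and 5 are the values the description gives for its specific implementation, and other values may be chosen according to actual needs.

```python
def is_preliminary_dark_chain(weight_diff, same_attribution,
                              first_threshold=10, second_threshold=5):
    """Preliminary judgment for one external link.

    The larger first threshold applies when the attributions match,
    the smaller second threshold when they differ.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be smaller than first_threshold")
    threshold = first_threshold if same_attribution else second_threshold
    return weight_diff > threshold
```

With these defaults, a weight difference of 8 is treated as normal when the attributions are the same and as a preliminary dark chain when they differ.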
The dark chain detection method provided by the embodiment of the present invention determines whether the attribution of any external chain in each external chain is the same as the attribution of the website to be detected, and may include:
determining that the attribution of any one external link in each external link is a first attribution, and determining that the attribution of the website to be detected is a second attribution;
judging whether the first attribution and the second attribution belong to the same country; if not, determining that the first attribution and the second attribution are different; if so, judging whether the first attribution and the second attribution both belong to China; if the first attribution and the second attribution do not both belong to China, determining that the first attribution and the second attribution are the same; if the first attribution and the second attribution both belong to China, judging whether the first attribution and the second attribution both belong to the mainland; and if the first attribution and the second attribution do not both belong to the mainland, determining that the first attribution and the second attribution are different.
Thus, when the two attributions share a country that is not China, they are determined to be the same. Statistics show that at present few foreign links appear on websites in China and that the weight values of foreign links are generally high, so the present application treats such links as normal links; that is, dark chain detection is aimed mainly at links within China. This greatly reduces the workload of obtaining information about foreign links while leaving the accuracy of dark chain detection essentially unaffected.
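The attribution comparison can be written as a small decision function. The sketch below is illustrative only: the `Attribution` type, its country-code encoding and the `mainland` flag are hypothetical, and the case in which both attributions are in the mainland is assumed to count as "the same", since the text states only the complementary case explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    country: str            # hypothetical encoding, e.g. "CN" or "US"
    mainland: bool = False  # only meaningful when country == "CN"


def same_attribution(first, second):
    """Compare the attribution of an external link and of the website."""
    if first.country != second.country:
        return False  # different countries: different attribution
    if first.country != "CN":
        return True   # same country other than China: treated as the same
    # Both in China: the same only if both are in the mainland (assumption
    # for the both-mainland case; the other cases are stated in the text).
    return first.mainland and second.mainland
```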
In the dark chain detection method provided by the embodiment of the present invention, after determining whether each of the external links is a dark chain, the method may further include:
and counting the total number of the dark chains, and finally determining that the website to be detected is implanted with the dark chains if the total number is larger than a preset number threshold, or else, finally determining that the website to be detected is not implanted with the dark chains.
It should be noted that, in the present application, a third layer of detection mechanism is also provided for dark chain detection, that is, the number of dark chains is counted, and if the counted number is large (the total number is greater than a number threshold value set in advance according to actual needs), it can be finally determined that an illegal dark chain is implanted into the website to be detected, and the preliminarily determined dark chain is an external chain illegally implanted into the website to be detected; if the counted number is small (the total number is not greater than the number threshold preset according to actual needs), it can be considered that normal links are mistaken for dark links due to some special reasons, and therefore it can be determined that the website to be detected is not implanted with illegal dark links, the accuracy of the final conclusion is further guaranteed through the mechanism, and the fault tolerance of the technical scheme is improved.
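The third layer of detection amounts to counting the preliminary results and comparing the count with the number threshold. A minimal sketch, in which the function name and the list-of-booleans input are assumptions made for illustration:

```python
def site_has_dark_chains(preliminary_flags, count_threshold):
    """Final judgment for the website to be detected.

    preliminary_flags holds one boolean per external link: True if that
    link was preliminarily determined to be a dark chain.
    """
    total = sum(1 for flag in preliminary_flags if flag)
    return total > count_threshold
```

For example, with a number threshold of 2, three preliminary dark chains lead to the final determination that the website is implanted with dark chains, whereas two do not.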
In the dark chain detection method provided by the embodiment of the present invention, after finally determining that the website to be detected is implanted with a dark chain, the method may further include:
and packaging the links of the website to be detected and all the determined dark chains into a scanning result object, and returning and writing the scanning result object into the corresponding database.
It should be noted that packaging the link of the website to be detected and all the determined dark chains into a scanning result object and writing it into the database makes the detection result convenient to view and use at any time.
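One possible shape for the scanning result object and its storage is sketched below. The description does not specify the database or the schema, so the use of SQLite, the table name `scan_results`, the JSON encoding of the dark chain list and the names `ScanResult` and `save_scan_result` are all assumptions.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    site: str                                        # link of the website
    dark_chains: list = field(default_factory=list)  # determined dark chains


def save_scan_result(conn, result):
    """Write the scanning result object to the database and return it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scan_results (site TEXT, dark_chains TEXT)"
    )
    conn.execute(
        "INSERT INTO scan_results (site, dark_chains) VALUES (?, ?)",
        (result.site, json.dumps(result.dark_chains)),
    )
    conn.commit()
    return result
```

A caller could pass `sqlite3.connect(":memory:")` for testing or a file-backed connection for persistent storage.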
The dark chain detection method provided by the embodiment of the invention determines the weight value of a website to be detected or any dark chain in a specified search engine, and can comprise the following steps:
if the number of the appointed search engines is one, determining that the weight value of the website to be detected or any dark chain in the appointed search engines is the weight value required to be obtained, and if the number of the appointed search engines is multiple, determining that the average value of the weight values of the website to be detected or any dark chain in the multiple appointed search engines is the weight value required to be obtained.
To ensure that the difference between the weight values of the website to be detected and of any external link is credible, the search engines used to obtain the weight values of the website to be detected and of each external link must be the same. Therefore, regardless of how many search engines are specified, the weight values of the website to be detected and of each external link are determined in every specified search engine. The number of specified search engines can be set according to actual needs. Obtaining the weight value in this way yields, simply, a value that effectively reflects the weight of the corresponding link.
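The single-engine and multi-engine cases can be handled by one helper operating on the map object that stores weight values by search engine name. This is a sketch under assumptions: the function names and the engine keys used in the example are hypothetical.

```python
def combined_weight(weights_by_engine):
    """Return the weight value to use for one website or link.

    weights_by_engine maps a specified search engine name to the weight
    value under that engine. One engine: its value is used directly.
    Several engines: the average of their values is used.
    """
    if not weights_by_engine:
        raise ValueError("at least one specified search engine is required")
    values = list(weights_by_engine.values())
    if len(values) == 1:
        return values[0]
    return sum(values) / len(values)


def same_engines(site_weights, link_weights):
    """Check that both weight maps cover the same specified engines."""
    return set(site_weights) == set(link_weights)
```

For instance, weight values of 4 and 6 under two specified search engines give a combined weight value of 5.0.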
The dark chain detection method provided by the embodiment of the invention is used for calculating the weight difference value between the weight value of any one external chain in each external chain and the weight value of a website to be detected according to a preset algorithm, and the method can comprise the following steps:
determining a weight value of any one external link in each external link and a weight level to which the weight value of the to-be-detected website belongs respectively, wherein the weight level is a corresponding level obtained by dividing all the weight values in advance;
the weight difference is calculated according to the following formula:
W=Q1*L1-Q2*L2;
wherein W represents the weight difference, Q1 and L1 represent the weight value and the weight level of the website to be detected, respectively, and Q2 and L2 represent the weight value and the weight level of any one of the external links, respectively.
It should be noted that all external links may be stored in an external link set, which is then traversed to query the weight value, attribution and other related information of each external link. In addition, a weight level may be assigned to each weight value according to actual needs. Taking Baidu as the specified search engine, for example, a link can take one of 10 weight values, as shown in Table 1. To reduce the number of weight levels and simplify the calculation, dividing all weight values in advance to obtain the weight levels may include dividing all weight values into five weight levels: weight values 0 to 3 belong to weight level 1, weight values 4 to 5 belong to weight level 2, weight values 6 to 7 belong to weight level 3, weight value 8 belongs to weight level 4, and weight value 9 belongs to weight level 5. Calculating the weight difference between the website to be detected and any external link from both the weight value and the weight level effectively reflects the difference in weight between them, which in turn supports the effective implementation of the subsequent steps.
TABLE 1
(Table 1 is provided as an image in the original publication: Figure BDA0001858716330000101.)
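As an illustration only, the level division and the formula W=Q1*L1-Q2*L2 described above could be sketched in Python as follows. The function names `weight_level` and `weight_difference` are hypothetical and do not appear in the original text, and the sketch assumes integer weight values from 0 to 9.

```python
def weight_level(weight: int) -> int:
    """Map a weight value (0-9) to one of the five weight levels.

    Assumption: weights are integers in the range 0..9, as in the
    Baidu example; other ranges are not covered by this sketch.
    """
    if not 0 <= weight <= 9:
        raise ValueError("weight value must be between 0 and 9")
    if weight <= 3:
        return 1
    if weight <= 5:
        return 2
    if weight <= 7:
        return 3
    if weight == 8:
        return 4
    return 5


def weight_difference(site_weight: int, link_weight: int) -> int:
    """Compute W = Q1*L1 - Q2*L2.

    Q1/L1 belong to the website to be detected, Q2/L2 to the external link.
    """
    q1, l1 = site_weight, weight_level(site_weight)
    q2, l2 = link_weight, weight_level(link_weight)
    return q1 * l1 - q2 * l2
```

For example, a website to be detected with weight 8 (level 4) that links to an external link with weight 1 (level 1) gives W = 8*4 - 1*1 = 31.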
In addition, a specific implementation block diagram of the dark chain detection method according to the embodiment of the present invention may be as shown in fig. 3, where G = 1 indicates that the attributions are the same, G = 0 indicates that the attributions are different, the first threshold is 10, the second threshold is 5, and the other steps are consistent with those in the foregoing embodiments of the present invention and are not described again here. Through this multi-layer verification and filtering, the accuracy of dark chain discovery is improved and false alarms are reduced, which helps guarantee the security of the website and enhances its robustness.
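The threshold selection in this example can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `is_preliminary_dark_link` is hypothetical, and the default thresholds of 10 and 5 are simply the example values given above.

```python
def is_preliminary_dark_link(
    weight_diff: float,
    same_attribution: bool,
    first_threshold: float = 10,
    second_threshold: float = 5,
) -> bool:
    """Preliminary judgment for one external link.

    same_attribution corresponds to G = 1 (True) or G = 0 (False).
    The larger first threshold applies when the attributions are the
    same; the smaller second threshold applies when they differ.
    """
    threshold = first_threshold if same_attribution else second_threshold
    # The link is flagged only when the difference is strictly greater.
    return weight_diff > threshold
```

With these example values, a weight difference of 8 would not flag an external link with the same attribution as the website, but would flag one with a different attribution.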
The present invention also provides a dark chain detection apparatus, as shown in fig. 3, which may include:
a weight value obtaining module 11, configured to: acquiring a website to be detected, and determining the weight value of the website to be detected in a specified search engine;
a website crawling module 12 configured to: crawling a website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in a specified search engine;
a preliminary judgment module 13, configured to: calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, determining that the corresponding external link is a dark link, and otherwise, determining that the corresponding external link is a normal link.
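The link-filtering step performed by the website crawling module might look roughly like the sketch below, which uses only the Python standard library. The names `external_links` and `_LinkCollector` are hypothetical. The sketch assumes that "same domain name" means an identical host name and that only `<a href>` links over HTTP or HTTPS are of interest; the original text does not specify either point.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(page_url: str, html: str) -> list:
    """Return links in the page whose host differs from the page's host."""
    site_host = urlparse(page_url).hostname
    collector = _LinkCollector()
    collector.feed(html)

    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip javascript:, mailto:, and similar
        if parsed.hostname and parsed.hostname != site_host:
            if absolute not in result:
                result.append(absolute)
    return result
```

Each returned link would then be looked up in the specified search engine to obtain its weight value.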
The dark chain detection device provided by the invention can further comprise:
an attribution acquisition module configured to: after the website to be detected is acquired, acquire the attribution of the website to be detected;
an attribution determination module configured to: after all external links are obtained, determine the attribution of each external link;
an attribution judgment module configured to: before judging whether the weight difference is greater than the weight threshold, judge whether the attribution of any one of the external links is the same as the attribution of the website to be detected;
the preliminary judgment module may include:
a preliminary judgment unit configured to: if the attribution of any external link is the same as that of the website to be detected, judging whether the weight difference value corresponding to any external link is larger than a first threshold value, otherwise, judging whether the weight difference value corresponding to any external link is larger than a second threshold value, wherein the weight threshold value comprises the first threshold value and the second threshold value smaller than the first threshold value.
In the dark chain detection device provided by the invention, the attribution judgment module may include:
an attribution judgment unit configured to: take the attribution of any one of the external links as a first attribution and the attribution of the website to be detected as a second attribution; judge whether the first attribution and the second attribution belong to the same country, and if not, determine that the first attribution and the second attribution are different; if they belong to the same country, judge whether the first attribution and the second attribution both belong to China, and if they do not both belong to China, determine that the first attribution and the second attribution are the same; if the first attribution and the second attribution both belong to China, judge whether they both belong to the mainland, and if they do not both belong to the mainland, determine that the first attribution and the second attribution are different.
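The attribution comparison can be read as a small decision tree, sketched below. The `Attribution` type, its fields, and the country code "CN" are hypothetical illustrations. The text does not state the outcome when both attributions are in the mainland; the sketch assumes they are then treated as the same.

```python
from typing import NamedTuple


class Attribution(NamedTuple):
    country: str            # hypothetical country code, e.g. "CN" or "US"
    mainland: bool = False  # only meaningful when country is "CN"


def same_attribution(first: Attribution, second: Attribution) -> bool:
    """Return True when the two attributions count as the same."""
    # Different countries: the attributions are different.
    if first.country != second.country:
        return False
    # Same country, but not China: the attributions are the same.
    if first.country != "CN":
        return True
    # Both in China: the same only if both are in the mainland
    # (assumed; the text only states the "not both" branch).
    return first.mainland and second.mainland
```

The boolean returned here corresponds to the flag that selects between the first and second thresholds in the preliminary judgment.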
The dark chain detection device provided by the invention can further comprise:
a final judgment module for: and after the corresponding external link is preliminarily determined to be the dark link, counting the total number of the dark links, and finally determining that the website to be detected is implanted with the dark link if the total number is larger than a preset number threshold, or else, finally determining that the website to be detected is not implanted with the dark link.
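A minimal sketch of this final judgment is shown below. The function name `site_has_dark_links` is hypothetical, and the number threshold is left as a parameter because the original text does not give a value.

```python
from typing import Sequence


def site_has_dark_links(
    preliminary_dark_links: Sequence[str], count_threshold: int
) -> bool:
    """Final judgment for the website to be detected.

    The website is considered implanted with dark links only when the
    number of preliminarily flagged links exceeds the threshold.
    """
    total = len(preliminary_dark_links)
    return total > count_threshold
```

Requiring more than a threshold number of flagged links acts as a second filter, so that a single borderline external link does not by itself mark the website as compromised.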
The dark chain detection device provided by the invention can further comprise:
a warehousing module for: after it is finally determined that a dark chain has been implanted in the website to be detected, packaging the link of the website to be detected and all the determined dark chains into a scan result object, and returning the scan result object and writing it into the corresponding database.
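One possible shape for the scan result object and its storage is sketched below. Everything in it is an assumption made for illustration: the `ScanResult` class, the `scan_results` table, the JSON encoding of the link list, and the use of SQLite are not specified in the original text.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScanResult:
    """Hypothetical scan result: the site link plus its dark links."""
    site_url: str
    dark_links: List[str] = field(default_factory=list)


def save_scan_result(conn: sqlite3.Connection, result: ScanResult) -> ScanResult:
    """Write the scan result to the database and return it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scan_results "
        "(site_url TEXT, dark_links TEXT)"
    )
    conn.execute(
        "INSERT INTO scan_results (site_url, dark_links) VALUES (?, ?)",
        (result.site_url, json.dumps(result.dark_links)),
    )
    conn.commit()
    return result
```

A caller could pass `sqlite3.connect(":memory:")` for a quick trial, or a connection to whatever database the deployment actually uses.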
In the dark chain detection device provided by the invention, the weight value obtaining module and the website crawling module may both comprise:
a weight value acquisition unit configured to: if the number of the appointed search engines is one, determining that the weight value of the website to be detected or any dark chain in the appointed search engines is the weight value required to be obtained, and if the number of the appointed search engines is multiple, determining that the average value of the weight values of the website to be detected or any dark chain in the multiple appointed search engines is the weight value required to be obtained.
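The single-engine and multi-engine cases can be sketched together as follows. The function name `resolved_weight` and the dictionary layout are hypothetical; how each engine's weight is actually queried is outside this sketch.

```python
from statistics import mean
from typing import Dict


def resolved_weight(weights_by_engine: Dict[str, float]) -> float:
    """Return the weight value to use for a site or link.

    With one specified search engine, its weight is used directly;
    with several, the average of their weights is used.
    """
    if not weights_by_engine:
        raise ValueError("at least one search engine weight is required")
    values = list(weights_by_engine.values())
    if len(values) == 1:
        return values[0]
    return mean(values)
```

Note that an average over several engines may not be an integer, and the text does not say how such a value maps onto the five weight levels; an implementation would need to choose a rule such as rounding.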
In the dark chain detection device provided by the invention, the preliminary judgment module may include:
a difference calculation unit for: determining the weight level of any one external link in each external link and the weight level of the to-be-detected website, wherein the weight level is a corresponding level obtained by dividing all the weight values in advance; the weight difference is calculated according to the following formula:
W=Q1*L1-Q2*L2;
wherein, W represents the weight difference, Q1 and L1 represent the weight value and weight grade of the website to be detected, respectively, and Q2 and L2 represent the weight value and weight grade of any one of the external links, respectively.
An embodiment of the present invention further provides a dark chain detection device, which may include:
a memory for storing a computer program;
a processor for implementing the steps of any one of the dark chain detection methods described above when executing the computer program.
The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the dark chain detection methods described above can be implemented.
It should be noted that for the description of the relevant parts in the dark chain detection device, the apparatus and the computer-readable storage medium provided in the embodiments of the present invention, reference is made to the detailed description of the corresponding parts in the dark chain detection method provided in the embodiments of the present invention, and details are not repeated herein. In addition, parts of the technical solutions provided in the embodiments of the present invention that are consistent with the implementation principles of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A dark chain detection method, comprising:
acquiring a website to be detected, and determining a weight value of the website to be detected in a specified search engine;
crawling the website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in the specified search engine;
and calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link.
2. The method according to claim 1, wherein after acquiring the website to be detected, the method further comprises: acquiring the attribution of the website to be detected;
after obtaining all external chains, the method further comprises the following steps: determining a home location for each of the outer chains;
before judging whether the weight difference is greater than the weight threshold, the method further comprises: judging whether the attribution of any external link in each external link is the same as that of the website to be detected or not;
correspondingly, the determining whether the weight difference is greater than the weight threshold includes: if the attribution of any external link is the same as the attribution of the website to be detected, judging whether the weight difference value corresponding to any external link is larger than a first threshold value, otherwise, judging whether the weight difference value corresponding to any external link is larger than a second threshold value, wherein the weight threshold value comprises the first threshold value and the second threshold value smaller than the first threshold value.
3. The method according to claim 2, wherein determining whether the attribution of any one of the foreign chains is the same as the attribution of the website to be detected comprises:
determining that the attribution of any one external link in each external link is a first attribution, and the attribution of the website to be detected is a second attribution;
judging whether the first attribution and the second attribution belong to the same nationality, if so, judging whether the first attribution and the second attribution belong to China, and if not, determining that the first attribution and the second attribution are different; if the first attribution and the second attribution both belong to China, judging whether the first attribution and the second attribution both belong to inland, and if the first attribution and the second attribution do not both belong to China, determining that the first attribution and the second attribution are the same; and if the first attribution and the second attribution are not both inland, determining that the first attribution and the second attribution are different.
4. The method of claim 2, wherein determining whether each of the outer chains is a dark chain further comprises:
and counting the total number of the dark chains, and finally determining that the website to be detected is implanted with the dark chains if the total number is larger than a preset number threshold, or else, finally determining that the website to be detected is not implanted with the dark chains.
5. The method according to claim 4, further comprising, after finally determining that the website to be detected is embedded with the dark chain:
and packaging the links of the website to be detected and all the determined dark chains into a scanning result object, and returning and writing the scanning result object into a corresponding database.
6. The method according to claim 1, wherein determining the weight value of the website or any dark chain to be detected in a specified search engine comprises:
if the number of the appointed search engines is one, determining that the weight value of the to-be-detected website or any dark chain in the appointed search engines is the weight value required to be obtained, and if the number of the appointed search engines is multiple, determining that the average value of the weight values of the to-be-detected website or any dark chain in the multiple appointed search engines is the weight value required to be obtained.
7. The method according to claim 1, wherein calculating a weight difference between a weight value of any one of the external links and a weight value of the website to be detected according to a preset algorithm comprises:
determining a weight value of any one external link in each external link and a weight level to which the weight value of the to-be-detected website belongs respectively, wherein the weight level is a corresponding level obtained by dividing all weight values in advance;
the weight difference is calculated according to the following formula:
W=Q1*L1-Q2*L2;
w represents a weight difference value, Q1 and L1 represent a weight value and a weight grade of the website to be detected, respectively, and Q2 and L2 represent a weight value and a weight grade of any one of the external links, respectively.
8. A dark chain detection device, comprising:
a weight value obtaining module configured to: acquiring a website to be detected, and determining a weight value of the website to be detected in a specified search engine;
a website crawling module for: crawling the website to be detected, analyzing the crawled pages to obtain links therein, eliminating links which are the same as the domain name of the website to be detected from the analyzed links to obtain all external links, and determining the weight value of each external link in the specified search engine;
a preliminary judgment module configured to: and calculating a weight difference value of the weight value of any one of the external links and the weight value of the website to be detected according to a preset algorithm, judging whether the weight difference value is greater than a weight threshold value, if so, preliminarily determining the corresponding external link as a dark link, and otherwise, determining the corresponding external link as a normal link.
9. A dark chain detection apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the dark chain detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the dark chain detection method according to any one of claims 1 to 7.
CN201811325782.0A 2018-11-08 2018-11-08 Dark chain detection method, device, equipment and computer readable storage medium Active CN109522494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811325782.0A CN109522494B (en) 2018-11-08 2018-11-08 Dark chain detection method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811325782.0A CN109522494B (en) 2018-11-08 2018-11-08 Dark chain detection method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109522494A CN109522494A (en) 2019-03-26
CN109522494B true CN109522494B (en) 2020-09-15

Family

ID=65773708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811325782.0A Active CN109522494B (en) 2018-11-08 2018-11-08 Dark chain detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109522494B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680252B (en) * 2020-06-05 2023-07-25 腾讯科技(深圳)有限公司 Method, device, equipment and computer readable storage medium for identifying outer chain
CN112398963A (en) * 2020-10-13 2021-02-23 易讯科技股份有限公司 Method for realizing intelligent recognition and flexible translation of IPv4 external link
CN113407802A (en) * 2021-06-10 2021-09-17 杭州安恒信息技术股份有限公司 Spider pool website identification method and device, electronic device and storage medium

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096781A (en) * 2011-01-18 2011-06-15 南京邮电大学 Fishing detection method based on webpage relevance
CN102436563A (en) * 2011-12-30 2012-05-02 奇智软件(北京)有限公司 Method and device for detecting page tampering
CN102622435A (en) * 2012-02-29 2012-08-01 百度在线网络技术(北京)有限公司 Method and device for detecting black chain
CN102663054A (en) * 2012-03-29 2012-09-12 奇智软件(北京)有限公司 Method and device for determining weight of website
CN102682097A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and equipment for detecting secrete links in web page
CN103593615A (en) * 2013-11-29 2014-02-19 北京奇虎科技有限公司 Method and device for detecting webpage tampering
CN103856442A (en) * 2012-11-30 2014-06-11 腾讯科技(深圳)有限公司 Black chain detection method, apparatus and system
CN103927480A (en) * 2013-01-14 2014-07-16 腾讯科技(深圳)有限公司 Method, device and system for identifying malicious web page
CN104239485A (en) * 2014-09-05 2014-12-24 中国科学院计算机网络信息中心 Statistical machine learning-based internet hidden link detection method
CN104391955A (en) * 2014-11-27 2015-03-04 北京国双科技有限公司 Web page correlation detection method and device
CN105488402A (en) * 2014-12-23 2016-04-13 哈尔滨安天科技股份有限公司 Dark link detection method and system
CN105740308A (en) * 2015-12-19 2016-07-06 哈尔滨安天科技股份有限公司 Hypertext markup language structure-based website invisible link detection method and system
RU2015148437A (en) * 2015-11-10 2017-05-15 федеральное государственное автономное образовательное учреждение высшего образования "Санкт-Петербургский политехнический университет Петра Великого" (ФГАОУ ВО "СПбПУ") A method for detecting hidden relationships and a system for investigating the incident of safety on the Internet of Things
CN107370718A (en) * 2016-05-12 2017-11-21 深圳市深信服电子科技有限公司 The detection method and device of black chain in webpage
CN107729386A (en) * 2017-09-19 2018-02-23 杭州安恒信息技术有限公司 A kind of dark chain detection technique based on degree of polymerization analysis
CN107784107A (en) * 2017-10-31 2018-03-09 杭州安恒信息技术有限公司 Dark chain detection method and device based on flight behavior analysis
CN108363711A (en) * 2017-07-04 2018-08-03 北京安天网络安全技术有限公司 The detection method and device of a kind of dark chain in webpage

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9954877B2 (en) * 2015-12-21 2018-04-24 Ebay Inc. Automatic detection of hidden link mismatches with spoofed metadata

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《基于机器学习的网页暗链检测方法》;周文怡等;《计算机工程》;20181031;第44卷(第10期);第22-27页 *
《基于统计机器学习的互联网暗链检测方法》;孟池洁等;《计算机应用研究》;20150930;第32卷(第9期);第2779-2783页 *

Also Published As

Publication number Publication date
CN109522494A (en) 2019-03-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant