CN105488402A - Dark link detection method and system - Google Patents
- Publication number
- CN105488402A
- Authority
- CN
- China
- Prior art keywords
- page
- dark chain
- website
- dark
- chain
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention proposes a dark link detection method and system. The method comprises: establishing a search engine spider User-Agent information library, a dark link feature library and a website basic information library; traversing each web page in the website basic information library; comparing the web page information with the dark link feature library, where the presence of a feature from that library indicates that the page contains a dark link, and otherwise simulating search engine spider User-Agent information and re-requesting the page; and comparing the HASH value of the re-requested page with the HASH value stored in the website basic information library: if they are the same, the page contains no dark link; otherwise, the page contains a dark link. The method can detect and remove dark links in a website without manual intervention, which greatly improves the efficiency of dark link detection.
Description
Technical field
The present invention relates to the field of network security, and in particular to a dark link detection method and system.
Background technology
With the rapid development and spread of the Internet, Internet technology has reached every area of society, and websites serve as carriers that present rich content to people. Search engines address the problem of quickly finding content that interests people: after searching for and integrating information, a search engine presents the retrieved results to the user in order of priority.
Dark links are one of the most effective forms of search engine optimization (SEO) cheating and one of the techniques hackers use most often: they compromise websites in batches, implant dark links into the web pages, and raise the search engine ranking of those links in order to profit from them.
As dark links and the techniques for detecting them have evolved against each other, some of the dark link code that hackers implant checks the User-Agent field of the HTTP request to decide whether the visitor is a search engine spider or a browser; if it is a browser, the dark link is hidden. The traditional approach of detecting dark links manually is extremely inefficient, and detection that relies solely on the domain name features of known dark links cannot find these hidden dark links.
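To make the cloaking mechanism concrete, the following Python sketch simulates a compromised page that serves an injected link only when the User-Agent looks like a search engine spider. The token list, the page markup and the `spam.example` domain are hypothetical illustrations, not code from the patent or from any real attack.

```python
# Hypothetical simulation of User-Agent cloaking by implanted dark link code.
# The tokens are real crawler User-Agent substrings, but this list is an
# illustrative assumption, not an exhaustive or authoritative one.
SPIDER_TOKENS = ("Baiduspider", "Googlebot", "bingbot", "Sogou web spider", "360Spider")


def is_spider(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like a search engine spider."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in SPIDER_TOKENS)


def render_page(user_agent: str) -> str:
    """Simulate a compromised page: the injected link is served only to spiders."""
    body = "<p>Welcome to our site.</p>\n"
    if is_spider(user_agent):
        # Injected dark link, never shown to ordinary browser visitors
        body += '<div style="display:none"><a href="http://spam.example/">cheap pills</a></div>\n'
    return "<html><body>\n" + body + "</body></html>"


if __name__ == "__main__":
    browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    spider_ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
    print("spam.example" in render_page(browser_ua))  # False
    print("spam.example" in render_page(spider_ua))   # True
```

A check of what a browser receives therefore sees a clean page, which is why the method re-requests pages while presenting a spider User-Agent.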
Summary of the invention
To address the above problems in the prior art, the present invention proposes a dark link detection method that overcomes the limitations of existing dark link detection, can detect hidden dark links, and also eliminates the lag inherent in manual detection.
A dark link detection method, comprising:
A. Creating a search engine spider User-Agent information library;
creating a dark link feature library, the features in which include dark link names and dark link URLs;
creating a website basic information library, which includes the content of every page in the website together with each page's URL, page HASH, and hyperlink names and URLs;
B. Traversing all pages of all websites in the website basic information library one by one, until the last page has been processed;
C. Determining whether the hyperlink names or URLs of the current page contain a feature from the dark link feature library; if so, determining that the current page contains a dark link and deleting the dark link information, that is, the matched feature, from the page; otherwise, performing step D;
D. Simulating the User-Agent information of a search engine spider, re-requesting the page, and calculating its page HASH;
E. Determining whether this page HASH value is identical to the page HASH value in the website basic information library; if so, the page contains no hidden dark link, and the method returns to step B; otherwise, the page contains a hidden dark link, and step F is performed;
F. Comparing the re-requested page with the website basic information library and deleting the information in the re-requested page that differs.
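A minimal Python sketch of the data held in step A and of the feature match in step C follows. The class names, the sample spider User-Agent strings and the matching rule (exact match on link names, substring match on URLs) are assumptions made for illustration; the patent does not specify data formats or matching semantics.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple

# Step A, library 1: spider User-Agent strings (illustrative, not exhaustive)
SPIDER_USER_AGENTS = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]


@dataclass
class DarkLinkFeatureDB:
    """Step A, library 2: known dark link names and URLs."""
    names: Set[str] = field(default_factory=set)
    urls: Set[str] = field(default_factory=set)

    def matches(self, name: str, href: str) -> bool:
        # Assumed rule: exact name match, or a known URL appearing inside href
        return name in self.names or any(u in href for u in self.urls)


@dataclass
class PageRecord:
    """Step A, library 3: one page of the website basic information library."""
    url: str
    content: str
    page_hash: str
    links: List[Tuple[str, str]]  # (hyperlink name, hyperlink URL)


class LinkExtractor(HTMLParser):
    """Collects (anchor text, href) pairs from <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html: str) -> List[Tuple[str, str]]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def find_known_dark_links(html: str, db: DarkLinkFeatureDB) -> List[Tuple[str, str]]:
    """Step C: links whose name or URL matches the dark link feature library."""
    return [(name, href) for name, href in extract_links(html) if db.matches(name, href)]
```

For example, with `db = DarkLinkFeatureDB(names={"cheap pills"}, urls={"spam.example"})`, a page containing an "About us" link and a `http://spam.example/x` link named "cheap pills" yields only the second link.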
In the method, before the differing information in the re-requested page is deleted, step F further comprises: extracting and parsing the differing information in the re-requested page, and adding the resulting dark link names and dark link URLs to the dark link feature library.
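The feature-learning step can be sketched as below, assuming that the differing information is found with a line-based diff (Python's `difflib.ndiff`) and that links are parsed with a simplified regular expression. Both choices and the function names are illustrative assumptions rather than the patent's method; a production tool would use a real HTML parser, since this sketch misses anchors that span several lines.

```python
import difflib
import re
from typing import List, Set

# Simplified anchor pattern (assumption); captures (href, anchor text)
ANCHOR_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def differing_lines(baseline: str, spider_view: str) -> List[str]:
    """Lines present in the re-requested page but absent from the baseline."""
    diff = difflib.ndiff(baseline.splitlines(), spider_view.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def learn_features(baseline: str, spider_view: str, names: Set[str], urls: Set[str]) -> None:
    """Parse the differing content and add dark link names and URLs to the library."""
    for line in differing_lines(baseline, spider_view):
        for href, text in ANCHOR_RE.findall(line):
            names.add(text.strip())
            urls.add(href)
```

Feeding learned features back into the library is what lets later scans catch the same dark link directly in step C.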
A dark link detection system, comprising:
A modeling module, configured to create a search engine spider User-Agent information library;
create a dark link feature library, the features in which include dark link names and dark link URLs;
create a website basic information library, which includes the content of every page in the website together with each page's URL, page HASH, and hyperlink names and URLs;
A detection module, configured to traverse all pages of all websites in the website basic information library one by one, until the last page has been processed;
determine whether the hyperlink names or URLs of the current page contain a feature from the dark link feature library; if so, determine that the current page contains a dark link and delete the dark link information from the page; otherwise, simulate the User-Agent information of a search engine spider, re-request the page, and calculate its page HASH;
determine whether this page HASH value is identical to the page HASH value in the website basic information library; if so, the page contains no hidden dark link; otherwise, the page contains a hidden dark link, and control passes to the disposal module;
A disposal module, configured to compare the re-requested page with the website basic information library and delete the information in the re-requested page that differs.
In the system, before deleting the differing information in the re-requested page, the disposal module further extracts and parses that information and adds the resulting dark link names and dark link URLs to the dark link feature library.
The advantage of the present invention is that it provides a technique, and an implementation of it, for detecting dark links. It overcomes the limitations of current dark link detection, in particular the incompleteness of detection based on dark link signatures, and can detect hidden dark links. It also eliminates the lag of manual detection and is more efficient than traditional detection. Because the invention combines User-Agent simulation with content matching, it can not only detect dark links but also restore pages into which hackers have implanted them, overcoming the shortcoming of traditional detection, which can find dark links but cannot remove them. Handling and removing dark links therefore requires no manual participation. Whereas traditional signature-based detection needs people to detect dark links, handle them and add rules, the invention turns detection, handling and rule addition into a self-closing loop. The method is also easy to extend and maintain, can run fully automatically, and incurs no additional development or manual maintenance cost, which greatly simplifies its implementation, operation and maintenance.
The present invention proposes a dark link detection method and system. A search engine spider User-Agent information library, a dark link feature library and a website basic information library are established; each web page in the website basic information library is traversed and its information is compared with the dark link feature library. If a feature from the dark link feature library is present, the page contains a dark link; otherwise, search engine spider User-Agent information is simulated, the page is requested again, and its HASH value is compared with the HASH value in the website basic information library. If the values are the same, the page contains no dark link; otherwise, it contains a dark link. The method of the present invention can detect and remove dark links in a website without manual participation, greatly improving the efficiency of dark link detection.
Brief description of the drawings
To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the dark link detection method of the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the dark link detection system of the present invention.
Embodiment
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objectives, features and advantages of the present invention more apparent, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings.
A dark link detection method, as shown in Fig. 1, comprises:
S101: creating a search engine spider User-Agent information library;
creating a dark link feature library, the features in which include dark link names and dark link URLs;
creating a website basic information library, which includes the content of every page in the website together with each page's URL, page HASH, and hyperlink names and URLs;
S102: traversing all pages of all websites in the website basic information library one by one, until the last page has been processed;
S103: determining whether the hyperlink names or URLs of the current page contain a feature from the dark link feature library; if so, determining that the current page contains a dark link and deleting the dark link information from the page; otherwise, performing S104;
S104: simulating the User-Agent information of a search engine spider, re-requesting the page, and calculating its page HASH;
S105: determining whether this page HASH value is identical to the page HASH value in the website basic information library; if so, the page contains no hidden dark link, and the method returns to S102; otherwise, the page contains a hidden dark link, and S106 is performed;
S106: comparing the re-requested page with the website basic information library and deleting the information in the re-requested page that differs. The parts where the two differ are the dark links.
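Steps S104 to S106 might look like the following Python sketch. SHA-256 stands in for the unspecified page HASH, the spider User-Agent string is an illustrative value, and removing differing lines with `difflib.ndiff` is an assumed interpretation of deleting the differing information in the re-requested page. A hash mismatch also occurs for legitimately dynamic content (timestamps, rotating ads), so a practical system would likely need to normalize pages before hashing; the cleaned text returned here would still have to be written back to the site's source files.

```python
import difflib
import hashlib
import urllib.request
from typing import Tuple

# Illustrative spider UA; the patent does not say which library entry is used
SPIDER_UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"


def fetch_as_spider(url: str, user_agent: str = SPIDER_UA, timeout: float = 10.0) -> str:
    """S104: re-request the page while presenting a spider User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def page_hash(content: str) -> str:
    """Page HASH; SHA-256 is an assumption, the patent names no algorithm."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def check_and_clean(baseline: str, baseline_hash: str, spider_view: str) -> Tuple[bool, str]:
    """S105 and S106: return (hidden_dark_link_found, cleaned_spider_view)."""
    if page_hash(spider_view) == baseline_hash:
        return False, spider_view  # S105: hashes match, no hidden dark link
    # S106: keep only lines shared with the baseline; lines that appear only
    # in the spider's view are treated as injected dark link content
    diff = difflib.ndiff(baseline.splitlines(), spider_view.splitlines())
    kept = [line[2:] for line in diff if line.startswith("  ")]
    return True, "\n".join(kept)
```

In use, `check_and_clean(base, page_hash(base), fetch_as_spider(url))` returns `(False, ...)` for an unchanged page and `(True, cleaned)` when the spider is served extra content.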
In the method, before the differing information in the re-requested page is deleted, S106 further comprises: extracting and parsing the differing information in the re-requested page, and adding the resulting dark link names and dark link URLs to the dark link feature library.
A dark link detection system, as shown in Fig. 2, comprises:
Modeling module 201, configured to create a search engine spider User-Agent information library;
create a dark link feature library, the features in which include dark link names and dark link URLs;
create a website basic information library, which includes the content of every page in the website together with each page's URL, page HASH, and hyperlink names and URLs;
Detection module 202, configured to traverse all pages of all websites in the website basic information library one by one, until the last page has been processed;
determine whether the hyperlink names or URLs of the current page contain a feature from the dark link feature library; if so, determine that the current page contains a dark link and delete the dark link information from the page; otherwise, simulate the User-Agent information of a search engine spider, re-request the page, and calculate its page HASH;
determine whether this page HASH value is identical to the page HASH value in the website basic information library; if so, the page contains no hidden dark link; otherwise, the page contains a hidden dark link, and control passes to the disposal module;
Disposal module 203, configured to compare the re-requested page with the website basic information library and delete the information in the re-requested page that differs.
In the system, before deleting the differing information in the re-requested page, the disposal module further extracts and parses that information and adds the resulting dark link names and dark link URLs to the dark link feature library.
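The three-module structure of Fig. 2 can be wired together as in the Python sketch below. The class and method names are hypothetical, the page fetcher is injected so the sketch runs without network access, step C is simplified to a substring check of known dark link URLs against the stored page content, and the sketch reports known dark links rather than deleting them. It does show the self-closing loop: URLs found in injected content are added to the feature library for later scans.

```python
import difflib
import hashlib
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Simplified href extractor (assumption; a real tool would parse HTML properly)
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def sha256_hex(text: str) -> str:
    """Page HASH; SHA-256 is an assumed choice of algorithm."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class ModelingModule:
    """Module 201: spider UA library, dark link feature library, site baseline."""
    spider_user_agents: List[str]
    feature_urls: Set[str] = field(default_factory=set)
    # url -> (baseline content, baseline hash)
    baseline: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add_page(self, url: str, content: str) -> None:
        self.baseline[url] = (content, sha256_hex(content))


@dataclass
class DisposalModule:
    """Module 203: drop content absent from the baseline and learn new features."""
    model: ModelingModule

    def dispose(self, url: str, spider_view: str) -> str:
        baseline_content, _ = self.model.baseline[url]
        diff = list(difflib.ndiff(baseline_content.splitlines(), spider_view.splitlines()))
        for line in diff:
            if line.startswith("+ "):
                # Self-closing loop: URLs in injected lines become new features
                self.model.feature_urls.update(HREF_RE.findall(line))
        # Keep only lines common to the baseline and the spider's view
        return "\n".join(line[2:] for line in diff if line.startswith("  "))


@dataclass
class DetectionModule:
    """Module 202: traverse pages, match known features, then compare spider-view hashes."""
    model: ModelingModule
    disposer: DisposalModule
    fetch: Callable[[str, str], str]  # (url, user_agent) -> page text, injected

    def run(self) -> Dict[str, Tuple[str, Optional[str]]]:
        findings: Dict[str, Tuple[str, Optional[str]]] = {}
        for url, (content, stored_hash) in self.model.baseline.items():
            # Step C (simplified): any known dark link URL in the stored page?
            if any(feature in content for feature in self.model.feature_urls):
                findings[url] = ("known dark link", None)
                continue
            # Steps D and E: re-request as a spider and compare hashes
            spider_view = self.fetch(url, self.model.spider_user_agents[0])
            if sha256_hex(spider_view) != stored_hash:
                cleaned = self.disposer.dispose(url, spider_view)
                findings[url] = ("hidden dark link", cleaned)
        return findings
```

Injecting `fetch` keeps detection logic separate from transport; a real deployment would pass a function that performs the HTTP request with the given User-Agent.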
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, may be embodied as a software product. This computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments, or in parts of the embodiments, of the present invention.
The embodiments in this specification are described progressively: for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described relatively briefly; for related details, see the description of the method embodiment.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present invention can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Although the present invention has been described through embodiments, those of ordinary skill in the art will recognize that it admits many variations and changes that do not depart from its spirit, and the appended claims are intended to cover such variations and changes.
Claims (4)
1. A dark link detection method, characterized by comprising:
A. Creating a search engine spider User-Agent information library;
creating a dark link feature library, the features in which include dark link names and dark link URLs;
creating a website basic information library, which includes the content of every page in the website together with each page's URL, page HASH, and hyperlink names and URLs;
B. Traversing all pages of all websites in the website basic information library one by one, until the last page has been processed;
C. Determining whether the hyperlink names or URLs of the current page contain a feature from the dark link feature library; if so, determining that the current page contains a dark link and deleting the dark link from the page; otherwise, performing step D;
D. Simulating the User-Agent information of a search engine spider, re-requesting the page, and calculating its page HASH;
E. Determining whether this page HASH value is identical to the page HASH value in the website basic information library; if so, the page contains no hidden dark link, and the method returns to step B; otherwise, the page contains a hidden dark link, and step F is performed;
F. Comparing the re-requested page with the website basic information library and deleting the information in the re-requested page that differs.
2. The method according to claim 1, characterized in that, before the differing information in the re-requested page is deleted, step F further comprises: extracting and parsing the differing information in the re-requested page, and adding the resulting dark link names and dark link URLs to the dark link feature library.
3. A dark link detection system, characterized by comprising:
A modeling module, configured to create a search engine spider User-Agent information library;
create a dark link feature library, the features in which include dark link names and dark link URLs;
create a website basic information library, which includes the content of every page in the website together with each page's URL, page HASH, and hyperlink names and URLs;
A detection module, configured to traverse all pages of all websites in the website basic information library one by one, until the last page has been processed;
determine whether the hyperlink names or URLs of the current page contain a feature from the dark link feature library; if so, determine that the current page contains a dark link and delete the dark link from the page; otherwise, simulate the User-Agent information of a search engine spider, re-request the page, and calculate its page HASH;
determine whether this page HASH value is identical to the page HASH value in the website basic information library; if so, the page contains no hidden dark link; otherwise, the page contains a hidden dark link, and control passes to the disposal module;
A disposal module, configured to compare the re-requested page with the website basic information library and delete the information in the re-requested page that differs.
4. The system according to claim 3, characterized in that, before deleting the differing information in the re-requested page, the disposal module further extracts and parses that information and adds the resulting dark link names and dark link URLs to the dark link feature library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410807527.5A CN105488402A (en) | 2014-12-23 | 2014-12-23 | Dark link detection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105488402A true CN105488402A (en) | 2016-04-13 |
Family ID: 55675376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410807527.5A Pending CN105488402A (en) | 2014-12-23 | 2014-12-23 | Dark link detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105488402A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102446255A (en) * | 2011-12-30 | 2012-05-09 | 奇智软件(北京)有限公司 | Method and device for detecting page tamper |
CN102682097A (en) * | 2012-04-27 | 2012-09-19 | 北京神州绿盟信息安全科技股份有限公司 | Method and equipment for detecting secrete links in web page |
CN103902855A (en) * | 2013-12-17 | 2014-07-02 | 哈尔滨安天科技股份有限公司 | File tamper detecting and repairing method and system |
CN104036189A (en) * | 2014-05-16 | 2014-09-10 | 北京奇虎科技有限公司 | Page distortion detecting method and black link database generating method |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107784107A (en) * | 2017-10-31 | 2018-03-09 | 杭州安恒信息技术有限公司 | Dark chain detection method and device based on flight behavior analysis |
CN107784107B (en) * | 2017-10-31 | 2020-06-30 | 杭州安恒信息技术股份有限公司 | Dark chain detection method and device based on escape behavior analysis |
CN109067716A (en) * | 2018-07-18 | 2018-12-21 | 杭州安恒信息技术股份有限公司 | A kind of method and system identifying dark chain |
CN109067716B (en) * | 2018-07-18 | 2021-05-28 | 杭州安恒信息技术股份有限公司 | Method and system for identifying dark chain |
CN109522494A (en) * | 2018-11-08 | 2019-03-26 | 杭州安恒信息技术股份有限公司 | A kind of dark chain detection method, device, equipment and computer readable storage medium |
CN109522494B (en) * | 2018-11-08 | 2020-09-15 | 杭州安恒信息技术股份有限公司 | Dark chain detection method, device, equipment and computer readable storage medium |
CN109784038A (en) * | 2018-12-29 | 2019-05-21 | 北京奇安信科技有限公司 | Detecting black chain method, apparatus, system and computer readable storage medium |
CN110309667A (en) * | 2019-04-16 | 2019-10-08 | 网宿科技股份有限公司 | A kind of dark chain detection method in website and device |
WO2020211130A1 (en) * | 2019-04-16 | 2020-10-22 | 网宿科技股份有限公司 | Hidden link detection method and apparatus for website |
CN110309667B (en) * | 2019-04-16 | 2022-08-30 | 网宿科技股份有限公司 | Website hidden link detection method and device |
CN112487321A (en) * | 2020-12-08 | 2021-03-12 | 北京天融信网络安全技术有限公司 | Detection method, detection device, storage medium and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160413 |