CN105488402A - Dark link detection method and system - Google Patents

Dark link detection method and system

Info

Publication number
CN105488402A
Authority
CN
China
Prior art keywords
page
dark chain
website
dark
chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410807527.5A
Other languages
Chinese (zh)
Inventor
尹尚书
李柏松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Antiy Technology Co Ltd
Original Assignee
Harbin Antiy Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-23
Filing date
2014-12-23
Publication date
2016-04-13
Application filed by Harbin Antiy Technology Co Ltd
Priority to CN201410807527.5A
Publication of CN105488402A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention proposes a dark link detection method and system. The method comprises: establishing a search engine spider user agent information library, a dark link feature library and a website basic information library; traversing each webpage in the website basic information library; comparing webpage information with the dark link feature library, and if features in the dark link feature library exist, then indicating that a dark link exists in the page, otherwise, simulating search engine spider user agent information and re-requesting for the page; and comparing whether a HASH value of the page is the same as that in the website basic information library or not, and if the HASH value of the page is the same as that in the website basic information library, then indicating that the dark link does not exist in the page, otherwise, indicating that the dark link exists in the page. According to the method, the dark link in a website can be detected and deleted without the need for artificial participation, so that the detection efficiency of the dark link is greatly improved.

Description

Dark link detection method and system
Technical field
The present invention relates to the field of network security, and in particular to a dark link detection method and system.
Background art
With the rapid development and spread of the Internet, Internet technology has extended into every area of society, and websites, as carriers of information, present rich content to people. Search engines address the problem of how to quickly retrieve the content a person is interested in: after searching for and integrating information, a search engine presents the retrieved results to the user in order of priority.
Dark links are among the most effective forms of cheating in search engine optimization (SEO), and they are also one of the methods most commonly used by hackers: websites are compromised in bulk, dark links are implanted in their webpages to raise the ranking of the dark links in search engines, and profit is sought from this.
As dark links and the techniques for detecting them contend with each other, hackers have begun to include in some implanted dark link code a check of the UserAgent information in the HTTP request to determine whether the visitor is a search engine spider or a browser; if the visitor is a browser, the dark link is hidden. Traditional detection relies on manual inspection, which is extremely inefficient, and detection that relies solely on the domain name features of dark links cannot detect hidden dark links.
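The User-Agent check described in the background can be illustrated with a toy simulation. This is only a sketch for understanding why a browser visit and a spider visit can see different pages; the function names, the spider tokens, and the example URL are assumptions made for illustration and do not come from the patent.

```python
from typing import Tuple


def is_spider(user_agent: str,
              spider_tokens: Tuple[str, ...] = ("googlebot", "baiduspider", "bingbot")) -> bool:
    """Return True when the User-Agent string contains a known spider token."""
    ua = user_agent.lower()
    return any(token in ua for token in spider_tokens)


def cloaked_page(user_agent: str) -> str:
    """Simulate a compromised page that shows the dark link only to spiders."""
    page = "<p>Welcome</p>"
    if is_spider(user_agent):
        # Hypothetical implanted link; hidden from ordinary browsers.
        page += '<a href="http://dark.example/">promo</a>'
    return page
```

Because the two views differ only when the request looks like a spider, a detector that always browses with a normal browser User-Agent never sees the implanted link.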
Summary of the invention
In view of the above problems in the prior art, the present invention proposes a dark link detection method that overcomes the limitations of existing dark link detection, can detect hidden dark links, and also solves the lag of manual detection.
A dark link detection method, comprising:
a. creating a user agent information library of search engine spiders;
creating a dark link feature library, wherein the features in the dark link feature library comprise dark link names and dark link URLs;
creating a website basic information library, wherein the website basic information library comprises the content of all pages in a website, the URL corresponding to each page, the page HASH, and the hyperlink names and URLs;
b. traversing, one by one, all pages of all websites in the website basic information library until the last page has been processed;
c. judging whether the hyperlink names or URLs of the current page contain a feature in the dark link feature library; if so, determining that a dark link exists in the current page and deleting the dark link information in the page, namely the features in the dark link feature library; otherwise, performing step d;
d. simulating the user agent information of a search engine spider, requesting the page again, and calculating its page HASH;
e. judging whether the page HASH value is identical to the page HASH value in the website basic information library; if so, no hidden dark link exists in the website, and the method returns to step b; otherwise, a hidden dark link exists in the website, and step f is performed;
f. comparing the re-requested page with the website basic information library, and deleting the information that differs in the re-requested page.
In the method, step f further comprises, before deleting the differing information in the re-requested page: extracting and parsing the differing information in the re-requested page, and adding the dark link names and dark link URLs obtained from parsing to the dark link feature library.
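Steps b to e above can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the patent's implementation: the `PageRecord` type, the `detect` function, the verdict strings, the use of SHA-256 as the page HASH, and exact-match comparison against the feature library are all choices made here for illustration. The `fetch` callable is injected so the loop can be exercised without network access.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


def page_hash(content: str) -> str:
    # SHA-256 is an assumption; the source only says "page HASH".
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class PageRecord:
    """One page entry of the website basic information library (step a)."""
    url: str
    content: str
    links: List[Tuple[str, str]]  # (hyperlink name, hyperlink URL)

    @property
    def stored_hash(self) -> str:
        return page_hash(self.content)


def detect(records: List[PageRecord],
           feature_names: Set[str],
           feature_urls: Set[str],
           spider_ua: str,
           fetch: Callable[[str, str], str]) -> Dict[str, str]:
    """Return a verdict per page URL following steps b to e."""
    verdicts: Dict[str, str] = {}
    for rec in records:  # step b: traverse every page
        # Step c: compare hyperlink names and URLs with the feature library.
        if any(name in feature_names or url in feature_urls
               for name, url in rec.links):
            verdicts[rec.url] = "dark link"
            continue
        # Step d: request the page again while presenting a spider User-Agent.
        refetched = fetch(rec.url, spider_ua)
        # Step e: compare the new hash with the stored one.
        if page_hash(refetched) == rec.stored_hash:
            verdicts[rec.url] = "clean"
        else:
            verdicts[rec.url] = "hidden dark link"  # step f would run here
    return verdicts
```

The sketch only reports verdicts; the deletion in step c and the comparison and cleanup in step f are left out to keep the control flow visible.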
A dark link detection system, comprising:
a modeling module, configured to create a user agent information library of search engine spiders;
to create a dark link feature library, wherein the features in the dark link feature library comprise dark link names and dark link URLs;
and to create a website basic information library, wherein the website basic information library comprises the content of all pages in a website, the URL corresponding to each page, the page HASH, and the hyperlink names and URLs;
a detection module, configured to traverse, one by one, all pages of all websites in the website basic information library until the last page has been processed;
to judge whether the hyperlink names or URLs of the current page contain a feature in the dark link feature library; if so, to determine that a dark link exists in the current page and delete the dark link information in the page; otherwise, to simulate the user agent information of a search engine spider, request the page again, and calculate its page HASH;
and to judge whether the page HASH value is identical to the page HASH value in the website basic information library; if so, no hidden dark link exists in the website; otherwise, a hidden dark link exists in the website, and control passes to the disposal module;
a disposal module, configured to compare the re-requested page with the website basic information library and delete the information that differs in the re-requested page.
In the system, the disposal module is further configured, before deleting the differing information in the re-requested page, to extract and parse the differing information in the re-requested page, and to add the dark link names and dark link URLs obtained from parsing to the dark link feature library.
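The disposal module's behavior (compare, extract, learn, delete) might look like the following sketch. The line-based comparison with `difflib`, the `LinkExtractor` helper, and the function name `dispose_hidden_links` are assumptions for illustration; the source does not specify how the comparison or the parsing is performed.

```python
import difflib
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


class LinkExtractor(HTMLParser):
    """Collect (link text, href) pairs from <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def dispose_hidden_links(baseline: str, refetched: str,
                         feature_names: Set[str],
                         feature_urls: Set[str]) -> str:
    """Learn features from the differing lines, then return the page without them."""
    base_lines = baseline.splitlines()
    new_lines = refetched.splitlines()
    matcher = difflib.SequenceMatcher(a=base_lines, b=new_lines, autojunk=False)
    kept: List[str] = []
    extra: List[str] = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            kept.extend(new_lines[j1:j2])
        elif op in ("insert", "replace"):
            extra.extend(new_lines[j1:j2])  # lines present only in the spider view
    parser = LinkExtractor()
    parser.feed("\n".join(extra))
    parser.close()
    for name, url in parser.links:  # add what was found to the feature library
        if name:
            feature_names.add(name)
        if url:
            feature_urls.add(url)
    return "\n".join(kept)
```

Updating the feature library before deleting the differing lines means that later pages carrying the same link can be caught by the cheaper feature comparison, without a second request.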
The advantage of the present invention is that it provides a technique and implementation for detecting dark links that overcomes the limitations of current dark link detection, namely that detection based on dark link signatures is incomplete, and that it can detect hidden dark links; it also solves the lag of manual detection and is more efficient than traditional detection. The invention is based on user agent simulation combined with content matching, so it can not only detect dark links but also restore pages in which hackers have implanted dark links, which addresses the shortcoming that traditional detection can find dark links but cannot remove them. Dark links are disposed of and removed without manual intervention. Whereas traditional signature-based detection requires manual participation, here detection, disposal, and the addition of rules form a self-contained closed loop. The method is also easy to extend and maintain and can run fully automatically, with no additional development or manual maintenance overhead, which greatly facilitates its implementation and operation.
The present invention proposes a dark link detection method and system. A search engine spider user agent information library, a dark link feature library, and a website basic information library are established; each webpage in the website basic information library is traversed, and its information is compared with the dark link feature library. If a feature from the dark link feature library is present, a dark link exists in the page; otherwise, the user agent information of a search engine spider is simulated, the page is requested again, and its HASH value is compared with the HASH value in the website basic information library. If the two are identical, no dark link exists in the page; otherwise, a dark link exists in the page. The method can detect and remove dark links in a website without manual participation, which greatly improves the efficiency of dark link detection.
Brief description of the drawings
To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the dark link detection method of the present invention;
Fig. 2 is a schematic structural diagram of an embodiment of the dark link detection system of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the embodiments of the present invention, and to make the above objects, features, and advantages of the present invention more apparent, the technical solutions of the present invention are described in further detail below with reference to the drawings.
In view of the above problems in the prior art, the present invention proposes a dark link detection method that overcomes the limitations of existing dark link detection, can detect hidden dark links, and also solves the lag of manual detection.
A dark link detection method, as shown in Fig. 1, comprising:
S101: creating a user agent information library of search engine spiders;
creating a dark link feature library, wherein the features in the dark link feature library comprise dark link names and dark link URLs;
creating a website basic information library, wherein the website basic information library comprises the content of all pages in a website, the URL corresponding to each page, the page HASH, and the hyperlink names and URLs;
S102: traversing, one by one, all pages of all websites in the website basic information library until the last page has been processed;
S103: judging whether the hyperlink names or URLs of the current page contain a feature in the dark link feature library; if so, determining that a dark link exists in the current page and deleting the dark link information in the page; otherwise, performing S104;
S104: simulating the user agent information of a search engine spider, requesting the page again, and calculating its page HASH;
S105: judging whether the page HASH value is identical to the page HASH value in the website basic information library; if so, no hidden dark link exists in the website, and the method returns to S102; otherwise, a hidden dark link exists in the website, and S106 is performed;
S106: comparing the re-requested page with the website basic information library, and deleting the information that differs in the re-requested page. That is, the parts where the two differ are the dark links.
In the method, S106 further comprises, before deleting the differing information in the re-requested page: extracting and parsing the differing information in the re-requested page, and adding the dark link names and dark link URLs obtained from parsing to the dark link feature library.
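S104 amounts to an HTTP request that presents a spider's User-Agent header, followed by a hash of the response body. A minimal sketch using Python's standard library follows. The two User-Agent strings are illustrative entries for the user agent information library, and SHA-256 and the function names are assumptions; the source names neither a hash algorithm nor specific spiders.

```python
import hashlib
import urllib.request

# Illustrative entries for the spider user agent information library (S101).
SPIDER_USER_AGENTS = {
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
    "baiduspider": "Mozilla/5.0 (compatible; Baiduspider/2.0; "
                   "+http://www.baidu.com/search/spider.html)",
}


def build_spider_request(url: str, spider: str = "googlebot") -> urllib.request.Request:
    """Build a request that carries the chosen spider's User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": SPIDER_USER_AGENTS[spider]})


def fetch_as_spider(url: str, spider: str = "googlebot",
                    timeout: float = 10.0) -> bytes:
    """Request the page again as a spider and return the raw body (S104)."""
    with urllib.request.urlopen(build_spider_request(url, spider),
                                timeout=timeout) as response:
        return response.read()


def hash_page(body: bytes) -> str:
    # SHA-256 is an assumption; the source only says "page HASH".
    return hashlib.sha256(body).hexdigest()
```

The value returned by `hash_page(fetch_as_spider(url))` would then be compared in S105 with the hash stored in the website basic information library.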
A dark link detection system, as shown in Fig. 2, comprising:
a modeling module 201, configured to create a user agent information library of search engine spiders;
to create a dark link feature library, wherein the features in the dark link feature library comprise dark link names and dark link URLs;
and to create a website basic information library, wherein the website basic information library comprises the content of all pages in a website, the URL corresponding to each page, the page HASH, and the hyperlink names and URLs;
a detection module 202, configured to traverse, one by one, all pages of all websites in the website basic information library until the last page has been processed;
to judge whether the hyperlink names or URLs of the current page contain a feature in the dark link feature library; if so, to determine that a dark link exists in the current page and delete the dark link information in the page; otherwise, to simulate the user agent information of a search engine spider, request the page again, and calculate its page HASH;
and to judge whether the page HASH value is identical to the page HASH value in the website basic information library; if so, no hidden dark link exists in the website; otherwise, a hidden dark link exists in the website, and control passes to the disposal module;
a disposal module 203, configured to compare the re-requested page with the website basic information library and delete the information that differs in the re-requested page.
In the system, the disposal module is further configured, before deleting the differing information in the re-requested page, to extract and parse the differing information in the re-requested page, and to add the dark link names and dark link URLs obtained from parsing to the dark link feature library.
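One entry of the website basic information library built by the modeling module could be assembled as in the sketch below. The dictionary layout, the `build_record` name, and SHA-256 as the page HASH are assumptions for illustration; the source lists only the fields the library contains.

```python
import hashlib
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class AnchorParser(HTMLParser):
    """Collect (hyperlink name, hyperlink URL) pairs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def build_record(url: str, html: str) -> Dict[str, object]:
    """Build one page entry: content, URL, page hash, hyperlink names and URLs."""
    parser = AnchorParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "content": html,
        "hash": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "links": parser.links,
    }
```

Such records would be built from a known-good copy of each page, since the later hash comparison treats the stored content as the reference.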
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiment.
The present invention can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present invention can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
Although the present invention has been described by way of embodiments, those of ordinary skill in the art know that the invention admits many variations and changes that do not depart from its spirit, and it is intended that the appended claims cover these variations and changes.

Claims (4)

1. A dark link detection method, characterized by comprising:
a. creating a user agent information library of search engine spiders;
creating a dark link feature library, wherein the features in the dark link feature library comprise dark link names and dark link URLs;
creating a website basic information library, wherein the website basic information library comprises the content of all pages in a website, the URL corresponding to each page, the page HASH, and the hyperlink names and URLs;
b. traversing, one by one, all pages of all websites in the website basic information library until the last page has been processed;
c. judging whether the hyperlink names or URLs of the current page contain a feature in the dark link feature library; if so, determining that a dark link exists in the current page and deleting the dark link in the page; otherwise, performing step d;
d. simulating the user agent information of a search engine spider, requesting the page again, and calculating its page HASH;
e. judging whether the page HASH value is identical to the page HASH value in the website basic information library; if so, no hidden dark link exists in the website, and the method returns to step b; otherwise, a hidden dark link exists in the website, and step f is performed;
f. comparing the re-requested page with the website basic information library, and deleting the information that differs in the re-requested page.
2. The method of claim 1, characterized in that step f further comprises, before deleting the differing information in the re-requested page: extracting and parsing the differing information in the re-requested page, and adding the dark link names and dark link URLs obtained from parsing to the dark link feature library.
3. A dark link detection system, characterized by comprising:
a modeling module, configured to create a user agent information library of search engine spiders;
to create a dark link feature library, wherein the features in the dark link feature library comprise dark link names and dark link URLs;
and to create a website basic information library, wherein the website basic information library comprises the content of all pages in a website, the URL corresponding to each page, the page HASH, and the hyperlink names and URLs;
a detection module, configured to traverse, one by one, all pages of all websites in the website basic information library until the last page has been processed;
to judge whether the hyperlink names or URLs of the current page contain a feature in the dark link feature library; if so, to determine that a dark link exists in the current page and delete the dark link in the page; otherwise, to simulate the user agent information of a search engine spider, request the page again, and calculate its page HASH;
and to judge whether the page HASH value is identical to the page HASH value in the website basic information library; if so, no hidden dark link exists in the website; otherwise, a hidden dark link exists in the website, and control passes to the disposal module;
a disposal module, configured to compare the re-requested page with the website basic information library and delete the information that differs in the re-requested page.
4. The system of claim 3, characterized in that the disposal module is further configured, before deleting the differing information in the re-requested page, to extract and parse the differing information in the re-requested page, and to add the dark link names and dark link URLs obtained from parsing to the dark link feature library.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410807527.5A | 2014-12-23 | 2014-12-23 | Dark link detection method and system

Publications (1)

Publication Number | Publication Date
CN105488402A (en) | 2016-04-13

Family

ID=55675376

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201410807527.5A (Pending, CN105488402A (en)) | Dark link detection method and system | 2014-12-23 | 2014-12-23

Country Status (1)

Country | Link
CN (1) | CN105488402A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160413
