CN107370718A - Method and device for detecting black links in web pages - Google Patents

Publication number
CN107370718A
Authority
CN
China
Prior art keywords: domain name, webpage, detected, chain, type
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610319264.2A
Other languages
Chinese (zh)
Other versions
CN107370718B (en)
Inventor
靳荣纪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shenxinfu Electronic Technology Co Ltd
Original Assignee
Shenzhen Shenxinfu Electronic Technology Co Ltd
Application filed by Shenzhen Shenxinfu Electronic Technology Co Ltd
Priority to CN201610319264.2A
Publication of CN107370718A
Application granted
Publication of CN107370718B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting black links (hidden spam links planted in a web page) in web pages, comprising: obtaining a first domain name type corresponding to the URL of an external link in a webpage to be detected and a second domain name type corresponding to the URL of the webpage to be detected; obtaining a first similarity difference between the first domain name type and the second domain name type; and, when the first similarity difference is greater than a preset threshold, determining that a black link exists in the webpage to be detected. The invention also discloses a device for detecting black links in web pages. This way of detecting black links has a relatively low probability of misjudgment and improves the accuracy of black link detection.

Description

Method and device for detecting black links in web pages
Technical Field
The present invention relates to the field of communication technology, and in particular to a method and device for detecting black links in web pages.
Background
As the crackdown on illegal Internet content intensifies, it has become increasingly rare to reach the entry points of illegal or harmful websites directly through normal channels (e.g., search engines). These websites therefore actively look for other ways to increase their chances of being reached indirectly. Planting black links on government, education, and industry websites is one of the most effective of these ways: black links can effectively bypass the filtering that search engines apply to illegal and harmful content, and can trick search engines into indexing that content.
Traditional black link detection methods usually decide whether an external link is a black link by checking whether it has a hidden style. However, an external link with a hidden style is not necessarily a black link. Moreover, the styles that can hide a link are too numerous to list, so detecting every such style is nearly impossible. In addition, a black link is not necessarily a hidden external link: creating a hidden directory on a website and placing web pages containing illegal or harmful content in it is also a common form of black link, and this form is not easily detected by traditional methods. The detection precision of such methods is therefore relatively low.
Summary of the Invention
The main object of the present invention is to provide a method and device for detecting black links in web pages, aiming to solve the technical problem that black link detection in the prior art has relatively low precision.
To achieve the above object, the present invention provides a method for detecting black links in web pages, the method comprising the following steps:
obtaining a first domain name type corresponding to the URL of an external link in a webpage to be detected, and a second domain name type corresponding to the URL of the webpage to be detected;
obtaining a first similarity difference between the first domain name type and the second domain name type;
when the first similarity difference is greater than a preset threshold, determining that a black link exists in the webpage to be detected.
Optionally, after the step of obtaining the first similarity difference between the first domain name type and the second domain name type, the method further comprises:
determining whether text content exists in the external link of the webpage to be detected;
when text content exists in the external link, obtaining a first keyword type corresponding to a keyword in the text content;
obtaining a second similarity difference between the first keyword type and a second keyword type corresponding to the webpage to be detected, and adding the second similarity difference to the first similarity difference to obtain a third similarity difference;
when the third similarity difference is greater than the preset threshold, determining that a black link exists in the webpage to be detected;
when no text content exists in the external link and the first similarity difference is greater than the preset threshold, performing the step of determining that a black link exists in the webpage to be detected.
Optionally, before the step of obtaining the second similarity difference between each first keyword type and the second keyword type corresponding to the webpage to be detected, the method further comprises:
obtaining the search-engine-sensitive tags in the webpage to be detected;
obtaining the keyword type corresponding to each search-engine-sensitive tag, and using the obtained keyword type as the second keyword type corresponding to the webpage to be detected.
Optionally, before the step of obtaining the first domain name type corresponding to the URL of the external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected, the method further comprises:
obtaining a first domain name from each URL inserted in the webpage to be detected, and a second domain name from the URL of the webpage to be detected;
obtaining, among the first domain names, third domain names that do not match the second domain name;
using the URLs corresponding to the third domain names as external links, wherein a first domain name matches the second domain name when it is identical to the second domain name or is a subdomain of the second domain name.
Optionally, the step of obtaining the first domain name type corresponding to the URL of the external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected comprises:
determining the first domain name type corresponding to the URL of the external link according to the domain name type to which the external link's domain name belongs, and determining the second domain name type corresponding to the URL of the webpage to be detected according to the domain name type to which the second domain name belongs.
In addition, to achieve the above object, the present invention also provides a device for detecting black links in web pages, the device comprising:
an acquisition module, configured to obtain a first domain name type corresponding to the URL of an external link in a webpage to be detected and a second domain name type corresponding to the URL of the webpage to be detected;
a similarity difference calculation module, configured to obtain a first similarity difference between the first domain name type and the second domain name type;
a black link determination module, configured to determine that a black link exists in the webpage to be detected when the first similarity difference is greater than a preset threshold.
Optionally, the device further comprises:
a judgment module, configured to determine whether text content exists in the external link of the webpage to be detected;
the black link determination module is further configured to determine that a black link exists in the webpage to be detected when no text content exists in the external link and the first similarity difference is greater than the preset threshold;
the acquisition module is further configured to obtain, when text content exists in the external link, a first keyword type corresponding to a keyword in the text content;
the similarity difference calculation module is further configured to obtain a second similarity difference between the first keyword type and a second keyword type corresponding to the webpage to be detected, and to add the second similarity difference to the first similarity difference to obtain a third similarity difference;
the black link determination module is further configured to determine that a black link exists in the webpage to be detected when the third similarity difference is greater than the preset threshold.
Optionally, the acquisition module is further configured to:
obtain the search-engine-sensitive tags in the webpage to be detected;
obtain the keyword type corresponding to each search-engine-sensitive tag, and use the obtained keyword type as the second keyword type corresponding to the webpage to be detected.
Optionally, in the device:
the acquisition module is further configured to obtain a first domain name from each URL inserted in the webpage to be detected and a second domain name from the URL of the webpage to be detected, and to obtain, among the first domain names, third domain names that do not match the second domain name;
a processing module is configured to use the URLs corresponding to the third domain names as external links, wherein a first domain name matches the second domain name when it is identical to the second domain name or is a subdomain of the second domain name.
Optionally, the acquisition module is further configured to determine the first domain name type corresponding to the URL of the external link according to the domain name type to which the external link's domain name belongs, and to determine the second domain name type corresponding to the URL of the webpage to be detected according to the domain name type to which the second domain name belongs.
With the method and device for detecting black links in web pages proposed by the present invention, once a black link has been inserted into a webpage, the type of the black link and the type of the webpage itself have very little in common. Therefore, the first domain name type corresponding to the URL of each external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected are obtained, and the first similarity difference between them is used to determine whether a black link exists. When the first similarity difference is greater than the preset threshold, the type difference between the inserted external links and the webpage to be detected is very large, and it can be determined that a black link exists in the webpage to be detected. This approach has a relatively low probability of misjudgment and improves the accuracy of black link detection.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of a first embodiment of the black link detection method of the present invention;
Fig. 2 is a schematic flowchart of a second embodiment of the black link detection method of the present invention;
Fig. 3 is a schematic flowchart of a third embodiment of the black link detection method of the present invention;
Fig. 4 is a functional block diagram of a first embodiment of the black link detection device of the present invention;
Fig. 5 is a functional block diagram of a second embodiment of the black link detection device of the present invention;
Fig. 6 is a functional block diagram of a third embodiment of the black link detection device of the present invention.
The realization of the object, the functional characteristics, and the advantages of the present invention are further described below with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The present invention provides a method for detecting black links in web pages.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a first embodiment of the black link detection method of the present invention.
This embodiment provides a method for detecting black links in web pages, the method comprising:
Step S10: obtain the first domain name type corresponding to the URL of each external link in the webpage to be detected, and the second domain name type corresponding to the URL of the webpage to be detected;
A domain name type library may be preset, in which the mapping relationships between domain names and domain name types are stored in advance. The domain name in the URL of each external link is extracted and compared with the domain names in the preset domain name type library to obtain the first domain name type. Similarly, the domain name in the URL of the webpage to be detected is extracted and compared with the domain names in the library to obtain the second domain name type. An external link of the webpage to be detected is a link to a web page outside the website to which the webpage to be detected belongs.
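The library lookup in step S10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the library contents, the `.test` domains, the "unknown" fallback type, and the rule that a subdomain inherits the type of its parent domain are all assumptions made for the example.

```python
from urllib.parse import urlsplit

# Hypothetical domain name type library: domain name -> domain name type.
DOMAIN_TYPE_LIBRARY = {
    "example-casino.test": "gambling",
    "example-school.test": "education",
    "example-city.test": "government",
}

def domain_of(url: str) -> str:
    """Return the lower-cased host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()

def domain_type(url: str, library=DOMAIN_TYPE_LIBRARY, default="unknown") -> str:
    """Look up the type of the URL's domain, also trying its parent domains."""
    labels = domain_of(url).split(".")
    # Try "a.b.c" first, then "b.c", so that subdomains inherit the parent's type
    # (an assumption of this sketch).
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in library:
            return library[candidate]
    return default
```

Here `domain_type` would be called once per external link to obtain the first domain name types and once for the URL of the webpage to be detected to obtain the second domain name type.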
It is understood that be to improve the efficiency for obtaining domain name type, then by the domain name in the URL of exterior chain When being compared with the domain name in domain name typelib, can occur according to domain name type the probability of black chain by greatly to It is small that each domain name type is ranked up, and compare the domain name got and each domain successively according to the order There is the general of black chain in domain name in name type, such as the domain name type of gambling class, pornographic class and game class Rate is very big, then can be first by the domain name in the URL of exterior chain and the domain of gambling class, pornographic class and game class Domain name in name type is compared;Similarly, by the domain name in the URL of webpage to be detected with and domain name When domain name in typelib is compared, the probability that black chain can be inserted into according to domain name type is descending right Each domain name type is ranked up, such as government's class and educational domain name are inserted into the probability of black chain very Height, then can be first by the domain name in the URL of webpage to be detected and government's class and educational domain name type Domain name be compared.
Step S20: obtain the first similarity difference between each first domain name type and the second domain name type;
Black links are usually planted in a webpage by inserting multiple external links. A similarity value may therefore be preset for each domain name type, and the similarity difference between each first domain name type and the second domain name type is obtained by subtraction. In this embodiment, each similarity difference is the absolute value of the difference between the similarity values corresponding to the first domain name type and the second domain name type, and the first similarity difference is obtained by summing the individual similarity differences.
Step S30: when the first similarity difference is greater than the preset threshold, determine that a black link exists in the webpage to be detected.
A first similarity difference greater than the preset threshold indicates that the external links inserted in the webpage to be detected have very little correlation with the webpage itself, so it can be determined that a black link exists in the webpage to be detected.
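Steps S20 and S30 amount to a sum of absolute differences followed by a threshold test. The sketch below assumes illustrative similarity values and an illustrative threshold; the text gives no concrete numbers.

```python
# Assumed similarity value for each domain name type (illustrative only).
SIMILARITY_VALUE = {
    "government": 1.0,
    "education": 2.0,
    "news": 3.0,
    "game": 8.0,
    "gambling": 10.0,
}

def first_similarity_difference(link_types, page_type, values=SIMILARITY_VALUE):
    """Step S20: sum of |value(link type) - value(page type)| over all external links."""
    page_value = values[page_type]
    return sum(abs(values[t] - page_value) for t in link_types)

def has_black_link(link_types, page_type, threshold, values=SIMILARITY_VALUE):
    """Step S30: a black link is reported when the summed difference exceeds the threshold."""
    return first_similarity_difference(link_types, page_type, values) > threshold
```

For a government page with one gambling link and one game link, the differences are 9 and 7, giving a first similarity difference of 16, which exceeds an assumed threshold of 10. Because the differences are summed, the result grows with the number of unrelated external links.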
It can be understood that, when a black link exists in the webpage to be detected, a prompt indicating that the webpage has low reliability may be output, the external links in the webpage may be removed, or the webpage may be intercepted, among other handling options. The specific handling can be set by the developer as needed and is not described further here.
With the black link detection method proposed in this embodiment, once a black link has been inserted into a webpage, the type of the black link and the type of the webpage itself have very little in common. Therefore, the first domain name type corresponding to the URL of each external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected are obtained, and the first similarity difference between them is used to determine whether a black link exists. When the first similarity difference is greater than the preset threshold, the type difference between the inserted external links and the webpage to be detected is very large, and it can be determined that a black link exists in the webpage to be detected. This approach has a relatively low probability of misjudgment and improves the accuracy of black link detection.
Further, referring to Fig. 2, a second embodiment of the black link detection method of the present invention is proposed based on the first embodiment. In this embodiment, after step S20, the method further comprises:
Step S40: determine whether text content exists in the external link of the webpage to be detected;
Step S50: when text content exists in the external link, obtain the first keyword type corresponding to a keyword in the text content;
Step S60: obtain the second similarity difference between the first keyword type and the second keyword type corresponding to the webpage to be detected, and add the second similarity difference to the first similarity difference to obtain the third similarity difference;
When the third similarity difference is greater than the preset threshold, perform step S30;
Step S70: when no text content exists in the external link, determine whether the first similarity difference is greater than the preset threshold;
When the first similarity difference is greater than the preset threshold, perform step S30.
In this embodiment, the correlation of keyword types is additionally used to determine whether a black link exists. The link content of an external link may carry keywords, such as "legend private server" and "swimsuit sales". The keywords in the external link are extracted and compared with the keywords in a preset keyword type library, in which the mapping relationships between keyword types and keywords are stored in advance. The comparison process is similar to that for the domain name type library and is not repeated here. Multiple first keyword types may be obtained; in that case, after the similarity difference between each first keyword type and the second keyword type is calculated, these similarity differences are summed to obtain the second similarity difference.
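The decision flow of steps S40 to S70 can be sketched as below. The keyword type values and the threshold are hypothetical, and the sketch assumes a single second keyword type for the page and that the keyword types of the link text have already been looked up.

```python
# Assumed similarity value for each keyword type (illustrative only).
KEYWORD_TYPE_VALUE = {
    "public-service": 1.0,
    "education": 2.0,
    "shopping": 6.0,
    "game": 8.0,
}

def second_similarity_difference(link_keyword_types, page_keyword_type,
                                 values=KEYWORD_TYPE_VALUE):
    """Sum of |value(first keyword type) - value(second keyword type)|."""
    page_value = values[page_keyword_type]
    return sum(abs(values[t] - page_value) for t in link_keyword_types)

def has_black_link(first_diff, link_keyword_types, page_keyword_type, threshold):
    """Steps S40-S70: add the keyword evidence when the external links carry text."""
    if link_keyword_types:  # S40: text content exists in the external links
        second = second_similarity_difference(link_keyword_types, page_keyword_type)  # S50, S60
        third = first_diff + second  # third similarity difference
        return third > threshold
    return first_diff > threshold  # S70: fall back to the domain name evidence alone
```

In this sketch a first similarity difference of 5 stays below a threshold of 10 on its own, but link text typed as "game" and "shopping" on a "public-service" page adds 7 + 5 = 12 and pushes the third similarity difference to 17.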
The keywords of the webpage to be detected can be obtained from the search-engine-sensitive tags in the webpage to be detected; that is, before step S60 the method further comprises:
obtaining the search-engine-sensitive tags in the webpage to be detected;
obtaining the keyword type corresponding to each search-engine-sensitive tag, and using the obtained keyword type as the second keyword type corresponding to the webpage to be detected.
The search-engine-sensitive tags may be, for example, the title and the keywords of the webpage. After the keywords are extracted from these tags, they are compared with the keywords in the keyword type library to obtain the second keyword type.
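One way to read the title and keywords out of a page is sketched below with Python's standard `html.parser` module. Treating the `<title>` element and `<meta name="keywords">` as the search-engine-sensitive tags is an assumption of this example, as is the comma-separated keyword format.

```python
from html.parser import HTMLParser

class SensitiveTagParser(HTMLParser):
    """Collect the <title> text and the content of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Keywords are assumed to be comma-separated.
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def sensitive_tags(html: str):
    """Return (title, list of keywords) extracted from an HTML document."""
    parser = SensitiveTagParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.keywords
```

The extracted title and keywords would then be compared against the keyword type library to obtain the second keyword type.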
Further, referring to Fig. 3, a third embodiment of the black link detection method of the present invention is proposed based on the first or second embodiment. In this embodiment, before step S10, the method further comprises:
Step S80: obtain the first domain name from each URL inserted in the webpage to be detected, and the second domain name from the URL of the webpage to be detected;
Step S90: obtain, among the first domain names, the third domain names that do not match the second domain name;
Step S100: use the URLs corresponding to the third domain names as external links, wherein a first domain name matches the second domain name when it is identical to the second domain name or is a subdomain of the second domain name.
When the first domain name in an inserted URL is a subdomain of the second domain name, the inserted URL is a sub-link of the webpage to be detected.
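Steps S80 to S100 can be sketched with the standard `urllib.parse` module. The helper names and the `.test` URLs are hypothetical; relative URLs, which have no host of their own, are treated here as internal links, which is an assumption of the sketch.

```python
from urllib.parse import urlsplit

def host(url: str) -> str:
    """Return the lower-cased host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()

def matches(first_domain: str, second_domain: str) -> bool:
    """True if first_domain equals second_domain or is one of its subdomains."""
    return first_domain == second_domain or first_domain.endswith("." + second_domain)

def external_links(page_url: str, inserted_urls):
    """Steps S80-S100: keep the URLs whose domain does not match the page's domain."""
    second_domain = host(page_url)
    return [
        url for url in inserted_urls
        if host(url) and not matches(host(url), second_domain)
    ]
```

The leading dot in the subdomain test matters: without it, a domain such as `notexample-city.test` would wrongly be treated as a subdomain of `example-city.test`.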
It can be understood that step S10 comprises: determining the first domain name type corresponding to the URL of the external link according to the domain name type to which the external link's domain name belongs, and determining the second domain name type corresponding to the URL of the webpage to be detected according to the domain name type to which the second domain name belongs.
The present invention further provides a device for detecting black links in web pages.
Referring to Fig. 4, Fig. 4 is a functional block diagram of a preferred embodiment of the black link detection device of the present invention.
It should be emphasized that, as those skilled in the art will appreciate, the functional block diagram shown in Fig. 4 is only an example of a preferred embodiment, and those skilled in the art can easily add new functional modules around the functional modules of the device shown in Fig. 4. The names of the functional modules are user-defined names that serve only to aid understanding of the program function blocks of the device and do not limit the technical solution of the present invention. The core of the technical solution lies in the functions to be achieved by the functional modules bearing these names.
This embodiment provides a device for detecting black links in web pages, the device comprising:
an acquisition module 10, configured to obtain the first domain name type corresponding to the URL of each external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected;
A domain name type library may be preset, in which the mapping relationships between domain names and domain name types are stored in advance. The domain name in the URL of each external link is extracted and compared with the domain names in the preset domain name type library to obtain the first domain name type. Similarly, the domain name in the URL of the webpage to be detected is extracted and compared with the domain names in the library to obtain the second domain name type. An external link of the webpage to be detected is a link to a web page outside the website to which the webpage to be detected belongs.
It can be understood that, to obtain domain name types more efficiently, the domain name types may be sorted in descending order of the probability that a domain of that type appears as a black link when comparing the domain name in the URL of an external link against the domain name type library, and the extracted domain name is then compared with the domain names of each type in that order. For example, domain names of the gambling, pornography, and game types are very likely to appear as black links, so the domain name in the URL of an external link may first be compared with the domain names of those types. Similarly, when comparing the domain name in the URL of the webpage to be detected against the library, the domain name types may be sorted in descending order of the probability of having black links inserted. For example, government and education domain names are very likely to have black links inserted, so the domain name in the URL of the webpage to be detected may first be compared with the domain names of the government and education types.
a similarity difference calculation module 20, configured to obtain the first similarity difference between the first domain name type and the second domain name type;
Black links are usually planted in a webpage by inserting multiple external links. A similarity value may therefore be preset for each domain name type, and the similarity difference between each first domain name type and the second domain name type is obtained by subtraction. In this embodiment, each similarity difference is the absolute value of the difference between the similarity values corresponding to the first domain name type and the second domain name type, and the first similarity difference is obtained by summing the individual similarity differences.
a black link determination module 30, configured to determine that a black link exists in the webpage to be detected when the first similarity difference is greater than the preset threshold.
A first similarity difference greater than the preset threshold indicates that the external links inserted in the webpage to be detected have very little correlation with the webpage itself, so it can be determined that a black link exists in the webpage to be detected.
It can be understood that, when a black link exists in the webpage to be detected, a prompt indicating that the webpage has low reliability may be output, the external links in the webpage may be removed, or the webpage may be intercepted, among other handling options. The specific handling can be set by the developer as needed and is not described further here.
With the black link detection device proposed in this embodiment, once a black link has been inserted into a webpage, the type of the black link and the type of the webpage itself have very little in common. Therefore, the first domain name type corresponding to the URL of each external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected are obtained, and the first similarity difference between them is used to determine whether a black link exists. When the first similarity difference is greater than the preset threshold, the type difference between the inserted external links and the webpage to be detected is very large, and it can be determined that a black link exists in the webpage to be detected. This approach has a relatively low probability of misjudgment and improves the accuracy of black link detection.
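The three modules of this embodiment could be composed as in the sketch below. The class names mirror the module names in the text, but the library contents, the similarity values, the "unknown" fallback type, and the threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class AcquisitionModule:
    """Module 10: maps a domain name to its type via a preset library."""
    library: dict  # hypothetical mapping: domain name -> domain name type

    def domain_type(self, domain: str) -> str:
        return self.library.get(domain.lower(), "unknown")

@dataclass
class SimilarityDifferenceModule:
    """Module 20: sums the absolute differences of the preset similarity values."""
    values: dict  # assumed mapping: domain name type -> similarity value

    def first_difference(self, link_types, page_type) -> float:
        page_value = self.values[page_type]
        return sum(abs(self.values[t] - page_value) for t in link_types)

@dataclass
class BlackLinkDeterminationModule:
    """Module 30: compares the first similarity difference with the threshold."""
    threshold: float

    def has_black_link(self, difference: float) -> bool:
        return difference > self.threshold

@dataclass
class BlackLinkDetector:
    """Wires the three modules together in the order 10 -> 20 -> 30."""
    acquisition: AcquisitionModule
    similarity: SimilarityDifferenceModule
    determination: BlackLinkDeterminationModule

    def detect(self, page_domain: str, link_domains) -> bool:
        page_type = self.acquisition.domain_type(page_domain)
        link_types = [self.acquisition.domain_type(d) for d in link_domains]
        difference = self.similarity.first_difference(link_types, page_type)
        return self.determination.has_black_link(difference)

detector = BlackLinkDetector(
    AcquisitionModule({"example-city.test": "government", "example-casino.test": "gambling"}),
    SimilarityDifferenceModule({"government": 1.0, "gambling": 10.0, "unknown": 5.0}),
    BlackLinkDeterminationModule(threshold=10.0),
)
```

Keeping each module behind its own small interface follows the remark above that the module names are only aids to understanding: any module can be replaced as long as it provides the same function.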
Further, referring to Fig. 5, a second embodiment of the black link detection device of the present invention is proposed based on the first embodiment. In this embodiment, the device further comprises:
a judgment module 40, configured to determine whether text content exists in the external link of the webpage to be detected;
the black link determination module 30 is further configured to determine that a black link exists in the webpage to be detected when no text content exists in the external link and the first similarity difference is greater than the preset threshold;
the acquisition module 10 is further configured to obtain, when text content exists in the external link, the first keyword type corresponding to a keyword in the text content;
the similarity difference calculation module 20 is further configured to obtain the second similarity difference between the first keyword type and the second keyword type corresponding to the webpage to be detected, and to add the second similarity difference to the first similarity difference to obtain the third similarity difference;
the black link determination module 30 is further configured to determine that a black link exists in the webpage to be detected when the third similarity difference is greater than the preset threshold.
In this embodiment, the correlation of keyword types can additionally be used to determine whether a black link exists. The linked content of an external link may carry keywords, such as "Legend private server" or "swimsuit sales". In that case the keywords in the external link are obtained and compared with the keywords in a preset keyword type library, which stores in advance the mapping between keyword types and keywords. The comparison process is similar to that for the domain name type library and is not repeated here. More than one first keyword type may be obtained; in that case, after the similarity difference between each first keyword type and the second keyword type has been calculated, these similarity differences are superimposed to obtain the second similarity difference.
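The second-embodiment decision flow can be sketched as below. This is only an illustration under stated assumptions: "superimpose" is read as plain addition, `pair_difference` is a hypothetical caller-supplied function, a link keyword type is compared against the closest page keyword type, and `None` stands for an external link with no text content. None of these details are fixed by the text.

```python
from typing import Callable, Iterable, Optional


def second_similarity_difference(
    link_types: Iterable[str],
    page_types: Iterable[str],
    pair_difference: Callable[[str, str], float],
) -> float:
    """Sum, over each first keyword type, its difference to the page's keyword types."""
    page_types = list(page_types)
    return sum(
        # Assumption: use the closest page keyword type; 1.0 if the page has none.
        min((pair_difference(t, p) for p in page_types), default=1.0)
        for t in link_types
    )


def detect(
    first_difference: float,
    link_keyword_types: Optional[Iterable[str]],
    page_keyword_types: Iterable[str],
    pair_difference: Callable[[str, str], float],
    threshold: float,
) -> bool:
    """Return True when a black link is judged to exist."""
    if link_keyword_types is None:
        # No text content in the external link: decide on the domain types alone.
        return first_difference > threshold
    second = second_similarity_difference(
        link_keyword_types, page_keyword_types, pair_difference
    )
    third = first_difference + second  # "superimposed" read as addition
    return third > threshold


def exact_match_difference(a: str, b: str) -> float:
    """Hypothetical pair difference: 0.0 for equal types, 1.0 otherwise."""
    return 0.0 if a == b else 1.0
```

For example, with a first similarity difference of 0.5 and a threshold of 0.7, a link whose text maps to "gaming" on a page whose keywords map to "education" yields a third similarity difference of 1.5 and is flagged, whereas the same link without text content is not.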
The keywords of the webpage to be detected can be obtained from the search-engine-sensitive tags in the webpage to be detected; that is, the acquisition module 10 is further configured to:
Obtain the search-engine-sensitive tags in the webpage to be detected;
Obtain the keyword type corresponding to each search-engine-sensitive tag, and use the obtained keyword types as the second keyword type corresponding to the webpage to be detected.
The search-engine-sensitive tags may be, for example, the title and keywords tags of the webpage. After the keywords have been extracted from these tags, they are compared with the keywords in the keyword type library to obtain the second keyword type.
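As a rough sketch of this extraction step, the following uses Python's standard `html.parser` to read the `<title>` element and the `<meta name="keywords">` tag, then looks the extracted words up in a keyword type library. The library contents, the comma-separated keyword format and the whitespace tokenization of the title are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical keyword type library: keyword -> keyword type.
KEYWORD_TYPE_LIBRARY = {
    "university": "education",
    "admissions": "education",
    "swimsuit": "shopping",
}


class SeoTagParser(HTMLParser):
    """Collects the page title text and the meta keywords."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Assumption: keywords are separated by ASCII commas.
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_keyword_types(html: str) -> set:
    """Return the set of keyword types found in the title and meta keywords."""
    parser = SeoTagParser()
    parser.feed(html)
    parser.close()
    candidates = parser.keywords + parser.title.split()
    return {
        KEYWORD_TYPE_LIBRARY[c.lower()]
        for c in candidates
        if c.lower() in KEYWORD_TYPE_LIBRARY
    }
```

A real page in Chinese would need a proper word segmenter rather than whitespace splitting; that part is deliberately left out of this sketch.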
Further, referring to Fig. 3, a third embodiment of the present invention is proposed on the basis of the first or second embodiment. In this embodiment, the apparatus for detecting black links in a webpage further includes:
The acquisition module 10, further configured to obtain the first domain name in each URL inserted in the webpage to be detected and the second domain name in the URL of the webpage to be detected, and to obtain, among the first domain names, the third domain name that does not match the second domain name;
A processing module 50, configured to take the URL corresponding to the third domain name as an external link, wherein the first domain name matches the second domain name when the first domain name is identical to the second domain name or is a subdomain of the second domain name.
When the first domain name in an inserted URL is a subdomain of the second domain name, the inserted URL is a sub-link of the webpage to be detected.
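The matching rule above (identical domain, or subdomain of the page's domain) can be sketched with the standard `urllib.parse` module. The function names are hypothetical, the full hostname of the page URL is taken as the second domain name, and URLs without a hostname (relative links) are treated as internal; these are assumptions rather than details given in the text.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()


def domains_match(first_domain: str, second_domain: str) -> bool:
    """True when first_domain equals second_domain or is one of its subdomains."""
    return first_domain == second_domain or first_domain.endswith("." + second_domain)


def external_links(inserted_urls, page_url: str) -> list:
    """Return the inserted URLs whose domain does not match the page's domain."""
    page_domain = domain_of(page_url)
    result = []
    for url in inserted_urls:
        first_domain = domain_of(url)
        # Relative URLs have no hostname and are assumed to be internal.
        if first_domain and not domains_match(first_domain, page_domain):
            result.append(url)
    return result
```

Prefixing the page domain with a dot before the suffix check keeps a look-alike host such as `notexample.edu` from being treated as a subdomain of `example.edu`.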
It can be understood that the acquisition module 10 is further configured to determine the first domain name type corresponding to the URL of the external link according to the domain name type to which the domain name of the external link belongs, and to determine the second domain name type corresponding to the URL of the webpage to be detected according to the domain name type to which the second domain name belongs.
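One possible way to look up the type of a domain name is sketched below. The library contents, the "unknown" fallback and the strategy of also trying parent domains are assumptions; the text only states that the type is determined from the domain name type to which the domain belongs.

```python
# Hypothetical domain name type library: domain -> domain name type.
DOMAIN_TYPE_LIBRARY = {
    "example.edu": "education",
    "example-game.net": "gaming",
}


def domain_type(domain: str, default: str = "unknown") -> str:
    """Look up a domain's type, falling back to its parent domains."""
    labels = domain.lower().split(".")
    # Try the full name first, then drop leading labels; a bare top-level
    # label such as "edu" is never tried on its own.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TYPE_LIBRARY:
            return DOMAIN_TYPE_LIBRARY[candidate]
    return default
```

With this table, `lib.example.edu` resolves to "education" through its parent `example.edu`, and a domain absent from the library falls back to "unknown".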
It should be noted that, in this document, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
The embodiments of the present invention are numbered for description only; the numbering does not indicate the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a cloud server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

  1. A method for detecting a black link in a webpage, characterized in that the method comprises the following steps:
    obtaining a first domain name type corresponding to the URL of an external link in a webpage to be detected and a second domain name type corresponding to the URL of the webpage to be detected;
    obtaining a first similarity difference between the first domain name type and the second domain name type;
    when the first similarity difference is greater than a preset threshold, determining that a black link exists in the webpage to be detected.
  2. The method for detecting a black link in a webpage according to claim 1, characterized in that, after the step of obtaining the first similarity difference between the first domain name type and the second domain name type, the method further comprises the steps of:
    judging whether text content exists in the external link of the webpage to be detected;
    when text content exists in the external link, obtaining a first keyword type corresponding to a keyword in the text content;
    obtaining a second similarity difference between the first keyword type and a second keyword type corresponding to the webpage to be detected, and superimposing the second similarity difference on the first similarity difference to obtain a third similarity difference;
    when the third similarity difference is greater than the preset threshold, determining that a black link exists in the webpage to be detected;
    when no text content exists in the external link and the first similarity difference is greater than the preset threshold, performing the step of determining that a black link exists in the webpage to be detected.
  3. The method for detecting a black link in a webpage according to claim 2, characterized in that, before the step of obtaining the second similarity difference between each first keyword type and the second keyword type corresponding to the URL of the webpage to be detected, the method further comprises the steps of:
    obtaining the search-engine-sensitive tags in the webpage to be detected;
    obtaining the keyword type corresponding to each search-engine-sensitive tag, and using the obtained keyword types as the second keyword type corresponding to the webpage to be detected.
  4. The method for detecting a black link in a webpage according to any one of claims 1 to 3, characterized in that, before the step of obtaining the first domain name type corresponding to the URL of the external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected, the method further comprises the steps of:
    obtaining a first domain name in each URL inserted in the webpage to be detected and a second domain name in the URL of the webpage to be detected;
    obtaining, among the first domain names, a third domain name that does not match the second domain name;
    taking the URL corresponding to the third domain name as an external link, wherein the first domain name matches the second domain name when the first domain name is identical to the second domain name or when the first domain name is a subdomain of the second domain name.
  5. The method for detecting a black link in a webpage according to claim 4, characterized in that the step of obtaining the first domain name type corresponding to the URL of the external link in the webpage to be detected and the second domain name type corresponding to the URL of the webpage to be detected comprises:
    determining the first domain name type corresponding to the URL of the external link according to the domain name type to which the domain name of the external link belongs, and determining the second domain name type corresponding to the URL of the webpage to be detected according to the domain name type to which the second domain name belongs.
  6. An apparatus for detecting a black link in a webpage, characterized in that the apparatus comprises:
    an acquisition module, configured to obtain a first domain name type corresponding to the URL of an external link in a webpage to be detected and a second domain name type corresponding to the URL of the webpage to be detected;
    a similarity difference calculation module, configured to obtain a first similarity difference between the first domain name type and the second domain name type;
    a black link determination module, configured to determine that a black link exists in the webpage to be detected when the first similarity difference is greater than a preset threshold.
  7. The apparatus for detecting a black link in a webpage according to claim 6, characterized in that the apparatus further comprises:
    a judging module, configured to judge whether text content exists in the external link of the webpage to be detected;
    the black link determination module, further configured to determine that a black link exists in the webpage to be detected when no text content exists in the external link and the first similarity difference is greater than the preset threshold;
    the acquisition module, further configured to obtain, when text content exists in the external link, a first keyword type corresponding to a keyword in the text content;
    the similarity difference calculation module, further configured to obtain a second similarity difference between the first keyword type and a second keyword type corresponding to the webpage to be detected, and to superimpose the second similarity difference on the first similarity difference to obtain a third similarity difference;
    the black link determination module, further configured to determine that a black link exists in the webpage to be detected when the third similarity difference is greater than the preset threshold.
  8. The apparatus for detecting a black link in a webpage according to claim 7, characterized in that the acquisition module is further configured to:
    obtain the search-engine-sensitive tags in the webpage to be detected;
    obtain the keyword type corresponding to each search-engine-sensitive tag, and use the obtained keyword types as the second keyword type corresponding to the webpage to be detected.
  9. The apparatus for detecting a black link in a webpage according to any one of claims 6 to 8, characterized in that the apparatus further comprises:
    the acquisition module, further configured to obtain a first domain name in each URL inserted in the webpage to be detected and a second domain name in the URL of the webpage to be detected, and to obtain, among the first domain names, a third domain name that does not match the second domain name;
    a processing module, configured to take the URL corresponding to the third domain name as an external link, wherein the first domain name matches the second domain name when the first domain name is identical to the second domain name or when the first domain name is a subdomain of the second domain name.
  10. The apparatus for detecting a black link in a webpage according to claim 9, characterized in that the acquisition module is further configured to determine the first domain name type corresponding to the URL of the external link according to the domain name type to which the domain name of the external link belongs, and to determine the second domain name type corresponding to the URL of the webpage to be detected according to the domain name type to which the second domain name belongs.
CN201610319264.2A 2016-05-12 2016-05-12 Method and device for detecting black chain in webpage Active CN107370718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610319264.2A CN107370718B (en) 2016-05-12 2016-05-12 Method and device for detecting black chain in webpage

Publications (2)

Publication Number Publication Date
CN107370718A true CN107370718A (en) 2017-11-21
CN107370718B CN107370718B (en) 2020-12-18

Family

ID=60304395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610319264.2A Active CN107370718B (en) 2016-05-12 2016-05-12 Method and device for detecting black chain in webpage

Country Status (1)

Country Link
CN (1) CN107370718B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908764A (en) * 2017-11-27 2018-04-13 杭州安恒信息技术有限公司 A kind of exterior chain monitoring method of fixed issue content
CN109067716A (en) * 2018-07-18 2018-12-21 杭州安恒信息技术股份有限公司 A kind of method and system identifying dark chain
CN109522494A (en) * 2018-11-08 2019-03-26 杭州安恒信息技术股份有限公司 A kind of dark chain detection method, device, equipment and computer readable storage medium
CN109561078A (en) * 2018-11-09 2019-04-02 深圳万物云联科技有限公司 A kind of exterior chain url resource transfer method and device
CN109784038A (en) * 2018-12-29 2019-05-21 北京奇安信科技有限公司 Detecting black chain method, apparatus, system and computer readable storage medium
CN110532784A (en) * 2019-09-04 2019-12-03 杭州安恒信息技术股份有限公司 A kind of dark chain detection method, device, equipment and computer readable storage medium
CN111654472A (en) * 2020-05-14 2020-09-11 亚信科技(成都)有限公司 Domain name detection method and device
WO2020211130A1 (en) * 2019-04-16 2020-10-22 网宿科技股份有限公司 Hidden link detection method and apparatus for website
CN112532624A (en) * 2020-11-27 2021-03-19 深信服科技股份有限公司 Black chain detection method and device, electronic equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295320A (en) * 2008-06-30 2008-10-29 腾讯科技(深圳)有限公司 Method and system for judging anchor text noise level
CN102073730A (en) * 2011-01-14 2011-05-25 哈尔滨工程大学 Method for constructing topic web crawler system
CN102236654A (en) * 2010-04-26 2011-11-09 广东开普互联信息科技有限公司 Web useless link filtering method based on content relevancy
CN103516693A (en) * 2012-06-28 2014-01-15 中国电信股份有限公司 Method and device for identifying phishing website
CN104077353A (en) * 2011-12-30 2014-10-01 北京奇虎科技有限公司 Method and device for detecting hacking links

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
苏秀芝: "网页去噪与特征提取算法的研究及实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908764B (en) * 2017-11-27 2021-06-22 杭州安恒信息技术股份有限公司 External link monitoring method for fixed release content
CN107908764A (en) * 2017-11-27 2018-04-13 杭州安恒信息技术有限公司 A kind of exterior chain monitoring method of fixed issue content
CN109067716A (en) * 2018-07-18 2018-12-21 杭州安恒信息技术股份有限公司 A kind of method and system identifying dark chain
CN109522494A (en) * 2018-11-08 2019-03-26 杭州安恒信息技术股份有限公司 A kind of dark chain detection method, device, equipment and computer readable storage medium
CN109522494B (en) * 2018-11-08 2020-09-15 杭州安恒信息技术股份有限公司 Dark chain detection method, device, equipment and computer readable storage medium
CN109561078A (en) * 2018-11-09 2019-04-02 深圳万物云联科技有限公司 A kind of exterior chain url resource transfer method and device
CN109784038A (en) * 2018-12-29 2019-05-21 北京奇安信科技有限公司 Detecting black chain method, apparatus, system and computer readable storage medium
WO2020211130A1 (en) * 2019-04-16 2020-10-22 网宿科技股份有限公司 Hidden link detection method and apparatus for website
CN110532784A (en) * 2019-09-04 2019-12-03 杭州安恒信息技术股份有限公司 A kind of dark chain detection method, device, equipment and computer readable storage medium
CN111654472A (en) * 2020-05-14 2020-09-11 亚信科技(成都)有限公司 Domain name detection method and device
CN111654472B (en) * 2020-05-14 2022-05-24 亚信科技(成都)有限公司 Domain name detection method and device
CN112532624A (en) * 2020-11-27 2021-03-19 深信服科技股份有限公司 Black chain detection method and device, electronic equipment and readable storage medium
CN112532624B (en) * 2020-11-27 2023-09-05 深信服科技股份有限公司 Black chain detection method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN107370718B (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN107370718A (en) The detection method and device of black chain in webpage
CN103605738B (en) Web page access data statistical method and device
CN104462152B (en) A kind of recognition methods of webpage and device
CN104143008B (en) The method and device of fishing webpage is detected based on picture match
CN109639744A (en) A kind of detection method and relevant device in the tunnel DNS
CN103685228B (en) Website vulnerability rapid scanning method and device
CN105868630A (en) Malicious PDF document detection method
CN107798080B (en) Similar sample set construction method for fishing URL detection
CN103399872B (en) The method and apparatus that webpage capture is optimized
CN105631340B (en) A kind of method and device of XSS Hole Detection
CN108156165A (en) A kind of method and system for reporting detection by mistake
CN109413016A (en) A kind of rule-based message detecting method and device
CN113221032A (en) Link risk detection method, device and storage medium
CN105205356A (en) APP application re-packaging detection method
CN104679798B (en) Page detection method and device
CN108667766A (en) File detection method and file detection device
Park et al. Phishing website detection framework through web scraping and data mining
CN105704099A (en) Method for detecting illegal links hidden in website scripts
CN107506649A (en) A kind of leak detection method of html web page, device and electronic equipment
CN103475673B (en) Fishing website recognition methods, device and client
CN109522494B (en) Dark chain detection method, device, equipment and computer readable storage medium
KR101639869B1 (en) Program for detecting malignant code distributing network
CN107633020B (en) Article similarity detection method and device
CN110011964B (en) Webpage environment detection method and device
CN109067716A (en) A kind of method and system identifying dark chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Nanshan District Xueyuan Road in Shenzhen city of Guangdong province 518052 No. 1001 Nanshan Chi Park building A1 layer

Applicant after: SANGFOR TECHNOLOGIES Inc.

Address before: Nanshan District Xueyuan Road in Shenzhen city of Guangdong province 518052 No. 1001 Nanshan Chi Park building A1 layer

Applicant before: Sangfor Technologies Co.,Ltd.

GR01 Patent grant