CN110298006A - Method and apparatus for detecting websites that usurp links - Google Patents

Method and apparatus for detecting websites that usurp links

Info

Publication number
CN110298006A
Authority
CN
China
Prior art keywords
link
keyword
stream
website
target pages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910579576.0A
Other languages
Chinese (zh)
Inventor
刘昊骋
张梦
许韩晨玺
陈浩
胡庆华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910579576.0A
Publication of CN110298006A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a method, an apparatus, an electronic device, and a computer-readable medium for detecting websites that usurp links. One specific embodiment of the method includes: obtaining network behavior data; performing keyword feature extraction and link jump feature extraction on the network behavior data to obtain keywords and the link jump streams related to those keywords; and, based on the keywords, the related link jump streams, and a preset authorized site set of a target page, determining the sites corresponding to those links in a link jump stream that have not been authorized by the target page, where the target page is the page to which the link jump stream jumps. This embodiment enables monitoring of link usurpation.

Description

Method and apparatus for detecting websites that usurp links
Technical field
The present invention relates to the field of computer technology, specifically to network data processing, and more particularly to a method and apparatus for detecting websites that usurp links.
Background
On the Internet, the links of some websites, such as those of a bank website, have high security requirements. Such websites usually authorize certain other websites to use their links, so that users can jump to them safely through the links provided by the authorized websites.
However, the links of these websites are often usurped by unauthorized websites, and security cannot be guaranteed when they are accessed through an unauthorized website.
Summary of the invention
Embodiments of the present disclosure propose a method, an apparatus, an electronic device, and a computer-readable medium for detecting websites that usurp links.
In a first aspect, embodiments of the present disclosure provide a method for detecting websites that usurp links, comprising: obtaining network behavior data; performing keyword feature extraction and link jump feature extraction on the network behavior data to obtain keywords and link jump streams related to the keywords; and, based on the keywords, the link jump streams related to the keywords, and a preset authorized site set of a target page, determining the sites corresponding to those links in a link jump stream that have not been authorized by the target page, where the target page is the page to which the link jump stream jumps.
In some embodiments, determining, based on the keywords, the link jump streams related to the keywords, and the preset authorized site set of the target page, the sites corresponding to links in the link jump stream that have not been authorized by the target page comprises: performing intent analysis on the keywords to determine target keywords that carry an intent to access a preset page; taking the link jump stream related to a target keyword as a target link jump stream and determining the target page to which the target link jump stream jumps; and, based on the target link jump stream and the preset authorized site set of the target page, determining the sites corresponding to links in the link jump stream that have not been authorized by the target page.
In some embodiments, determining, based on the target link jump stream and the preset authorized site set of the target page, the sites corresponding to links in the link jump stream that have not been authorized by the target page comprises: parsing the target link jump stream to obtain at least one sub-link contained in it; crawling data from the site corresponding to the sub-link and determining, from the crawled data, whether that site exhibits the link jump behavior characterized by the target link jump stream; in response to determining that it does, judging whether the site corresponding to the sub-link is in the preset authorized site set of the target page; and, if the site corresponding to the sub-link is not in the preset authorized site set of the target page, determining that the sub-link is a link that has not been authorized by the target page.
In some embodiments, performing intent analysis on the network behavior related to the keywords to determine target keywords that carry an intent to access a preset page comprises: taking, as target keywords, those extracted keywords that successfully match an intent keyword in a preset intent keyword set, where the preset intent keyword set contains intent keywords already determined to carry an intent to access a preset page.
In some embodiments, determining, based on the keywords, the link jump streams related to the keywords, and the preset authorized site set of the target page, the sites corresponding to links in the link jump stream that have not been authorized by the target page comprises: in response to determining that the site corresponding to a link in the link jump stream is not in the preset authorized site set of the target page, inputting the keyword and the link jump stream related to the keyword into a trained identification model to identify the sites corresponding to links in the link jump stream that have not been authorized by the target page, where the identification model is trained on keyword features and link jump stream features related to known unauthorized sites of the target page.
In a second aspect, embodiments of the present disclosure provide an apparatus for detecting websites that usurp links, comprising: an acquisition unit configured to obtain network behavior data; an extraction unit configured to perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain keywords and link jump streams related to the keywords; and a detection unit configured to determine, based on the keywords, the link jump streams related to the keywords, and a preset authorized site set of a target page, the sites corresponding to those links in a link jump stream that have not been authorized by the target page, where the target page is the page to which the link jump stream jumps.
In some embodiments, the detection unit is further configured to determine the sites corresponding to links in the link jump stream that have not been authorized by the target page as follows: performing intent analysis on the keywords to determine target keywords that carry an intent to access a preset page; taking the link jump stream related to a target keyword as a target link jump stream and determining the target page to which the target link jump stream jumps; and, based on the target link jump stream and the preset authorized site set of the target page, determining the sites corresponding to links in the link jump stream that have not been authorized by the target page.
In some embodiments, the detection unit is further configured to determine the sites corresponding to links in the link jump stream that have not been authorized by the target page as follows: parsing the target link jump stream to obtain at least one sub-link contained in it; crawling data from the site corresponding to the sub-link and determining, from the crawled data, whether that site exhibits the link jump behavior characterized by the target link jump stream; in response to determining that it does, judging whether the site corresponding to the sub-link is in the preset authorized site set of the target page; and, if the site corresponding to the sub-link is not in the preset authorized site set of the target page, determining that the sub-link is a link that has not been authorized by the target page.
In some embodiments, the detection unit is further configured to perform intent analysis on the network behavior related to the keywords and determine target keywords that carry an intent to access a preset page as follows: taking, as target keywords, those extracted keywords that successfully match an intent keyword in a preset intent keyword set, where the preset intent keyword set contains intent keywords already determined to carry an intent to access a preset page.
In some embodiments, the detection unit is further configured to determine the sites corresponding to links in the link jump stream that have not been authorized by the target page as follows: in response to determining that the site corresponding to a link in the link jump stream is not in the preset authorized site set of the target page, inputting the keyword and the link jump stream related to the keyword into a trained identification model to identify the sites corresponding to links in the link jump stream that have not been authorized by the target page, where the identification model is trained on keyword features and link jump stream features related to known unauthorized sites of the target page.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for detecting websites that usurp links provided in the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium storing a computer program which, when executed by a processor, implements the method for detecting websites that usurp links provided in the first aspect.
The method, apparatus, electronic device, and computer-readable medium for detecting websites that usurp links provided by embodiments of the present disclosure obtain network behavior data; perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain keywords and link jump streams related to the keywords; and, based on the keywords, the related link jump streams, and the preset authorized site set of a target page, determine the sites corresponding to links in a link jump stream that have not been authorized by the target page, where the target page is the page to which the link jump stream jumps. Link usurpation is thereby monitored using massive network behavior data, which improves the security of website access.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which embodiments of the present disclosure can be applied;
Fig. 2 is a flowchart of one embodiment of the method for detecting websites that usurp links according to the present disclosure;
Fig. 3 is a flowchart of another embodiment of the method for detecting websites that usurp links according to the present disclosure;
Fig. 4 is a schematic diagram illustrating the parsing of a link jump stream;
Fig. 5 is a schematic structural diagram of one embodiment of the apparatus for detecting websites that usurp links according to the present disclosure;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of embodiments of the present disclosure.
Detailed description of embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention concerned and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments can be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture to which the method or the apparatus for detecting websites that usurp links of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network may include various connection types, such as wired or wireless communication links or fiber-optic cables.
The terminal devices 101, 102, 103 can be electronic devices with a display screen, such as smartphones, laptops, desktop computers, tablets, or smartwatches. Various resource-access applications can be installed on them, such as search applications, audio and video players, news clients, browsers, and social platform software. Users can use the applications on the terminal devices 101, 102, 103 to obtain network resources.
The server 105 can be a server that provides back-end support for the various applications on the terminal devices 101, 102, 103, for example the back-end server of a search engine. The server 105 can obtain users' network access behavior data through the terminal devices 101, 102, 103, analyze and inspect network resources based on the network access behavior data of a large number of users, identify unsafe network resource data, and issue corresponding alerts or block it.
It should be noted that the server 105 can be hardware or software. When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 105 is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The terminal devices 101, 102, 103 can also be software. When they are software, they can be installed in the electronic devices listed above, and can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be noted that the method for detecting websites that usurp links provided by embodiments of the present disclosure can be executed by the server 105; correspondingly, the apparatus for detecting websites that usurp links can be arranged in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers, depending on implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for detecting websites that usurp links according to the present disclosure is shown. The method for detecting websites that usurp links comprises the following steps:
Step 201, obtain network behavior data.
In this embodiment, the execution body of the method for detecting websites that usurp links (for example, the server shown in Fig. 1) can collect the network behavior data of a large number of users, or receive it from other electronic devices. Here, network behavior data is the access behavior data generated when users access the network, for example data generated by accessing network resources through a browser or other application software. It may include search data, social platform data, browsing records, forum exchange data, online shopping data, and so on.
In practice, users access network data through applications installed on their electronic devices (such as mobile phones or computers), for example by searching for information with a search engine, watching audio and video resources in a player application, visiting specified pages in a browser, or publishing articles through a forum or social platform application. The back-end server of each application can obtain the access behavior data generated when a user accesses the network through that application, including the text, pictures, and voice the user inputs, the user's operations such as clicking, searching, browsing, subscribing, and liking, the objects of those operations, and the operation times. The execution body can obtain the network behavior data by establishing connections with the back-end servers of the applications.
Step 202, perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain keywords and link jump streams related to the keywords.
Text-related data, such as search information and articles published in social platform or forum applications, can be filtered out of the network behavior data for keyword feature extraction. Specifically, the text-related data can first be segmented into words, and keywords can then be extracted using a keyword analysis technique, for example an algorithm such as PLSA (probabilistic latent semantic analysis).
Link-related data, such as data about automatic page jumps and data about users clicking links, can be extracted from the network behavior data, and link jump streams can then be extracted from the link-related data. Here, a link jump stream represents the time-ordered jump behavior of clicked links. For example, if the time-ordered jump behavior is a jump from site A to site B, followed by a jump to site C after a link to site C is clicked on site B, the resulting link jump stream is "A→B→C".
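The "A→B→C" example can be sketched in a few lines of Python. This is only an illustrative sketch: the `JumpEvent` record layout and the `build_jump_stream` helper are hypothetical names introduced here, and the patent does not prescribe how jump events are stored.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class JumpEvent:
    """One observed jump between sites (hypothetical record layout)."""
    timestamp: float
    source: str
    target: str


def build_jump_stream(events: Iterable[JumpEvent]) -> List[str]:
    """Chain time-ordered jump events into a single link jump stream."""
    stream: List[str] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if not stream:
            stream.append(event.source)
        if stream[-1] != event.source:
            break  # the chain is broken; a separate stream would start here
        stream.append(event.target)
    return stream


# Events may arrive out of order; sorting by timestamp restores the sequence.
events = [
    JumpEvent(timestamp=2.0, source="B", target="C"),
    JumpEvent(timestamp=1.0, source="A", target="B"),
]
stream_text = "→".join(build_jump_stream(events))  # "A→B→C"
```

Stopping at the first broken chain is a simplification; a fuller implementation would presumably start a new stream there instead of discarding the remaining events.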
In this embodiment, keyword features and link jump stream features can be extracted in association with each other, yielding keywords and the link jump streams related to them. Here, a link jump stream related to a keyword is a link jump stream formed by link jumps that result from operations related to that keyword. For example, a user clicks one of the links, X, in the search results for a search keyword and, after jumping to the page of link X, clicks an option on that page that links to page Y. The link jump stream from X to Y is then related to that search keyword.
The execution body can parse each piece of network behavior data, extract the keywords and link jump streams it contains, and associate the extracted keywords with the link jump streams.
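One possible way to hold the association between keywords and link jump streams is a mapping from keyword to the streams it triggered. The sketch below assumes each parsed record is a dictionary with hypothetical `"keyword"` and `"jump_stream"` fields; these field names are illustrative and do not come from the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def associate_keywords_with_streams(
    records: Iterable[Mapping],
) -> Dict[str, List[Tuple[str, ...]]]:
    """Group link jump streams under the keyword whose operation triggered them."""
    by_keyword: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    for record in records:
        keyword = record.get("keyword")
        stream = tuple(record.get("jump_stream", ()))
        # A stream needs at least two entries to describe a jump.
        if keyword and len(stream) >= 2:
            by_keyword[keyword].append(stream)
    return dict(by_keyword)


records = [
    {"keyword": "apply for a credit card", "jump_stream": ["X", "Y"]},
    {"keyword": "apply for a credit card", "jump_stream": ["X", "Z", "Y"]},
    {"keyword": None, "jump_stream": ["P", "Q"]},  # no keyword: not associated
]
streams_by_keyword = associate_keywords_with_streams(records)
```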
Because the extracted link jump streams are related to keywords, they were produced by keyword-related operations. This helps ensure that the links contained in the link jump streams extracted in step 202 are valid links.
Step 203, based on the keywords, the link jump streams related to the keywords, and the preset authorized site set of the target page, determine the sites corresponding to links in a link jump stream that have not been authorized by the target page.
For an extracted link jump stream, the page it jumps to can be determined as the target page. Optionally, the last page the link jump stream jumps to is taken as the target page. Then, based on the preset authorized site set of the target page, it is judged whether the links in the link jump stream include a link that has not been authorized by the target page.
By default, the preset authorized site set of the target page is the set of whitelisted sites provided by the search engine. Optionally, if the target page belongs to a site with high security requirements for access paths, such as a bank site or a database site, some sites can be designated in advance as its authorized site set. Accessing the target page through a site that is not in the preset authorized site set is regarded as an access with a security risk. If a site that is not in the preset authorized site set has accessed the target page through a link jump, it can be determined that this site has usurped the link of the target page.
The link jump stream can be parsed to obtain all the links it passes through. Each link is then checked in turn against the preset authorized site set of the target page to which the link jump stream jumps. If a link in the link jump stream is not in the preset authorized site set of the corresponding target page, the site corresponding to that link is determined to be unauthorized by the target page, that is, a site that usurps the link of the target page.
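The membership check described above can be sketched as follows. The sketch makes several assumptions that the patent does not state: a "site" is identified by the URL host name, the target page is the last entry of the stream, and the target page's own site is not flagged. The function names and example domains are hypothetical.

```python
from typing import List, Sequence, Set
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Reduce a URL to its host name (assumed here to identify the site)."""
    return (urlsplit(url).hostname or "").lower()


def find_unauthorized_sites(stream: Sequence[str], authorized: Set[str]) -> List[str]:
    """Return sites in the stream that are not in the target page's authorized set."""
    if not stream:
        return []
    target_site = site_of(stream[-1])  # assumption: the last hop is the target page
    flagged: List[str] = []
    for url in stream[:-1]:
        site = site_of(url)
        if site and site != target_site and site not in authorized and site not in flagged:
            flagged.append(site)
    return flagged


stream = [
    "https://search.example/results?q=card",
    "https://mirror.example/go",
    "https://bank.example/apply",
]
flagged_sites = find_unauthorized_sites(stream, authorized={"search.example"})
```

With the example data, `search.example` is in the authorized set and `bank.example` hosts the target page, so only `mirror.example` is reported.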
By extracting keyword features and link jump stream features from massive network data and analyzing whether a link jump stream contains links whose sites have not been authorized by the target page, the above embodiment can detect in batches the websites on the network that usurp links. This enables comprehensive monitoring of link usurpation, which in turn allows security control over illegitimate access and improves the security of website access.
In some optional implementations, the sites corresponding to links in the link jump stream that have not been authorized by the target page can be determined as follows: in response to determining that the site corresponding to a link in the link jump stream is not in the preset authorized site set of the target page, the keyword and the link jump stream related to the keyword are input into a trained identification model, which identifies the sites corresponding to links in the link jump stream that have not been authorized by the target page.
Here, the identification model can be trained on keyword features and link jump stream features related to known unauthorized sites of the target page. The known unauthorized sites of the target page are sites that are known not to be authorized by the target page and that contain jumps to the target page triggered by keyword-related operations. The identification model can output a confidence score that a link in the link jump stream is an illegitimate link usurping the link of the target page; when the confidence score exceeds a certain threshold, the corresponding link can be determined to be an illegitimate link that usurps the link of the target page.
During training, keyword features and keyword-related link jump features can be extracted for these known unauthorized sites in the same way as, or a similar way to, step 202. The extracted features are then fed into the identification model for recognizing websites that usurp links, and the model's parameters are adjusted over multiple iterations based on its recognition accuracy. Training stops when the recognition accuracy reaches a preset threshold, yielding the trained identification model.
In this way, after screening against the preset authorized sites of the target page, the trained identification model is used for further recognition, so that illegitimate links that usurp the link of the target page can be detected more comprehensively and accurately.
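The two-stage screening (authorized-set filter first, model score second) can be sketched as below. The trained identification model is represented only by a caller-supplied `score_fn`; this callable, the function name, and the default threshold of 0.5 are assumptions made for illustration, since the patent does not specify the model or the threshold value.

```python
from typing import Callable, Iterable, List, Set


def two_stage_detect(
    keyword: str,
    sites: Iterable[str],
    authorized: Set[str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Filter by the authorized site set, then keep sites the model scores above threshold."""
    # Stage 1: only sites outside the preset authorized set are candidates.
    candidates = [site for site in sites if site not in authorized]
    # Stage 2: the model's confidence score decides whether a candidate is flagged.
    return [site for site in candidates if score_fn(keyword, site) > threshold]


def dummy_score(keyword: str, site: str) -> float:
    """Stand-in for a trained model; returns fixed, made-up scores."""
    return 0.9 if site == "mirror.example" else 0.1


detected = two_stage_detect(
    "apply for a credit card",
    ["search.example", "mirror.example", "blog.example"],
    authorized={"search.example"},
    score_fn=dummy_score,
)
```

Running the cheap set-membership filter before the model keeps the model from scoring sites that are already known to be authorized.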
With continued reference to Fig. 3, a flow diagram of another embodiment of the method for detecting websites that usurp links according to the present disclosure is shown. As shown in Fig. 3, the flow 300 of the method for detecting websites that usurp links of this embodiment comprises the following steps:
Step 301, obtain network behavior data.
In this embodiment, the execution body of the method for detecting websites that usurp links can obtain from a database the network behavior data produced by collecting data on the network access behavior of a large number of users. The network behavior data may include search data, social platform data, browsing records, forum exchange data, online shopping data, and so on. Each piece of network behavior data may include the network resources the user accessed in one network access and the corresponding operation behavior data.
Step 302, perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain keywords and link jump streams related to the keywords.
Keyword feature extraction and link jump feature extraction can be performed on the network behavior data. Specifically, for a piece of network behavior data, it can first be determined whether it is text-related data. If so, it is converted to standard text, for example by speech-to-text conversion or text normalization; the text is then segmented into words, and keyword features are extracted according to a preset keyword lexicon.
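A minimal sketch of the normalize, segment, and lexicon-match sequence follows. It assumes English text that can be tokenized with a simple regular expression and a lexicon whose entries are lowercase, single-spaced phrases; Chinese text would need a dedicated word segmenter, which is not shown. The function name and sample lexicon are hypothetical.

```python
import re
from typing import List, Set


def extract_keywords(text: str, lexicon: Set[str]) -> List[str]:
    """Normalize text, split it into tokens, and return lexicon phrases found in it."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())  # text normalization
    tokens = re.findall(r"[a-z0-9]+", normalized)           # naive word segmentation
    longest = max((len(entry.split()) for entry in lexicon), default=0)
    found: List[str] = []
    # Try longer phrases first so multi-word keywords are reported before shorter ones.
    for size in range(longest, 0, -1):
        for start in range(len(tokens) - size + 1):
            phrase = " ".join(tokens[start:start + size])
            if phrase in lexicon and phrase not in found:
                found.append(phrase)
    return found


lexicon = {"credit card application", "loan"}
keywords = extract_keywords("  Credit Card   application online ", lexicon)
```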
Link-related data can then be extracted from the network behavior data based on characteristic features of link behavior (such as specific characters contained in a link or a specific link format), and link jump streams can be extracted from the link-related data. A link jump stream represents the time-ordered jump behavior of clicked links and contains, in order, the links of the pages passed through on the access path from the first page to the page finally jumped to.
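As one example of a "specific link format", link-related data could be picked out by matching the `http://` or `https://` prefix. The pattern below is a deliberately simple assumption, not a full URL grammar, and the log line format is invented for illustration.

```python
import re
from typing import List

# Assumed link format: an http(s) scheme followed by characters that are not
# whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_links(record: str) -> List[str]:
    """Return the links found in one raw behavior record, in order of appearance."""
    return URL_PATTERN.findall(record)


record = "click from=https://a.example/x to=https://b.example/y?id=1"
links = extract_links(record)
```

Because `findall` preserves order of appearance, the result can feed directly into the construction of a link jump stream when the record lists hops in sequence.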
The link jump streams are extracted in association with keywords; that is, a link jump stream is formed by link jump behavior triggered by an operation associated with a keyword (such as a search or a click).
Steps 301 and 302 are consistent with steps 201 and 202 of the previous embodiment, respectively; for their specific implementation, refer to the descriptions of steps 201 and 202 above, which are not repeated here.
Step 303, perform intent analysis on the keywords to determine target keywords that carry an intent to access a preset page.
In this embodiment, intent analysis can be performed on the keywords extracted in step 302 to determine the page access intent of the user that is characterized by the keyword-related operations. Specifically, based on statistics or modeling results over historical keywords and the network access behavior associated with them, the page finally accessed in the network access behavior associated with each historical keyword can be determined. The keywords extracted in step 302 are then matched against the historical keywords, and the page finally accessed in the network access behavior associated with a successfully matched historical keyword is taken as the page access intent carried by the keyword. After intent analysis has been performed on each extracted keyword, the keywords that carry an intent to access a preset page are determined as target keywords.
Here, a preset page can be a page of a specified type, such as a page of a bank website or a payment page. A preset page can also be a page in a preset page set; for example, pages of bank websites, payment pages, and the like can be added to the preset page set, with the corresponding page addresses stored in the set.
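The statistics-based variant of intent analysis might look like the sketch below, which takes the most frequent final page per historical keyword as that keyword's intent. Using a simple frequency count is an assumption (the patent mentions "statistics or modeling" without fixing a method), and all names and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_intent_table(history: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each historical keyword to the page most often reached last after it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for keyword, final_page in history:
        counts[keyword][final_page] += 1
    return {kw: pages.most_common(1)[0][0] for kw, pages in counts.items()}


def find_target_keywords(
    extracted: Iterable[str], intent_table: Dict[str, str], preset_pages: Set[str]
) -> List[str]:
    """Keep the extracted keywords whose inferred intent is a preset page."""
    return [kw for kw in extracted if intent_table.get(kw) in preset_pages]


history = [
    ("apply for a credit card", "bank.example/apply"),
    ("apply for a credit card", "bank.example/apply"),
    ("apply for a credit card", "news.example/cards"),
    ("weather", "weather.example/today"),
]
intent_table = build_intent_table(history)
target_keywords = find_target_keywords(
    ["weather", "apply for a credit card", "unseen keyword"],
    intent_table,
    preset_pages={"bank.example/apply"},
)
```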
It is alternatively possible to determine target keyword based on preset intention keyword set: the keyword that will be extracted In, with the successful keyword of intention Keywords matching in preset intention keyword set as target keyword, wherein pre- If intention keyword set include the fixed intention comprising the access preset page intention keyword.
In the above-mentioned mode for determining target keyword based on preset intention keyword set, it can preset and include The intention keyword of the intention of the access preset page.For example, " credit card is applied for card " is to handle the page comprising access bank card Intention intention keyword, " monthly payment plan " be comprising access payment the page intention intention keyword.When step 302 mentions When the keyword of taking-up and the intention Keywords matching, it can determine that the keyword extracted has the intention keyword corresponding The access of the default page is intended to.
By collecting intention keyword in advance, building is intended to keyword set, extracts using from network behavior data Keyword and be intended to the matched mode of keyword set, can quickly determine comprising the access preset page intention target Keyword.
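Matching against a preset intent keyword set might look like the sketch below. The matching rule (case-insensitive containment after whitespace normalization), the page labels and the function names are assumptions made for illustration; the embodiment does not specify how a match is decided.

```python
# Hypothetical intent keyword set: intent keyword -> label of the preset page.
INTENT_KEYWORDS = {
    "credit card application": "bank card application page",
    "monthly payment plan": "payment page",
}


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def match_target_keywords(extracted):
    """Return {extracted keyword: preset page label} for keywords that match."""
    result = {}
    for raw in extracted:
        keyword = normalize(raw)
        for intent_keyword, page in INTENT_KEYWORDS.items():
            # Assumed rule: the extracted keyword contains the intent keyword.
            if intent_keyword in keyword:
                result[raw] = page
                break
    return result


matches = match_target_keywords(
    ["Credit  Card Application online", "cat videos", "monthly payment plan"]
)
```

Because the set is built in advance, each lookup is a cheap string comparison, which is consistent with the speed advantage noted above.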
Step 304: take the link jump flow related to the target keyword as the target link jump flow, and determine the target page to which the target link jump flow jumps.
The keywords extracted in step 302 and the link jump flows are associated with one another. Here, the link jump flow related to the target keyword may be determined to be the object of detection and taken as the target link jump flow. The page to which the target link jump flow jumps is then determined to be the target page; this target page may be one of the preset pages described above.
Step 305: based on the target link jump flow and the preset authorized website set of the target page, determine the websites corresponding to those links, among the links contained in the link jump flow, that are not authorized by the target page.
The preset authorized website set of the target page determined in step 304 may be obtained. This set contains the websites that the target page has authorized for access; that is, jumping to the target page through one of these authorized websites is regarded as legitimate behavior. Jumping to the target page through a website not authorized by the target page (i.e., a website not in the preset authorized website set of the target page) is regarded as misappropriating the link of the target page, and such websites are the websites that misappropriate the link of the target page.
The target link jump flow may be parsed to extract all the links it contains, i.e., the links of each website through which the flow passes, and each of these links is checked in turn against the preset authorized website set of the target page. In this way, the links in the target link jump flow that are not in the preset authorized website set of the target page can be extracted, thereby determining the websites that misappropriate the link of the target page.
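The check of step 305 can be illustrated with a short sketch. It assumes that a link jump flow is available as an ordered list of URLs whose last entry is the target page, that websites are identified by host name, and that the target page's own site is implicitly allowed; none of these details, nor the example hosts, come from the embodiment.

```python
from urllib.parse import urlsplit

# Hypothetical preset authorized website set of one target page.
AUTHORIZED_SITES = {"search.example", "partner.example"}


def site_of(url):
    """Identify a website by the host name of its URL (an assumption)."""
    return (urlsplit(url).hostname or "").lower()


def unauthorized_sites(jump_flow, authorized):
    """Return the sites in the flow that are not authorized by the target page."""
    *hops, target = jump_flow
    target_site = site_of(target)
    flagged = []
    for url in hops:
        site = site_of(url)
        if site != target_site and site not in authorized and site not in flagged:
            flagged.append(site)
    return flagged


flow = [
    "https://search.example/results?q=card",
    "https://mirror.example/go",
    "https://partner.example/landing",
    "https://bank.example/",
]
flagged = unauthorized_sites(flow, AUTHORIZED_SITES)
```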
In some optional implementations of this embodiment, after step 304, the websites corresponding to links not authorized by the target page among the links contained in the link jump flow may further be determined as follows:
First, the target link jump flow is parsed to obtain at least one sub-link contained in the target link jump flow.
In an illustrative scenario, as shown in Fig. 4, a user clicks link 1 in the search results and jumps to link 2, then jumps to link 3 through an anchor in the website corresponding to link 2, and then jumps to the home page of a bank website through an anchor in the website corresponding to link 3, generating a link jump flow: link 1 → link 2 → link 3 → bank website home page. Here, the bank website home page is the target page. The sub-links can then be obtained by parsing this link jump flow: link 1, link 2 and link 3.
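Parsing the flow of the Fig. 4 scenario can be sketched as follows, assuming purely for illustration that the flow is recorded as text with "→" separators; the function name `parse_jump_flow` and this textual encoding are hypothetical.

```python
def parse_jump_flow(flow_text, sep="→"):
    """Split a textual link jump flow into sub-links, target page and hop pairs."""
    nodes = [part.strip() for part in flow_text.split(sep) if part.strip()]
    if len(nodes) < 2:
        raise ValueError("a link jump flow needs at least one sub-link and a target page")
    sub_links, target_page = nodes[:-1], nodes[-1]
    hops = list(zip(nodes, nodes[1:]))  # (from, to) pairs in jump order
    return sub_links, target_page, hops


sub_links, target_page, hops = parse_jump_flow(
    "link 1 → link 2 → link 3 → bank website home page"
)
```

The hop pairs record which link each sub-link is expected to jump to, which is the information a later check of jump behavior would need.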
Then, data is crawled from the website corresponding to each sub-link, and it is determined from the crawled data whether that website contains the link jump behavior characterized by the target link jump flow.
A data crawling tool may be used to crawl the page content of the website corresponding to the sub-link. Specifically, the content in that website related to link jump behavior may be crawled, for example by extracting the anchors in the page.
Next, in response to determining that the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow, it is judged whether the website corresponding to the sub-link is in the preset authorized website set of the target page; if it is not, the sub-link is determined to be a link not authorized by the target page.
It may be judged whether the website corresponding to the sub-link contains an anchor that jumps to the next link in the link jump flow. That is, it is judged whether the website corresponding to the sub-link has link jump behavior at all, and whether that behavior is consistent with the jump behavior of the sub-link as characterized by the link jump flow. This avoids identifying a website that has no link jump behavior as a website that misappropriates links, and thus ensures the accuracy of the detected websites.
If it is determined that the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow, it is further judged whether that website is a preset authorized website of the target page to which the target link jump flow jumps. If it is not, the sub-link is an illegal link that misappropriates the link of the target page.
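The two-stage check above (first confirm the jump behavior from crawled anchors, then consult the authorized set) can be sketched as below. To stay self-contained, the sketch works on already-crawled HTML held in a dictionary instead of fetching pages; the URLs, the `pages` mapping and the host-based notion of a website are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class AnchorCollector(HTMLParser):
    """Collect absolute href targets of all <a> elements in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(urljoin(self.base_url, href))


def has_jump(page_url, page_html, next_url):
    """True if the page contains an anchor that jumps to the next link."""
    collector = AnchorCollector(page_url)
    collector.feed(page_html)
    return next_url in collector.hrefs


def unauthorized_sub_links(flow, pages, authorized_sites):
    """flow: ordered URLs ending in the target page; pages: URL -> crawled HTML."""
    flagged = []
    for current, nxt in zip(flow, flow[1:]):
        if not has_jump(current, pages.get(current, ""), nxt):
            continue  # no matching jump behavior, so do not flag this site
        if urlsplit(current).hostname not in authorized_sites:
            flagged.append(current)
    return flagged


flow = [
    "https://search.example/results",
    "https://mirror.example/go",
    "https://static.example/page",
    "https://bank.example/",
]
pages = {
    "https://search.example/results": '<a href="https://mirror.example/go">Bank</a>',
    "https://mirror.example/go": '<a href="https://static.example/page">Continue</a>',
    "https://static.example/page": "<p>No anchors here</p>",
}
flagged = unauthorized_sub_links(flow, pages, {"search.example"})
```

In this example the page on `static.example` is not flagged even though it is unauthorized, because no anchor to the next link was found in its crawled content.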
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for detecting websites that misappropriate links. This apparatus embodiment corresponds to the method embodiments shown in Fig. 2 and Fig. 3, and the apparatus may be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for detecting websites that misappropriate links in this embodiment includes an acquisition unit 501, an extraction unit 502 and a detection unit 503. The acquisition unit 501 is configured to obtain network behavior data. The extraction unit 502 is configured to perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain a keyword and a link jump flow related to the keyword. The detection unit 503 is configured to determine, based on the keyword, the link jump flow related to the keyword, and the preset authorized website set of a target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page, where the target page is the page to which the link jump flow jumps.
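One way to picture the division of apparatus 500 into three units is the following sketch, in which each unit is a plain callable and the apparatus chains them. The record layout (`query`, `hops`), the example hosts and the class name are hypothetical and only serve to make the data flow concrete.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlsplit


@dataclass
class LinkMisuseDetector:
    """Chains an acquisition unit, an extraction unit and a detection unit."""
    acquire: Callable[[], list]
    extract: Callable[[list], list]
    detect: Callable[[list], list]

    def run(self):
        return self.detect(self.extract(self.acquire()))


def acquire():
    # Hypothetical network behavior record: a query and the URLs visited after it.
    return [{"query": "credit card application",
             "hops": ["https://mirror.example/go", "https://bank.example/"]}]


def extract(records):
    # Produce (keyword, link jump flow) pairs from the raw records.
    return [(record["query"], record["hops"]) for record in records]


def detect(pairs, authorized=frozenset({"search.example"})):
    # Flag every intermediate site that is not in the authorized set.
    flagged = []
    for _keyword, hops in pairs:
        for url in hops[:-1]:
            host = urlsplit(url).hostname
            if host not in authorized:
                flagged.append(host)
    return flagged


result = LinkMisuseDetector(acquire, extract, detect).run()
```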
In some embodiments, the detection unit 503 is further configured to determine the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page as follows: performing intent analysis on the keyword to determine a target keyword carrying the intent of accessing a preset page; taking the link jump flow related to the target keyword as the target link jump flow and determining the target page to which the target link jump flow jumps; and determining, based on the target link jump flow and the preset authorized website set of the target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page.
In some embodiments, the detection unit 503 is further configured to determine the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page as follows: parsing the target link jump flow to obtain at least one sub-link contained in the target link jump flow; crawling data from the website corresponding to the sub-link and determining, from the crawled data, whether that website contains the link jump behavior characterized by the target link jump flow; in response to determining that the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow, judging whether that website is in the preset authorized website set of the target page; and, if the website corresponding to the sub-link is not in the preset authorized website set of the target page, determining that the sub-link is a link not authorized by the target page.
In some embodiments, the detection unit 503 is further configured to perform intent analysis on the network behavior related to the keyword and determine the target keyword carrying the intent of accessing a preset page as follows: among the extracted keywords, taking those that successfully match an intent keyword in a preset intent keyword set as target keywords, where the preset intent keyword set contains intent keywords that have already been determined to carry the intent of accessing a preset page.
In some embodiments, the detection unit 503 is further configured to determine the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page as follows: in response to determining that the website corresponding to a link contained in the link jump flow is not in the preset authorized website set of the target page, inputting the keyword and the link jump flow related to the keyword into a trained identification model, and identifying the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page; where the identification model is trained on keyword features and link jump flow features related to known unauthorized websites of the target page.
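The combination of the authorized-set check with a trained identification model can be sketched as below. The embodiment does not disclose the model type or its features, so `CountModel` is a deliberately simple stand-in (it scores a flow by the fraction of its features already seen in known unauthorized examples); the feature scheme, the threshold and all names are assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def features(keyword, flow):
    """Hypothetical features: keyword tokens, intermediate hosts, hop count."""
    feats = ["kw:" + token for token in keyword.lower().split()]
    feats += ["host:" + (urlsplit(url).hostname or "") for url in flow[:-1]]
    feats.append("hops:%d" % (len(flow) - 1))
    return feats


class CountModel:
    """Stand-in model, not the patent's: fraction of previously seen features."""

    def __init__(self):
        self.seen = Counter()

    def fit(self, examples):
        for keyword, flow in examples:
            self.seen.update(features(keyword, flow))
        return self

    def score(self, keyword, flow):
        feats = features(keyword, flow)
        return sum(1 for f in feats if self.seen[f]) / len(feats)


def identify(keyword, flow, authorized, model, threshold=0.5):
    """Run the model only when the set check finds an unlisted site."""
    suspects = [u for u in flow[:-1] if urlsplit(u).hostname not in authorized]
    if not suspects:
        return []
    return suspects if model.score(keyword, flow) >= threshold else []


known_unauthorized = [
    ("credit card application", ["https://mirror.example/go", "https://bank.example/"]),
]
model = CountModel().fit(known_unauthorized)
authorized = {"search.example"}

hit = identify("credit card application",
               ["https://mirror.example/x", "https://bank.example/"], authorized, model)
miss = identify("weather",
                ["https://blog.example/p", "https://bank.example/"], authorized, model)
```

In practice the stand-in would be replaced by whatever classifier is trained on the known unauthorized websites; the sketch only shows where the model sits relative to the set check.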
It should be understood that the units recorded in the apparatus 500 correspond to the respective steps of the methods described with reference to Fig. 2 and Fig. 3. The operations and features described above for the methods therefore apply equally to the apparatus 500 and the units it contains, and are not repeated here.
In the apparatus 500 for detecting websites that misappropriate links according to the above embodiment of the present disclosure, the acquisition unit obtains network behavior data; the extraction unit performs keyword feature extraction and link jump feature extraction on the network behavior data to obtain a keyword and a link jump flow related to the keyword; and the detection unit determines, based on the keyword, the link jump flow related to the keyword, and the preset authorized website set of a target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page, where the target page is the page to which the link jump flow jumps. Link misappropriation is thereby monitored using massive network behavior data, which improves the security of website access.
Referring now to Fig. 6, it shows a schematic structural diagram of an electronic device (for example, the server in Fig. 1) 600 suitable for implementing embodiments of the present disclosure. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 6, the electronic device 600 may include a processing unit (for example, a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing unit 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
In general, the following devices may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer or gyroscope; an output device 607 including, for example, a liquid crystal display (LCD), speaker or vibrator; a storage device 608 including, for example, a hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or provide all of the devices shown; more or fewer devices may alternatively be implemented or provided. Each block shown in Fig. 6 may represent one device or, as needed, multiple devices.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing unit 601, the above functions defined in the methods of the embodiments of the present disclosure are performed. It should be noted that the computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus or device. Also in embodiments of the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: electric wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device. The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain network behavior data; perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain a keyword and a link jump flow related to the keyword; and determine, based on the keyword, the link jump flow related to the keyword, and the preset authorized website set of a target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page, where the target page is the page to which the link jump flow jumps.
Computer program code for carrying out the operations of embodiments of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and operations of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, an extraction unit and a detection unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that obtains network behavior data".
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (12)

1. A method for detecting websites that misappropriate links, comprising:
obtaining network behavior data;
performing keyword feature extraction and link jump feature extraction on the network behavior data to obtain a keyword and a link jump flow related to the keyword;
determining, based on the keyword, the link jump flow related to the keyword, and a preset authorized website set of a target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page, wherein the target page is the page to which the link jump flow jumps.
2. The method according to claim 1, wherein determining, based on the keyword, the link jump flow related to the keyword, and the preset authorized website set of the target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page comprises:
performing intent analysis on the keyword to determine a target keyword carrying an intent of accessing a preset page;
taking the link jump flow related to the target keyword as a target link jump flow, and determining the target page to which the target link jump flow jumps;
determining, based on the target link jump flow and the preset authorized website set of the target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page.
3. The method according to claim 2, wherein determining, based on the target link jump flow and the preset authorized website set of the target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page comprises:
parsing the target link jump flow to obtain at least one sub-link contained in the target link jump flow;
crawling data from the website corresponding to the sub-link, and determining, from the crawled data, whether the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow;
in response to determining that the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow, judging whether the website corresponding to the sub-link is in the preset authorized website set of the target page;
if the website corresponding to the sub-link is not in the preset authorized website set of the target page, determining that the sub-link is a link not authorized by the target page.
4. The method according to claim 2 or 3, wherein performing intent analysis on the network behavior related to the keyword to determine the target keyword carrying the intent of accessing a preset page comprises:
among the extracted keywords, taking those that successfully match an intent keyword in a preset intent keyword set as target keywords, wherein the preset intent keyword set contains intent keywords that have been determined to carry the intent of accessing a preset page.
5. The method according to claim 1, wherein determining, based on the keyword, the link jump flow related to the keyword, and the preset authorized website set of the target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page comprises:
in response to determining that the website corresponding to a link contained in the link jump flow is not in the preset authorized website set of the target page, inputting the keyword and the link jump flow related to the keyword into a trained identification model, and identifying the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page;
wherein the identification model is trained on keyword features and link jump flow features related to known unauthorized websites of the target page.
6. An apparatus for detecting websites that misappropriate links, comprising:
an acquisition unit configured to obtain network behavior data;
an extraction unit configured to perform keyword feature extraction and link jump feature extraction on the network behavior data to obtain a keyword and a link jump flow related to the keyword;
a detection unit configured to determine, based on the keyword, the link jump flow related to the keyword, and a preset authorized website set of a target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page, wherein the target page is the page to which the link jump flow jumps.
7. The apparatus according to claim 6, wherein the detection unit is further configured to determine the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page as follows:
performing intent analysis on the keyword to determine a target keyword carrying an intent of accessing a preset page;
taking the link jump flow related to the target keyword as a target link jump flow, and determining the target page to which the target link jump flow jumps;
determining, based on the target link jump flow and the preset authorized website set of the target page, the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page.
8. The apparatus according to claim 7, wherein the detection unit is further configured to determine the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page as follows:
parsing the target link jump flow to obtain at least one sub-link contained in the target link jump flow;
crawling data from the website corresponding to the sub-link, and determining, from the crawled data, whether the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow;
in response to determining that the website corresponding to the sub-link contains the link jump behavior characterized by the target link jump flow, judging whether the website corresponding to the sub-link is in the preset authorized website set of the target page;
if the website corresponding to the sub-link is not in the preset authorized website set of the target page, determining that the sub-link is a link not authorized by the target page.
9. The apparatus according to claim 7 or 8, wherein the detection unit is further configured to perform intent analysis on the network behavior related to the keyword and determine the target keyword carrying the intent of accessing a preset page as follows:
among the extracted keywords, taking those that successfully match an intent keyword in a preset intent keyword set as target keywords, wherein the preset intent keyword set contains intent keywords that have been determined to carry the intent of accessing a preset page.
10. The apparatus according to claim 6, wherein the detection unit is further configured to determine the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page as follows:
in response to determining that the website corresponding to a link contained in the link jump flow is not in the preset authorized website set of the target page, inputting the keyword and the link jump flow related to the keyword into a trained identification model, and identifying the website corresponding to a link, among the links contained in the link jump flow, that is not authorized by the target page;
wherein the identification model is trained on keyword features and link jump flow features related to known unauthorized websites of the target page.
11. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 5.
12. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201910579576.0A 2019-06-28 2019-06-28 Method and apparatus for detecting websites that misappropriate links Pending CN110298006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910579576.0A CN110298006A (en) 2019-06-28 2019-06-28 Method and apparatus for detecting websites that misappropriate links

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910579576.0A CN110298006A (en) 2019-06-28 2019-06-28 Method and apparatus for detecting websites that misappropriate links

Publications (1)

Publication Number Publication Date
CN110298006A true CN110298006A (en) 2019-10-01

Family

ID=68029553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910579576.0A Pending CN110298006A (en) 2019-06-28 2019-06-28 Method and apparatus for detecting websites that misappropriate links

Country Status (1)

Country Link
CN (1) CN110298006A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131301A (en) * 2023-10-24 2023-11-28 苏州阿基米德网络科技有限公司 Webpage end browsing method of medical equipment document
CN117131301B (en) * 2023-10-24 2024-01-05 苏州阿基米德网络科技有限公司 Webpage end browsing method of medical equipment document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination