CN103617225A - Associated webpage searching method and system - Google Patents


Info

Publication number
CN103617225A
Authority
CN
China
Prior art keywords
url
web pages
page
webpage
associating web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310603918.0A
Other languages
Chinese (zh)
Other versions
CN103617225B (en)
Inventor
王智广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201310603918.0A
Publication of CN103617225A
Priority to PCT/CN2014/086522 (WO2015074455A1)
Application granted
Publication of CN103617225B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an associated web page search method and system. The method includes: receiving a search request that includes a search keyword; searching a preset associated web page database according to the search keyword to obtain a web page matching the keyword; determining whether the web page is an associated web page; and, if so, returning the web page together with the first-page information associated with it. Because the matched web page is returned together with its associated first-page information whenever it is determined to be an associated web page, the user does not have to search again or look for the first page manually, which reduces system operations, lowers the consumption of system resources, and improves search efficiency.

Description

An associated web page search method and system
Technical field
The present invention relates to the field of data search technology, and in particular to an associated web page search method and an associated web page search system.
Background art
With the development of the Internet, more and more information is presented on the Internet as web pages for users to query, and querying data through Internet search engines has become the most commonly used data search method.
When a search engine crawls web pages, it needs to apply different scheduling strategies to different types of pages. Identifying the page type is therefore a basic task, and identifying page-turning (paginated) pages is one of its more critical parts. A page-turning page is one from which the previous page, the next page, or any other page of a paginated document can be viewed. Page turning changes the content shown in a physical book or a mobile Web form so that different content can be viewed. On the Internet, this mechanism also presents user interface elements that can be used to browse to other pages.
Existing methods identify a page-turning page according to whether the URL (Uniform Resource Locator) of the page contains certain keywords. For example, when the URL contains a keyword such as page, pn, or p and the keyword is followed by a number, the page corresponding to that URL is judged to be a page-turning page.
However, this identification method has a low recall rate. The page-turning URLs of many websites do not contain these keywords, for example "http://cq.ABC.com/lvshi/o12/", "http://bbs.BCA.com/t661_10", and "http://china.BCD.com/product/20110617/2647", yet these are still page-turning pages. Such identification methods are therefore prone to misjudgment and have low practicality.
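The limitation described above can be illustrated with a minimal Python sketch. The exact rule used by existing methods is not given in the text, so the regular expression `KEYWORD_RULE` and the function name `looks_like_page_turning` are assumptions made only for illustration: a keyword (page, pn, or p) followed by a number, optionally with one separator in between.

```python
import re

# Assumed form of the keyword rule: "page", "pn" or "p", not preceded by
# another letter, optionally followed by one of "=", "_", "/", "-", then digits.
KEYWORD_RULE = re.compile(r"(?<![a-z])(?:page|pn|p)[=_/-]?\d+", re.IGNORECASE)


def looks_like_page_turning(url: str) -> bool:
    """Return True if the URL matches the keyword-plus-number heuristic."""
    return KEYWORD_RULE.search(url) is not None


# The three page-turning URLs quoted in the background section.
MISSED_URLS = [
    "http://cq.ABC.com/lvshi/o12/",
    "http://bbs.BCA.com/t661_10",
    "http://china.BCD.com/product/20110617/2647",
]

if __name__ == "__main__":
    for url in MISSED_URLS:
        print(url, looks_like_page_turning(url))
```

Under this assumed rule, none of the three quoted URLs is recognized, although a URL such as `http://example.com/list?page=3` (a hypothetical address) is.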
Summary of the invention
In view of the above problems, the present invention is proposed to provide an associated web page search method and a corresponding associated web page search system that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, an associated web page search method is provided, comprising:
Receiving a search request, the request comprising a search keyword;
Searching a preset associated web page database according to the search keyword to obtain a web page matching the keyword;
Determining whether the web page is an associated web page; if so, returning the web page and the first-page information associated with the web page.
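The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the database is modeled as an in-memory list, and the names `PageRecord`, `first_page_url`, `is_associated`, and `search` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PageRecord:
    """One entry of the (assumed) associated web page database."""
    url: str
    keywords: frozenset
    # Filled in only when the page belongs to a group of associated pages.
    first_page_url: Optional[str] = None


def is_associated(page: PageRecord) -> bool:
    """A page counts as associated if a first page is recorded for it."""
    return page.first_page_url is not None


def search(keyword: str, database: List[PageRecord]) -> List[Tuple[str, Optional[str]]]:
    """Return (matched URL, first-page URL or None) for every matching page."""
    results = []
    for page in database:
        if keyword in page.keywords:
            first = page.first_page_url if is_associated(page) else None
            results.append((page.url, first))
    return results
```

A caller would receive the matched page and, for associated pages, the URL of the first page in the same result, so no second query is needed to locate the first page.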
Optionally, the associated web page database is built in the following manner:
Determining whether a crawled web page contains an associated web page URL pattern; if so, obtaining the associated web page URL pattern;
Using the associated web page URL pattern to obtain the corresponding associated web pages;
Using the associated web pages corresponding to the associated web page URL pattern to build the associated web page database.
Optionally, the step of determining whether the crawled web page contains an associated web page URL pattern comprises:
Determining whether a page-turning feature string is present in the page elements of the current web page; if so, extracting the URL linked from the page-turning feature string;
Replacing the digit blocks in the URL of the current web page with a preset substitute character to obtain a first feature URL prefix, where a digit block is a single digit or a run of digits delimited by separators;
Replacing the digit blocks in the URL linked from the page-turning feature string with a preset substitute character to obtain a second feature URL prefix;
When the first feature URL prefix is identical to the second feature URL prefix, determining that the crawled web page contains an associated web page URL pattern.
Optionally, the step of determining whether a page-turning feature string is present in the page elements of the current web page comprises:
Matching page-turning feature strings against the nodes of the DOM tree of the current web page;
When the match succeeds, determining that the current web page has a page-turning feature string.
Optionally, the step of replacing the digit blocks in the URL of the current web page with a preset substitute character to obtain the first feature URL prefix is:
Replacing the digit blocks at different positions in the URL of the current web page with the same substitute character to obtain the first feature URL prefix;
The step of replacing the digit blocks in the URL linked from the page-turning feature string with a preset substitute character to obtain the second feature URL prefix is:
Replacing the digit blocks at different positions in the URL linked from the page-turning feature string with the same substitute character to obtain the second feature URL prefix.
Optionally, the step of replacing the digit blocks in the URL of the current web page with a preset substitute character to obtain the first feature URL prefix is:
Replacing the digit blocks at different positions in the URL of the current web page with different substitute characters, respectively, to obtain the first feature URL prefix;
The step of replacing the digit blocks in the URL linked from the page-turning feature string with a preset substitute character to obtain the second feature URL prefix is:
Replacing the digit block at each position in the URL linked from the page-turning feature string with the same substitute character used at that position in the first feature URL prefix, to obtain the second feature URL prefix.
Optionally, the step of obtaining the associated web page URL pattern comprises:
Using the first feature URL prefix or the second feature URL prefix as the associated web page URL pattern corresponding to the current web page.
Optionally, the step of obtaining the associated web pages corresponding to the associated web page URL pattern comprises:
Performing structural analysis on the common part of the associated web page URL patterns, extracting the page-turning block from the associated web page URL pattern, and replacing the page-turning block with a first-page identifier to obtain the URL of the first-page associated web page; where the page-turning block is the digit block that occupies the same position in multiple associated web page URLs but holds different numbers;
Accessing the URL of the first-page associated web page to obtain the first-page associated web page.
Optionally, the first-page identifier comprises 0, 1, and/or the largest number among the current associated web pages.
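The derivation of the first-page URL described above can be sketched as follows. This is a simplified illustration, not the claimed implementation: the regular expression `DIGIT_BLOCK` (digits not adjacent to a letter or another digit) and the function name `first_page_url` are assumptions, and the sketch only handles the case where exactly one digit block varies across the given URLs.

```python
import re
from typing import List, Optional

# Assumed definition of a digit block: a run of digits that is not adjacent
# to a letter or another digit (so "123ABC" does not qualify).
DIGIT_BLOCK = re.compile(r"(?<![A-Za-z0-9])[0-9]+(?![A-Za-z0-9])")


def first_page_url(urls: List[str], first_page_id: str = "1") -> Optional[str]:
    """Replace the one digit block that varies across `urls` with `first_page_id`.

    Returns None when the URLs do not share a structure or when the
    page-turning block cannot be determined unambiguously.
    """
    if not urls:
        return None
    blocks = [DIGIT_BLOCK.findall(url) for url in urls]
    if len({len(b) for b in blocks}) != 1:
        return None
    # Positions whose digits differ between the URLs are page-turning candidates.
    varying = [i for i in range(len(blocks[0])) if len({b[i] for b in blocks}) > 1]
    if len(varying) != 1:
        return None
    target = varying[0]
    position = iter(range(len(blocks[0])))

    def replace(match: "re.Match") -> str:
        return first_page_id if next(position) == target else match.group(0)

    return DIGIT_BLOCK.sub(replace, urls[0])
```

For the forum URLs used later in the description, the second digit block is the one that changes, so it is the block replaced by the first-page identifier. Passing `first_page_id="0"` covers sites that number their first page from zero.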
According to another aspect of the present invention, an associated web page search system is provided, comprising:
A search request receiving module, adapted to receive a search request; the request comprises a search keyword;
A matching web page obtaining module, adapted to search a preset associated web page database according to the search keyword and obtain a web page matching the keyword;
A multi-page associated web page determination module, adapted to determine whether the web page is an associated web page and, if so, to call the information returning module;
An information returning module, adapted to return the web page and the first-page information associated with the web page.
Optionally, the associated web page database is built by calling the following submodules:
An associated web page URL determination submodule, adapted to determine whether a crawled web page contains an associated web page URL pattern and, if so, to call the associated web page URL pattern obtaining submodule;
An associated web page URL pattern obtaining submodule, adapted to obtain the associated web page URL pattern;
An associated web page obtaining submodule, adapted to use the associated web page URL pattern to obtain the corresponding associated web pages;
An associated web page database building submodule, adapted to use the associated web pages corresponding to the associated web page URL pattern to build the associated web page database.
Optionally, the associated web page URL determination submodule comprises:
A page-turning feature string determination unit, adapted to determine whether a page-turning feature string is present in the page elements of the current web page and, if so, to call the URL extraction unit;
A URL extraction unit, adapted to extract the URL linked from the page-turning feature string;
A first feature URL prefix obtaining unit, adapted to replace the digit blocks in the URL of the current web page with a preset substitute character to obtain a first feature URL prefix, where a digit block is a single digit or a run of digits delimited by separators;
A second feature URL prefix obtaining unit, adapted to replace the digit blocks in the URL linked from the page-turning feature string with a preset substitute character to obtain a second feature URL prefix;
A determination unit, adapted to determine that the crawled web page contains an associated web page URL pattern when the first feature URL prefix is identical to the second feature URL prefix.
Optionally, the page-turning feature string determination unit is further adapted to:
Match page-turning feature strings against the nodes of the DOM tree of the current web page;
When the match succeeds, determine that the current web page has a page-turning feature string.
Optionally, the first feature URL prefix obtaining unit is further adapted to:
Replace the digit blocks at different positions in the URL of the current web page with the same substitute character to obtain the first feature URL prefix;
The second feature URL prefix obtaining unit is further adapted to:
Replace the digit blocks at different positions in the URL linked from the page-turning feature string with the same substitute character to obtain the second feature URL prefix.
Optionally, the first feature URL prefix obtaining unit is further adapted to:
Replace the digit blocks at different positions in the URL of the current web page with different substitute characters, respectively, to obtain the first feature URL prefix;
The second feature URL prefix obtaining unit is further adapted to:
Replace the digit block at each position in the URL linked from the page-turning feature string with the same substitute character used at that position in the first feature URL prefix, to obtain the second feature URL prefix.
Optionally, the associated web page URL pattern obtaining submodule is further adapted to:
Use the first feature URL prefix or the second feature URL prefix as the associated web page URL pattern corresponding to the current web page.
Optionally, the associated web page obtaining submodule is further adapted to:
Perform structural analysis on the common part of the associated web page URL patterns, extract the page-turning block from the associated web page URL pattern, and replace the page-turning block with a first-page identifier to obtain the URL of the first-page associated web page; where the page-turning block is the digit block that occupies the same position in multiple associated web page URLs but holds different numbers;
Access the URL of the first-page associated web page to obtain the first-page associated web page.
Optionally, the first-page identifier comprises 0, 1, and/or the largest number among the current associated web pages.
When the web page matching the keyword is determined to be an associated web page, the present invention returns the web page together with the first-page information associated with it. This spares the user from searching again or looking for the first page, reduces system operations, lowers the consumption of system resources, and improves search efficiency.
The present invention extracts the associated web page URL pattern from the web page currently crawled and uses the associated web pages corresponding to that pattern to build the associated web page database. This avoids crawling web pages repeatedly, lowers the consumption of system resources, and greatly improves the efficiency of building the database.
When a page-turning feature string is present in the page elements of the current web page, the present invention replaces the digit blocks in the URL of the current web page with a preset substitute character to obtain a first feature URL prefix, and replaces the digit blocks in the URL linked from the page-turning feature string with a preset substitute character to obtain a second feature URL prefix. When the first feature URL prefix is identical to the second feature URL prefix, the first or second feature URL prefix is used as the associated web page URL pattern corresponding to the current web page. Because page-turning feature strings are used to identify associated web pages, the recognition accuracy is high; matching on the common part of the URLs improves the accuracy further. The recall rate is significantly improved, and more than 90% of associated web pages can be identified in practical applications.
The present invention replaces the page-turning block of the associated web page URL pattern with a first-page identifier to obtain the URL of the first-page associated web page. In the same way, the page-turning block can be replaced with other link identifiers to obtain the URLs of other associated web pages. This increases the coverage of associated web pages, makes it possible to obtain a more complete set of associated web pages, and enables fine-grained operations.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flow chart of the steps of an embodiment of an associated web page search method according to one embodiment of the present invention;
Fig. 2 shows an example of a web page structure according to one embodiment of the present invention;
Fig. 3 shows an example of a page-turning block according to one embodiment of the present invention; and
Fig. 4 shows a structural block diagram of an embodiment of an associated web page search system according to one embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, a flow chart of the steps of an embodiment of an associated web page search method according to one embodiment of the present invention is shown. The method may specifically comprise the following steps:
Step 100, receiving a search request; the request comprises a search keyword;
A search request may be a request, sent by a user, to search for information associated with a certain search keyword. For example, when the user enters a search keyword in the browser address bar, a search bar, or the keyword input box of a search engine and then presses the Enter key or clicks the search button, this is equivalent to receiving the user's search request.
Step 200, searching a preset associated web page database according to the search keyword to obtain a web page matching the keyword;
An associated web page database is preset in the back end of the search engine to store information about the associated web pages that have been collected. The collected information is generally keywords or phrases that indicate the content of an associated web page (including the URL of the page itself, the code that makes up the page, and the links into and out of the page).
As a preferred example, the search keywords entered by the user may first be segmented into a keyword sequence, denoted q, so that the user's query is split into q = {q1, q2, q3, ..., qn}. Then, according to the way the user queries (for example, whether all the words are joined together or separated by spaces) and according to the part of speech of each keyword in q, the importance of each word in presenting the query results is determined. After the search word set q has been segmented, each keyword in q is looked up in the index database to obtain its corresponding URL sequence, and the importance of each keyword in presenting the query results is computed from the user's query mode and the part of speech. A combined ranking step is then sufficient to produce the search results.
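The preferred example above can be sketched as a small inverted-index lookup. The sketch makes several assumptions that are not in the specification: the query is segmented on whitespace only, the index is a plain dictionary from keyword to URL list, and the importance of each keyword is supplied by the caller as a `weights` mapping (standing in for the part-of-speech analysis, which is not detailed in the text). The function name `rank` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def rank(query: str,
         index: Dict[str, List[str]],
         weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Split the query into keywords q1..qn and rank URLs by summed weight."""
    keywords = query.split()          # assumed segmentation: whitespace only
    weights = weights or {}
    scores: Dict[str, float] = defaultdict(float)
    for keyword in keywords:
        for url in index.get(keyword, ()):
            # Keywords without an explicit weight count as 1.0.
            scores[url] += weights.get(keyword, 1.0)
    # Highest score first; ties are broken by URL to keep the order stable.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

A URL that matches more keywords, or more heavily weighted keywords, is placed earlier in the result list.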
In a preferred embodiment of the present invention, the associated web page database may be built in the following manner:
Sub-step S101, determining whether a crawled web page contains an associated web page URL pattern; if so, performing sub-step S102;
It should be noted that a search engine can automatically fetch web pages from the WWW by means of a web crawler. A web crawler, also called a Web Spider, finds web pages through the link addresses in web pages. Starting from one page of a website (usually the home page), it reads the content of the page, finds the other link addresses in it, and then follows those link addresses to the next pages, repeating this cycle until all pages of the website have been crawled. If the whole Internet is regarded as one website, a Web Spider can use this principle to crawl all web pages on the Internet.
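The crawling loop described above can be sketched as a breadth-first traversal. To keep the sketch self-contained and free of network access, the function that fetches a page and returns its links is passed in as `get_links`; that parameter, the `limit` safeguard, and the name `crawl_site` are assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def crawl_site(start_url: str,
               get_links: Callable[[str], Iterable[str]],
               limit: int = 1000) -> List[str]:
    """Visit every page of one site reachable from `start_url`, breadth first."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited: List[str] = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Stay on the same site and never queue a page twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

In a real crawler `get_links` would download the page and parse its anchors; here a dictionary lookup is enough to exercise the traversal.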
The associated web page URL pattern may be the common part (pattern) of page-turning web pages, that is, a set formed by grouping URLs or web pages that are similar in appearance or function.
In a preferred embodiment of the present invention, sub-step S101 may specifically comprise the following sub-steps:
Sub-step S11, determining whether a page-turning feature string is present in the page elements of the current web page; if so, extracting the URL linked from the page-turning feature string;
A web page can be divided into several regions by function. Taking a page of a forum (Bulletin Board System, BBS) as an example, as shown in Fig. 2, the page can be divided into a navigation block (1), junk blocks (2, 4), a page-turning block (3), a title block (5), an author information block (6), a publication date block (7), and a body text block (8). The navigation block may be located at the top of the page header or below the banner of the page, and points to the information sections of the site. A junk block may be a region holding page elements that have very little relevance to the topic of the page, such as function buttons like "Post" and "Reply". The page-turning block may be the region that indicates page turning. The title block may be the region holding the title of the page topic (for example, "secure browser assemble black Thursday" as shown in Fig. 2). The author information block is the region that records the author of the page topic. The body text block is the region that records the body text of the page topic.
Referring to Fig. 3, an example of a page-turning block according to one embodiment of the present invention is shown.
As shown in Fig. 3, a page-turning block may consist mainly of page-turning feature strings. A page-turning feature string may be a page-turning feature anchor, that is, a page element used to identify page turning.
In specific implementations, the page-turning feature strings may include one or more of the following:
[<<], [>>], [<<], [>>], [<<], [>>], [>], [<], [next page], [previous page], [previous], [next], [next], [last page], [endpage], [front page], [rear page], [< previous page], [< previous], [next >], [next page >], [1...].
Of course, the above page-turning feature strings are only examples. When implementing the embodiments of the present invention, other page-turning feature strings may be set according to the actual situation; the embodiments of the present invention place no limitation on this.
It should be noted that the current web page may be a web page that has been crawled.
In a preferred embodiment of the present invention, sub-step S11 may further comprise the following sub-steps:
Sub-step S111, matching page-turning feature strings against the nodes of the DOM tree of the current web page;
Sub-step S112, when the match succeeds, determining that the current web page has a page-turning feature string.
DOM (Document Object Model) is the standard programming interface for processing extensible markup languages. The DOM makes it possible to access and modify the content and structure of a document in a platform- and language-independent way; in other words, it is a common way of representing and processing an HTML or XML document.
The DOM is in effect a document model described in an object-oriented way. It defines the objects needed to represent and modify a document, the behavior and attributes of those objects, and the relationships between them. The DOM can be thought of as a tree representation of the data and structure on a page, although the page is not necessarily implemented as such a tree.
With JavaScript, the whole HTML document can be restructured: items on the page can be added, removed, changed, or rearranged.
To change something on the page, JavaScript needs an entry point for accessing all elements in the HTML document. This entry point, together with the methods and properties for adding, moving, changing, or removing HTML elements, is provided by the Document Object Model (DOM).
An HTML document can be regarded as a tree structure, called the node tree (HTML DOM). Through the HTML DOM, all nodes in the tree can be accessed with JavaScript. All HTML elements (nodes) can be modified, and nodes can also be created or deleted.
The nodes in the node tree have a hierarchical relationship to one another. Terms such as parent, child, and sibling are used to describe these relationships. A parent node has child nodes. Child nodes at the same level are called siblings. The top node of the node tree is called the root. Every node except the root has a parent node. A node can have any number of children, and siblings are nodes that share the same parent.
Specifically, there are several ways to find the page element to be operated on in the node tree:
For example, by using the getElementById() and getElementsByTagName() methods.
As another example, by using the parentNode, firstChild, and lastChild properties of an element node.
The getElementById() and getElementsByTagName() methods can find any HTML element in the whole HTML document, and both ignore the document structure. To find all <p> elements in a document, getElementsByTagName() finds them all, regardless of the level at which each <p> element sits. Likewise, getElementById() returns the correct element no matter where it is hidden in the document structure. Both methods can provide any required HTML element, regardless of its position in the document.
In addition, getElementById() returns a page element by the specified ID.
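The node-tree navigation described above is usually written in JavaScript; the same DOM methods and properties are also available in Python's standard `xml.dom.minidom` module, which is used in the sketch below. The sample markup `PAGE` and the helper names are assumptions for illustration, and `minidom` requires well-formed XML, which real HTML pages often are not.

```python
from xml.dom.minidom import parseString

# A small, well-formed sample document (hypothetical content).
PAGE = "<html><body><div id='pg'><p>one</p><p>two</p></div><p>three</p></body></html>"


def paragraph_texts(markup: str) -> list:
    """Find every <p> element, regardless of its depth in the tree."""
    doc = parseString(markup)
    return [p.firstChild.data for p in doc.getElementsByTagName("p")]


def describe_first_div(markup: str) -> tuple:
    """Walk from the first <div> to its parent, first child and last child."""
    doc = parseString(markup)
    div = doc.getElementsByTagName("div")[0]
    return (
        div.parentNode.tagName,        # the parent element of the <div>
        div.firstChild.firstChild.data,  # text of the first child <p>
        div.lastChild.firstChild.data,   # text of the last child <p>
    )
```

`getElementsByTagName` returns the `<p>` elements in document order even though they sit at different levels, while `parentNode`, `firstChild`, and `lastChild` move one step at a time through the hierarchy.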
In specific implementations, the method may check whether the hyperlink <a> (anchor) tags in the DOM tree of the page's HTML text contain one or more of [<<], [>>], [<<], [>>], [<<], [>>], [>], [<], [next page], [previous page], [previous], [next], [next], [last page], [endpage], [front page], [rear page], [< previous page], [< previous], [next >], [next page >], and [1...]; if so, the current web page is determined to have a page-turning feature string.
The <a> tag can be used to link the text or image at the current position to another page, text, image, and so on.
The basic syntax of the <a> tag may be as follows:
<a
class=type
id=value
href=reference
name=value
rel=same|next|parent|previous
rev=value
target=window
style=value
title=title
onclick=function
onmouseout=function
onMouseOver=function>displayed text or image</a>
For example, the <a> tags in one piece of HTML text are as follows:
<div id="pgt" class="bmbw0pgscl">
<span id="fd_page_top">
<div class="pg">
<a href="forum-99-1.html" class="prev"></a>
<a href="forum-99-1.html">1</a><strong>2</strong>
<a href="forum-99-3.html">3</a>
<a href="forum-99-4.html">4</a>
<a href="forum-99-5.html">5</a>
<a href="forum-99-6.html">6</a>
<a href="forum-99-7.html">7</a>
<a href="forum-99-8.html">8</a>
<a href="forum-99-9.html">9</a>
<a href="forum-99-10.html">10</a>
<a href="forum-99-1000.html" class="last">...2107</a>
<label>
<input type="text" name="custompage" class="px" size="2" title="Enter a page number and press Enter to jump" value="2" onkeydown="if (event.keyCode==13) {window.location='forum.php?mod=forumdisplay&fid=99&page='+this.value; doane(event);}" />
<span title="1000 pages in total"> / 1000 pages</span>
</label>
<a href="forum-99-3.html" class="nxt">Next page</a>
</div>
</span>
By matching the <a> tags in the HTML text, it can be determined that the web page has one or more page-turning feature strings. After these page-turning feature strings have been identified, the one or more URLs they link to are extracted; these URLs point to other page-turning pages associated with the current web page.
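The matching and extraction described above can be sketched with Python's standard `html.parser` module. The class name `PageTurnLinkExtractor`, the helper `page_turn_links`, and the small default feature set are assumptions for illustration; the comparison is done on the lower-cased, stripped anchor text, which is also an assumed normalization. Relative links are resolved against the URL of the current page with `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set
from urllib.parse import urljoin

# A small assumed subset of page-turning feature strings, in lower case.
DEFAULT_FEATURES = {"<<", ">>", "<", ">", "next page", "previous page", "last page"}


class PageTurnLinkExtractor(HTMLParser):
    """Collect the URLs of <a> tags whose text is a page-turning feature string."""

    def __init__(self, base_url: str, features: Set[str]):
        super().__init__()
        self.base_url = base_url
        self.features = features
        self._href: Optional[str] = None   # href of the anchor being read
        self._text: List[str] = []         # text pieces inside that anchor
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip().lower()
            if label in self.features:
                self.links.append(urljoin(self.base_url, self._href))
            self._href = None


def page_turn_links(html: str, base_url: str,
                    features: Optional[Set[str]] = None) -> List[str]:
    parser = PageTurnLinkExtractor(base_url, features or DEFAULT_FEATURES)
    parser.feed(html)
    parser.close()
    return parser.links
```

An empty result means no page-turning feature string was matched; a non-empty result gives the candidate URLs whose feature URL prefixes are compared with that of the current page in the following sub-steps.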
Sub-step S12, replacing the digit blocks in the URL of the current web page with a preset substitute character to obtain a first feature URL prefix, where a digit block is a single digit or a run of digits delimited by separators;
Sub-step S13, replacing the digit blocks in the URL linked from the page-turning feature string with a preset substitute character to obtain a second feature URL prefix;
It should be noted that the substitute character may be any character; the embodiments of the present invention place no limitation on this. A separator may be a symbol used as a delimiter in a URL, for example "/", ".", "-", "?", or ":". A digit block must consist only of consecutive digits between separators; for example, "123ABC" is not a digit block.
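The definition of a digit block can be made concrete with a short sketch. It uses only the five separators named above; treating exactly this set as the separators, and the function name `digit_blocks`, are assumptions for illustration.

```python
import re
from typing import List

# The separators listed in the text: "/", ".", "-", "?" and ":".
SEPARATORS = re.compile(r"[/.\-?:]")


def digit_blocks(url: str) -> List[str]:
    """Return the URL segments that consist only of ASCII digits."""
    segments = SEPARATORS.split(url)
    return [s for s in segments if re.fullmatch(r"[0-9]+", s)]
```

A segment such as "123ABC" mixes digits and letters between two separators, so it is not reported as a digit block.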
In an embodiment of the present invention, sub-step S12 may further comprise the following sub-step:
Sub-step S121: replace the digit blocks at different positions in the URL of the current webpage with the same substitute character to obtain the first characteristic URL prefix;
Corresponding to sub-step S121, sub-step S13 may further comprise the following sub-step:
Sub-step S131: replace the digit blocks at different positions in the URL linked by the page-turning feature string with the same substitute character to obtain the second characteristic URL prefix.
In a specific implementation, the URL of the current webpage and the URL linked by the page-turning feature string may each contain one or more digit blocks. To reduce the number of replacement operations and the system resources consumed, all digit blocks may be replaced with the same substitute character.
For example, the URL of the current webpage is http://bbs.XXX.com/forum-99-2.html and the URL linked by the page-turning feature string is http://bbs.XXX.com/forum-99-3.html. In the current URL, "99" and "2" are identified as digit blocks, and likewise "99" and "3" in the linked URL. Taking "(d+)" as an example of the substitute character, the first characteristic URL prefix is http://bbs.XXX.com/forum-(d+)-(d+).html, and the second characteristic URL prefix is also http://bbs.XXX.com/forum-(d+)-(d+).html.
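The replacement with a single substitute character might be sketched as below. The regular expression, its separator set and the name characteristic_prefix are assumptions made for illustration; "(d+)" is the substitute character used in the example above.

```python
import re

# Digit runs delimited by separators or string boundaries (assumed separator set).
DIGIT_BLOCK = re.compile(r"(?<![^/.\-?:=&_])\d+(?![^/.\-?:=&_])")


def characteristic_prefix(url, substitute="(d+)"):
    """Replace every digit block in the URL with the same substitute character."""
    # A callable replacement avoids any special handling of the substitute text.
    return DIGIT_BLOCK.sub(lambda match: substitute, url)
```

Applied to the two example URLs, both calls produce the same string, which is what allows the later comparison of the first and second characteristic URL prefixes.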
In another embodiment of the present invention, sub-step S12 may further comprise the following sub-step:
Sub-step S122: replace the digit blocks at different positions in the URL of the current webpage with respectively different substitute characters to obtain the first characteristic URL prefix;
Corresponding to sub-step S122, sub-step S13 may further comprise the following sub-step:
Sub-step S132: replace each digit block in the URL linked by the page-turning feature string with the substitute character that was used at the same position in the first characteristic URL prefix, to obtain the second characteristic URL prefix.
In a specific implementation, the URL of the current webpage and the URL linked by the page-turning feature string may each contain one or more digit blocks. To make the subsequent judgment of whether the first and second characteristic URL prefixes are identical more efficient, and to make the individual digit blocks easier to identify, different substitute characters may be used for the digit blocks.
For example, the URL of the current webpage is http://bbs.XXX.com/forum-99-2.html and the URL linked by the page-turning feature string is http://bbs.XXX.com/forum-99-3.html. In the current URL, "99" and "2" are identified as digit blocks. Taking "(d+)" and "(e+)" as examples of the substitute characters, the first characteristic URL prefix is http://bbs.XXX.com/forum-(d+)-(e+).html, and the second characteristic URL prefix is also http://bbs.XXX.com/forum-(d+)-(e+).html.
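A position-dependent replacement could look like the following sketch. The substitutes "(d+)" and "(e+)" come from the example; continuing with "(f+)", "(g+)" and so on for further digit blocks is an assumption, as are the regular expression and the name positional_prefix.

```python
import re

# Digit runs delimited by separators or string boundaries (assumed separator set).
DIGIT_BLOCK = re.compile(r"(?<![^/.\-?:=&_])\d+(?![^/.\-?:=&_])")


def positional_prefix(url):
    """Replace the n-th digit block with the n-th substitute: (d+), (e+), (f+), ..."""
    index = 0

    def replace(match):
        nonlocal index
        # Letters beyond "e" are an assumption; URLs are expected to have few blocks.
        letter = chr(ord("d") + index)
        index += 1
        return f"({letter}+)"

    return DIGIT_BLOCK.sub(replace, url)
```

Because each position keeps its own substitute, the position of any digit block can be read directly from the resulting prefix.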
Sub-step S14: when the first characteristic URL prefix is identical to the second characteristic URL prefix, determine that the crawled webpage contains an associated-webpage URL pattern.
In practical applications, when the first characteristic URL prefix is identical to the second, the URL of the current webpage and the URL linked by the page-turning feature string share the same common part, so it can be determined that the current webpage and the webpage at the linked URL are associated paginated webpages.
Sub-step S102: obtain the associated-webpage URL pattern;
In an embodiment of the present invention, sub-step S102 may specifically comprise the following sub-step:
Sub-step S21: take the first characteristic URL prefix or the second characteristic URL prefix as the associated-webpage URL pattern corresponding to the current webpage.
Because the first characteristic URL prefix is identical to the second, either one may serve as the associated-webpage URL pattern (Pattern) of the current webpage.
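Sub-steps S14 and S21 can be illustrated together. In the sketch below, the function name associated_url_pattern and the separator set are assumptions, and resolving relative links with urljoin is a design choice of this sketch rather than something the text prescribes.

```python
import re
from urllib.parse import urljoin

# Digit runs delimited by separators or string boundaries (assumed separator set).
DIGIT_BLOCK = re.compile(r"(?<![^/.\-?:=&_])\d+(?![^/.\-?:=&_])")


def characteristic_prefix(url, substitute="(d+)"):
    """Replace every digit block in the URL with the substitute character."""
    return DIGIT_BLOCK.sub(lambda match: substitute, url)


def associated_url_pattern(current_url, page_turn_hrefs):
    """Return the shared characteristic prefix, or None if no linked URL matches."""
    first = characteristic_prefix(current_url)
    for href in page_turn_hrefs:
        # Page-turning links are often relative, so resolve them first.
        second = characteristic_prefix(urljoin(current_url, href))
        if first == second:
            return first  # either prefix can serve as the pattern
    return None
```

A return value of None corresponds to the case in which the crawled webpage is not determined to contain an associated-webpage URL pattern.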
When the page elements of the current webpage contain a page-turning feature string, the present invention replaces the digit blocks in the URL of the current webpage with a preset substitute character to obtain a first characteristic URL prefix, and likewise replaces the digit blocks in the URL linked by the page-turning feature string to obtain a second characteristic URL prefix. When the two prefixes are identical, either one is taken as the associated-webpage URL pattern corresponding to the current webpage. Identifying associated webpages by page-turning feature strings gives high recognition accuracy; matching on the common part of the URLs improves the accuracy further and significantly raises recall, so that more than 90% of associated webpages can be identified in practical applications.
Sub-step S103: use the associated-webpage URL pattern to obtain the corresponding associated webpages;
In a specific implementation, the associated webpages may include a first-page associated webpage and other associated webpages. The first-page associated webpage generally carries the important content, such as the text block shown in Fig. 3, so it is comparatively more important and identifying it is of particular significance.
In a preferred embodiment of the present invention, sub-step S103 may specifically comprise the following sub-steps:
Sub-step S31: perform structural analysis on the common part of the associated-webpage URL patterns, extract the page-turning block from the associated-webpage URL pattern, and replace the page-turning block with a first-page identifier to obtain the URL of the first-page associated webpage, where the page-turning block is the digit block that occupies the same position but holds different digits across a plurality of associated-webpage URL patterns;
Sub-step S32: access the URL of the first-page associated webpage to obtain the first-page associated webpage.
In practical applications, a URL may comprise one or more of the following components:
1. protocol: specifies the transfer protocol used. The most common is HTTP, which is also the most widely used protocol on the World Wide Web. Specifically, the transfer protocols include the file protocol (the resource is a file on the local computer; format file:///), the ftp protocol (the resource is accessed via FTP; format ftp://), gopher (the resource is accessed via the Gopher protocol), the http protocol (the resource is accessed via HTTP; format http://), the https protocol (the resource is accessed via secure HTTPS; format https://), and so on.
2. hostname: the Domain Name System (DNS) host name or IP address of the server that hosts the resource. Sometimes the host name is preceded by the user name and password required to connect to the server (format username:password).
3. port: optional. When it is omitted, the default port of the protocol is used; each transfer protocol has a default port number, for example 80 for http. Sometimes, for security or other reasons, the port is redefined on the server to a non-standard number, in which case the port number cannot be omitted from the URL.
4. path: a string of zero or more segments separated by "/" symbols, generally used to represent a directory or file address on the host.
5. parameters: options used to specify special parameters.
6. query: used to pass parameters to dynamic webpages (such as pages built with CGI, ISAPI, PHP/JSP/ASP/ASP.NET and similar technologies). There may be several parameters, separated by "&" symbols; the name and value of each parameter are separated by an "=" symbol.
7. fragment: specifies a fragment within the network resource. For example, if a webpage contains several term definitions, a fragment can be used to jump directly to a particular definition.
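The seven components listed above map directly onto the result of urlparse in the Python standard library, as the following sketch shows. The function name url_components and the example host bbs.example.com are hypothetical and are not taken from the text.

```python
from urllib.parse import urlparse, parse_qsl


def url_components(url):
    """Split a URL into the seven components described in the text."""
    parsed = urlparse(url)
    return {
        "protocol": parsed.scheme,
        "hostname": parsed.hostname,        # lower-cased host, without user info or port
        "port": parsed.port,                # None when the URL omits the port
        "path": parsed.path,
        "parameters": parsed.params,        # the ";..." part of the last path segment
        "query": parse_qsl(parsed.query),   # list of (name, value) pairs
        "fragment": parsed.fragment,
    }
```

For a URL such as http://bbs.example.com:8080/forum.php?mod=forumdisplay&fid=99&page=2#top, the page number appears as a query parameter rather than in the path, which is one reason a separator set for digit blocks may need to include "=" and "&".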
In a specific implementation, structural analysis is performed on the common part of a plurality of associated-webpage URL patterns to extract the page-turning block, which is then replaced with the first-page identifier to obtain the URL of the first-page associated webpage.
For example, for the associated-webpage URL pattern of the above example, http://bbs.XXX.com/forum-(d+)-(e+).html, (e+) is identified as the page-turning block; after the page-turning block is replaced with the first-page identifier, the URL of the first-page associated webpage, http://bbs.XXX.com/forum-99-1.html, is obtained.
In a preferred example of the embodiment of the present invention, the first-page identifier may include 0, 1 and/or the maximum page number among the current associated webpages.
Different websites may adopt different pagination structures, so the first-page associated webpage differs from site to site. For example, some websites use page 0 as the first-page associated webpage, some use page 1, and some use the largest page number (for example 2100, as shown in Fig. 3), and so on.
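One possible way to locate the page-turning block and build candidate first-page URLs is sketched below. It assumes that the page-turning block is the single digit block that differs between the current URL and a linked URL at the same ordinal position; the function names, the separator set and the max_page argument are hypothetical.

```python
import re

# Digit runs delimited by separators or string boundaries (assumed separator set).
DIGIT_BLOCK = re.compile(r"(?<![^/.\-?:=&_])\d+(?![^/.\-?:=&_])")


def first_page_url(current_url, linked_url, first_page_id="1"):
    """Replace the page-turning block of current_url with first_page_id."""
    current = list(DIGIT_BLOCK.finditer(current_url))
    linked = list(DIGIT_BLOCK.finditer(linked_url))
    if len(current) != len(linked):
        return None
    # Page-turning block: same position, different digits.
    differing = [c for c, l in zip(current, linked) if c.group() != l.group()]
    if len(differing) != 1:
        return None  # ambiguous or absent page-turning block
    block = differing[0]
    return current_url[:block.start()] + first_page_id + current_url[block.end():]


def first_page_candidates(current_url, linked_url, max_page=None):
    """Candidate first-page URLs for the identifiers 0, 1 and, if known, the maximum page."""
    identifiers = ["0", "1"]
    if max_page is not None:
        identifiers.append(str(max_page))
    urls = (first_page_url(current_url, linked_url, i) for i in identifiers)
    return [u for u in urls if u is not None]
```

Which candidate is the real first page depends on the website; a crawler would have to access the candidates (sub-step S32) to find out.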
Of course, the above first-page associated webpage is only an example. When implementing the embodiment of the present invention, the digit block may be replaced, according to the actual situation, with the identifier of any associated webpage to obtain the corresponding associated webpage; the embodiment of the present invention does not describe these cases one by one.
The present invention replaces the page-turning block of the associated-webpage URL pattern with the first-page identifier to obtain the URL of the first-page associated webpage. In the same way, the page-turning block may be replaced with other page identifiers to obtain the URLs of other associated webpages, which increases the coverage of associated webpages, makes it possible to obtain a more complete set of associated webpages, and enables fine-grained operation.
Sub-step S104: build the associated-webpage database from the associated webpages corresponding to the associated-webpage URL pattern.
In a specific implementation, the associated webpages corresponding to the associated-webpage URL pattern may include the first-page associated webpage and other associated webpages; they may be all of the associated webpages or only some of them, and the embodiment of the present invention places no limit on this.
As a preferred example, the webpage files crawled by the spider may undergo data processing, which may specifically include:
1. Webpage structuring. The HTML code of the associated webpage is deleted and the webpage content is extracted.
2. De-noising. After structuring has removed the HTML code, only the webpage content remains; de-noising keeps the main content of the webpage and deletes useless content such as copyright notices.
3. Duplicate checking. Duplicated webpages and content are searched for; if a duplicate page is found, it is deleted.
4. Word segmentation. The extracted webpage content is split into N words, which are ordered and stored in the index database; the number of times each word occurs on the page can also be counted.
5. Link analysis. The page's backlinks, outbound links and internal links are counted, and the page is then assigned a weight accordingly.
After the above data processing is complete, the processed data can be stored in the associated-webpage database.
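A simplified sketch of steps 1, 3 and 4 of this processing is given below. De-noising and link analysis are omitted, whitespace-and-punctuation tokenisation stands in for real word segmentation, and an in-memory dictionary stands in for the index database; all names are hypothetical.

```python
import hashlib
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Step 1 (structuring): keep text nodes, drop tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages):
    """pages maps URL to HTML; returns URL to word counts, skipping duplicate content."""
    seen = set()
    index = {}
    for url, html in pages.items():
        text = extract_text(html)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:   # step 3: duplicate checking on the extracted text
            continue
        seen.add(digest)
        # Step 4: naive tokenisation and per-page word frequencies.
        index[url] = Counter(re.findall(r"\w+", text.lower()))
    return index
```

Hashing the extracted text catches only exact duplicates; near-duplicate detection would need a different technique.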
The present invention extracts the associated-webpage URL pattern from the webpage currently crawled and builds the associated-webpage database from the associated webpages corresponding to that pattern, which avoids repeated crawling of webpages, reduces the use of system resources, and greatly improves the efficiency of building the database.
Step 300: judge whether the webpage is an associated webpage; if so, perform step 400;
In a specific implementation, whether the webpage is an associated webpage can be judged by whether it has an associated-webpage URL pattern. When the webpage has an associated-webpage URL pattern, it is determined to be an associated webpage.
Step 400: return the webpage and the first-page information associated with the webpage.
The embodiment of the present invention may store the correspondence between each associated-webpage URL pattern and its webpages; the first page associated with the webpage can then be obtained simply by querying the correspondence between the webpage's associated-webpage URL pattern and its webpages.
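Steps 200 to 400 might be sketched as follows, with plain Python lists and dictionaries standing in for the associated-webpage database and the stored correspondence. The record layout ("url", "text", "pattern") and the function name search are assumptions made only for illustration.

```python
def search(keyword, database, first_page_by_pattern):
    """Return (url, first_page_url) pairs; first_page_url is None for non-associated pages."""
    results = []
    for record in database:
        # Step 200: keep only the pages whose text matches the keyword.
        if keyword.lower() not in record["text"].lower():
            continue
        # Step 300: a page with a stored URL pattern is treated as an associated webpage.
        pattern = record.get("pattern")
        first_page = first_page_by_pattern.get(pattern) if pattern else None
        # Step 400: return the page together with its first page, if any.
        results.append((record["url"], first_page))
    return results
```

Keying the correspondence by pattern alone assumes that each pattern identifies a single series of paginated pages; a real store may need a finer key.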
After the search results are obtained, the search engine can display them in an interface for the user to read and use.
When the webpage obtained by matching the keyword is determined to be an associated webpage, the present invention returns the webpage together with its associated first-page information. This spares the user from repeating the search or looking for the first page, further reduces system operations and the use of system resources, and improves search efficiency.
For simplicity of description, the method embodiments are all expressed as a series of combined actions, but those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
Referring to Fig. 4, a structural block diagram of an associated-webpage search system according to an embodiment of the present invention is shown, which may specifically include the following modules:
A search request receiving module 401, adapted to receive a search request, the request including a search keyword;
A matching-webpage obtaining module 402, adapted to search a preset associated-webpage database according to the search keyword to obtain a webpage matching the keyword;
A multi-page associated-webpage judging module 403, adapted to judge whether the webpage is an associated webpage and, if so, to call the information returning module 404;
An information returning module 404, adapted to return the webpage and the first-page information associated with the webpage.
In a preferred embodiment of the present invention, the associated-webpage database is built by calling the following submodules:
An associated-webpage URL judging submodule, adapted to judge whether a crawled webpage contains an associated-webpage URL pattern and, if so, to call the associated-webpage URL pattern obtaining submodule;
An associated-webpage URL pattern obtaining submodule, adapted to obtain the associated-webpage URL pattern;
An associated-webpage obtaining submodule, adapted to use the associated-webpage URL pattern to obtain the corresponding associated webpages;
An associated-webpage database building submodule, adapted to build the associated-webpage database from the associated webpages corresponding to the associated-webpage URL pattern.
In a preferred embodiment of the present invention, the associated-webpage URL judging submodule may specifically include the following units:
A page-turning feature string judging unit, adapted to judge whether the page elements of the current webpage contain a page-turning feature string and, if so, to call the URL extraction unit;
A URL extraction unit, adapted to extract the URL linked by the page-turning feature string;
A first characteristic URL prefix obtaining unit, adapted to replace the digit blocks in the URL of the current webpage with a preset substitute character to obtain a first characteristic URL prefix, where a digit block is a single digit or a run of digits delimited by separators;
A second characteristic URL prefix obtaining unit, adapted to replace the digit blocks in the URL linked by the page-turning feature string with the preset substitute character to obtain a second characteristic URL prefix;
A determining unit, adapted to determine that the crawled webpage contains an associated-webpage URL pattern when the first characteristic URL prefix is identical to the second characteristic URL prefix.
In a preferred embodiment of the present invention, the page-turning feature string judging unit may further be adapted to:
Match page-turning feature strings against the DOM tree nodes of the current webpage;
When the match succeeds, determine that the current webpage has a page-turning feature string.
In a preferred embodiment of the present invention, the first characteristic URL prefix obtaining unit may further be adapted to:
Replace the digit blocks at different positions in the URL of the current webpage with the same substitute character to obtain the first characteristic URL prefix;
The second characteristic URL prefix obtaining unit is further adapted to:
Replace the digit blocks at different positions in the URL linked by the page-turning feature string with the same substitute character to obtain the second characteristic URL prefix.
In a preferred embodiment of the present invention, the first characteristic URL prefix obtaining unit may further be adapted to:
Replace the digit blocks at different positions in the URL of the current webpage with respectively different substitute characters to obtain the first characteristic URL prefix;
The second characteristic URL prefix obtaining unit is further adapted to:
Replace each digit block in the URL linked by the page-turning feature string with the substitute character used at the same position in the first characteristic URL prefix, to obtain the second characteristic URL prefix.
In a preferred embodiment of the present invention, the associated-webpage URL pattern obtaining submodule may further be adapted to:
Take the first characteristic URL prefix or the second characteristic URL prefix as the associated-webpage URL pattern corresponding to the current webpage.
In a preferred embodiment of the present invention, the associated-webpage obtaining submodule may further be adapted to:
Perform structural analysis on the common part of the associated-webpage URL patterns, extract the page-turning block from the associated-webpage URL pattern, and replace the page-turning block with a first-page identifier to obtain the URL of the first-page associated webpage, where the page-turning block is the digit block that occupies the same position but holds different digits across a plurality of associated-webpage URL patterns;
Access the URL of the first-page associated webpage to obtain the first-page associated webpage.
In a preferred example of the embodiment of the present invention, the first-page identifier includes 0, 1 and/or the maximum page number among the current associated webpages.
Since the system embodiment of Fig. 4 is substantially similar to the method embodiment of Fig. 1, its description is relatively brief; for the relevant details, refer to the description of the method embodiment.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given in order to disclose the best mode of the invention.
In the specification provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in fewer than all the features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of the embodiments may be combined into one module, unit or component, and may furthermore be divided into a plurality of submodules, subunits or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the associated-webpage search device according to the embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. An associated-webpage search method, comprising:
Receiving a search request, the request including a search keyword;
Searching a preset associated-webpage database according to the search keyword to obtain a webpage matching the keyword;
Judging whether the webpage is an associated webpage; if so, returning the webpage and the first-page information associated with the webpage.
2. the method for claim 1, is characterized in that, described associating web pages database is set up in the following manner:
Whether the webpage that judgement grabs comprises associating web pages URL pattern; If so, obtain described associating web pages URL pattern;
Adopt described associating web pages URL pattern to obtain corresponding associating web pages;
Adopt associating web pages corresponding to described associating web pages URL pattern to set up associating web pages database.
3. method as claimed in claim 2, is characterized in that, whether the webpage that described judgement grabs comprises that the step of associating web pages URL pattern comprises:
Judge in the page elements of current web page and whether there is page turning feature string; If so, extract the URL of described page turning feature string link;
Adopt preset substitute character to replace the digital block in the URL of current web page, obtain First Characteristic URL prefix; Wherein, described digital block is to be spaced apart individual digit or a plurality of numeral that sign is partitioned into;
Adopt preset substitute character to replace the digital block in the URL of described page turning feature string link, obtain Second Characteristic URL prefix;
When described First Characteristic URL prefix is identical with described Second Characteristic URL prefix, judge whether the webpage grabbing comprises associating web pages URL pattern.
4. method as claimed in claim 3, is characterized in that, the step whether in the described page elements that judges current web page with page turning feature string comprises:
Adopt page turning feature string to mate in the dom tree node of current web page;
When the match is successful, judge that current web page has page turning feature string.
5. The method of claim 3, characterized in that the step of replacing the digit blocks in the URL of the current webpage with a preset substitute character to obtain a first characteristic URL prefix is:
Replacing the digit blocks at different positions in the URL of the current webpage with the same substitute character to obtain the first characteristic URL prefix;
The step of replacing the digit blocks in the URL linked by the page-turning feature string with the preset substitute character to obtain a second characteristic URL prefix is:
Replacing the digit blocks at different positions in the URL linked by the page-turning feature string with the same substitute character to obtain the second characteristic URL prefix.
6. The method of claim 3, characterized in that the step of replacing the digit blocks in the URL of the current webpage with a preset substitute character to obtain a first characteristic URL prefix is:
Replacing the digit blocks at different positions in the URL of the current webpage with respectively different substitute characters to obtain the first characteristic URL prefix;
The step of replacing the digit blocks in the URL linked by the page-turning feature string with the preset substitute character to obtain a second characteristic URL prefix is:
Replacing each digit block in the URL linked by the page-turning feature string with the substitute character used at the same position in the first characteristic URL prefix, to obtain the second characteristic URL prefix.
7. The method of claim 3, 4, 5 or 6, characterized in that the step of using the associated-webpage URL pattern to obtain the corresponding associated webpages comprises:
Taking the first characteristic URL prefix or the second characteristic URL prefix as the associated-webpage URL pattern corresponding to the current webpage.
8. The method of claim 3, 4, 5 or 6, characterized in that the step of obtaining the associated webpages corresponding to the associated-webpage URL pattern comprises:
Performing structural analysis on the common part of the associated-webpage URL patterns, extracting the page-turning block from the associated-webpage URL pattern, and replacing the page-turning block with a first-page identifier to obtain the URL of the first-page associated webpage, where the page-turning block is the digit block that occupies the same position but holds different digits across a plurality of associated-webpage URL patterns;
Accessing the URL of the first-page associated webpage to obtain the first-page associated webpage.
9. The method of claim 8, characterized in that the first-page identifier includes 0, 1 and/or the maximum page number among the current associated webpages.
10. An associated-webpage search system, comprising:
A search request receiving module, adapted to receive a search request, the request including a search keyword;
A matching-webpage obtaining module, adapted to search a preset associated-webpage database according to the search keyword to obtain a webpage matching the keyword;
A multi-page associated-webpage judging module, adapted to judge whether the webpage is an associated webpage and, if so, to call the information returning module;
An information returning module, adapted to return the webpage and the first-page information associated with the webpage.
CN201310603918.0A 2013-11-25 2013-11-25 A kind of associating web pages searching method and system Expired - Fee Related CN103617225B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310603918.0A CN103617225B (en) 2013-11-25 2013-11-25 A kind of associating web pages searching method and system
PCT/CN2014/086522 WO2015074455A1 (en) 2013-11-25 2014-09-15 Method and apparatus for computing url pattern of associated webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310603918.0A CN103617225B (en) 2013-11-25 2013-11-25 A kind of associating web pages searching method and system

Publications (2)

Publication Number Publication Date
CN103617225A true CN103617225A (en) 2014-03-05
CN103617225B CN103617225B (en) 2019-03-08

Family

ID=50167928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310603918.0A Expired - Fee Related CN103617225B (en) 2013-11-25 2013-11-25 A kind of associating web pages searching method and system

Country Status (1)

Country Link
CN (1) CN103617225B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117275A (en) * 2009-12-31 2011-07-06 北大方正集团有限公司 Method and device for collecting webpage data of direction site based on internet
CN102355488A (en) * 2011-08-15 2012-02-15 北京星网锐捷网络技术有限公司 Crawler seed obtaining method and equipment and crawler crawling method and equipment
CN102810110A (en) * 2012-05-07 2012-12-05 北京京东世纪贸易有限公司 Method and system for acquiring web text data
CN103123640A (en) * 2012-02-22 2013-05-29 深圳市谷古科技有限公司 Method and device for searching novel
CN103365924A (en) * 2012-04-09 2013-10-23 北京大学 Method, device and terminal for searching information

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015074455A1 (en) * 2013-11-25 2015-05-28 北京奇虎科技有限公司 Method and apparatus for computing url pattern of associated webpage
WO2015154540A1 (en) * 2014-08-01 2015-10-15 中兴通讯股份有限公司 Content searching method and device, and terminal
CN106611022A (en) * 2015-10-27 2017-05-03 北京国双科技有限公司 Method and device for increasing website search efficiency
CN106611022B (en) * 2015-10-27 2020-03-03 北京国双科技有限公司 Method and device for improving search efficiency in website
CN109597927A (en) * 2018-12-05 2019-04-09 贵阳高新数通信息有限公司 Bidding related web page page info extracting method and system
CN110781497A (en) * 2019-10-21 2020-02-11 新华三信息安全技术有限公司 Method for detecting web page link and storage medium

Also Published As

Publication number Publication date
CN103617225B (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN107808000B (en) System and method for collecting and extracting data of dark net
CN100478949C (en) Query rewriting with entity detection
US8185530B2 (en) Method and system for web document clustering
CN102831252B (en) A kind of method for upgrading index data base and device, searching method and system
CN106095979B (en) URL merging processing method and device
CN1952929A (en) Extraction method and system of structured data of internet based on sample & faced to regime
CN103605688A (en) Intercept method and intercept device for homepage advertisements and browser
JP2007122732A (en) Method for searching dates efficiently in collection of web documents, computer program, and service method (system and method for searching dates efficiently in collection of web documents)
CN102200980A (en) Method and system for providing network resources
CN107437026B (en) Malicious webpage advertisement detection method based on advertisement network topology
CN104715064A (en) Method and server for marking keywords on webpage
CN103617225A (en) Associated webpage searching method and system
CN102880711A (en) Processing method and processing device for input data in browser address bar
CN103577566A (en) Web reading content loading method and device
CN102982117A (en) Information search method and device
US20150100563A1 (en) Method for retaining search engine optimization in a transferred website
CN102982118A (en) Searching method and device based on favorites
CN103631906A (en) Method and device for recognizing page number identification in webpage URL
CN104391978A (en) Method and device for storing and processing web pages of browsers
CN103970800A (en) Method and system for extracting and processing webpage related keywords
CN102567521A (en) Webpage data capturing and filtering method
CN103618742A (en) Method and system for acquiring sub domain names and webmaster permission verification method
CN103366011A (en) Method and device for visiting authenticated websites by browser address bar
CN104778232B (en) Searching result optimizing method and device based on long query
WO2015074455A1 (en) Method and apparatus for computing url pattern of associated webpage

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190308

Termination date: 20211125