CN104391978A - Method and device for storing and processing web pages of browsers - Google Patents

Publication number
CN104391978A
Authority
CN
China
Prior art keywords
browser
collection webpage
webpage
text
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410742954.XA
Other languages
Chinese (zh)
Other versions
CN104391978B (en)
Inventor
伯诺克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201410742954.XA
Publication of CN104391978A
Application granted
Publication of CN104391978B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for storing and processing web pages of browsers. The method for storing and processing the web pages of the browsers includes receiving search keywords for searching the required-to-be-browsed web pages from the stored web pages of the browsers; matching the search keywords with the stored web pages of the browsers to obtain addresses of the matched stored web pages; outputting the addresses of the matched stored web pages. The method and the device have the advantages that the problem of low efficiency when target web pages are searched from stored web pages of existing browsers can be solved, and accordingly an effect of improving the efficiency when the target web pages are searched from the stored web pages of the browsers can be realized.

Description

Method and device for processing bookmarked web pages of a browser
Technical field
The present invention relates to the field of the Internet, and in particular to a method and a device for processing bookmarked web pages of a browser.
Background art
Existing browsers provide a function for bookmarking web pages. The bookmarks folder records the URL and the title of each web page the user has saved. When the user needs to revisit a bookmarked web page, the page can be found through its URL or its title in the bookmarks folder. Although this allows the user to find bookmarked web pages, when there are many bookmark records the needed page can only be identified by the titles in the bookmarks folder. However, the title of a web page often does not represent its content, or some keyword of the content that the user cares about is not included in the title, which makes it difficult for the user to quickly find the page to be accessed among a large number of bookmarked web pages.
For the problem in the related art of low efficiency when searching for a target web page among the bookmarked web pages of a browser, no effective solution has yet been proposed.
Summary of the invention
The main purpose of the present invention is to provide a method and a device for processing bookmarked web pages of a browser, so as to solve the problem of low efficiency when searching for a target web page among the bookmarked web pages of a browser.
To achieve the above purpose, according to one aspect of the present invention, a method for processing bookmarked web pages of a browser is provided.
The method for processing bookmarked web pages of a browser according to the present invention comprises: receiving a search keyword, wherein the search keyword is used to search the bookmarked web pages of the browser for a web page to be browsed; matching the search keyword against the bookmarked web pages of the browser to obtain the address of a matched bookmarked web page; and outputting the address of the matched bookmarked web page.
Further, matching the search keyword against the bookmarked web pages of the browser comprises: obtaining the title and the text content of a bookmarked web page of the browser; and matching the title and the text content of the bookmarked web page against the search keyword, wherein if the title and the text content of the bookmarked web page match the search keyword, it is determined that the search keyword matches the bookmarked web page, and if the title and the text content of the bookmarked web page do not match the search keyword, it is determined that the search keyword does not match the bookmarked web page.
Further, before the title and the text content of the bookmarked web page of the browser are matched against the search keyword, the method further comprises: obtaining the text content of the bookmarked web page of the browser; obtaining the URL and the title of the bookmarked web page of the browser; and storing the text content, the URL, and the title of the bookmarked web page of the browser.
Further, obtaining the text content of the bookmarked web page of the browser comprises: obtaining the address of the bookmarked web page of the browser; accessing the bookmarked web page according to its address; and crawling text content from the bookmarked web page while accessing it, to obtain the text content of the bookmarked web page of the browser.
Further, crawling text content from the bookmarked web page while accessing it, to obtain the text content of the bookmarked web page of the browser, comprises: filtering out the HyperText Markup Language (HTML) tags of the bookmarked web page of the browser; and crawling text content from the bookmarked web page from which the HTML tags have been filtered out, to obtain the text content of the bookmarked web page of the browser.
Further, after text content is crawled from the bookmarked web page while accessing it and the text content of the bookmarked web page of the browser is obtained, the method further comprises: extracting keywords from the text content of the bookmarked web page to obtain the keywords of the bookmarked web page; and storing the keywords, the URL, and the title of the bookmarked web page. Matching the title and the text content of the bookmarked web page against the search keyword then comprises: matching the keywords and the title of the bookmarked web page against the search keyword.
To achieve the above purpose, according to another aspect of the present invention, a device for processing bookmarked web pages of a browser is provided.
The device for processing bookmarked web pages of a browser according to the present invention comprises: a receiving unit, configured to receive a search keyword, wherein the search keyword is used to search the bookmarked web pages of the browser for a web page to be browsed; a matching unit, configured to match the search keyword against the bookmarked web pages of the browser to obtain the address of a matched bookmarked web page; and an output unit, configured to output the address of the matched bookmarked web page.
Further, the matching unit comprises: a first acquisition module, configured to obtain the title and the text content of a bookmarked web page of the browser; and a matching module, configured to match the title and the text content of the bookmarked web page against the search keyword, wherein if the title and the text content of the bookmarked web page match the search keyword, it is determined that the search keyword matches the bookmarked web page, and if the title and the text content of the bookmarked web page do not match the search keyword, it is determined that the search keyword does not match the bookmarked web page.
Further, the device also comprises: a first acquiring unit, configured to obtain the text content of the bookmarked web page of the browser; a second acquiring unit, configured to obtain the URL and the title of the bookmarked web page of the browser; and a storage unit, configured to store the text content, the URL, and the title of the bookmarked web page of the browser.
Further, the first acquiring unit comprises: a second acquisition module, configured to obtain the address of the bookmarked web page of the browser; an access module, configured to access the bookmarked web page according to its address; and a crawling module, configured to crawl text content from the bookmarked web page while accessing it, to obtain the text content of the bookmarked web page of the browser.
Through the present invention, the bookmarked web page to be accessed is found by searching the bookmarked web pages of the browser, which solves the problem of low efficiency when searching for a target web page among the bookmarked web pages of a browser and thereby improves the efficiency of finding the target web page among them.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of the method for processing bookmarked web pages of a browser according to an embodiment of the present invention; and
Fig. 2 is a schematic diagram of the device for processing bookmarked web pages of a browser according to an embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application, without creative effort, shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of this application described herein can be implemented. In addition, the terms "comprise" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the present invention provides a method for processing bookmarked web pages of a browser. Fig. 1 is a flowchart of the method for processing bookmarked web pages of a browser according to an embodiment of the present invention.
As shown in Fig. 1, the method comprises the following steps S102 to S106:
Step S102: receive a search keyword, wherein the search keyword is used to search the bookmarked web pages of the browser for a web page to be browsed.
The search keyword may be any keyword used to search the bookmarked web pages of the browser for a web page to be browsed; it may be a single keyword or multiple keywords. Specifically, a search box may be provided in the bookmark area of the browser, and the search keyword entered by the user is received through this search box.
Step S104: match the search keyword against the bookmarked web pages of the browser to obtain the address of a matched bookmarked web page.
The bookmarked web pages of a browser are usually located in the bookmarks folder of the browser, and the bookmarks folder of an existing browser stores the address and the title of each bookmarked web page. Matching the search keyword against the bookmarked web pages of the browser may consist of matching the search keyword against the titles of the bookmarked web pages: if the search keyword appears in the title of a bookmarked web page, that page is relevant to the web page the user needs to access. The bookmarked web pages of the browser that match the search keyword are recorded. Preferably, in order to improve the accuracy of finding the bookmarked web page to be accessed by search keyword, matching the search keyword against the bookmarked web pages of the browser comprises: obtaining the title and the text content of a bookmarked web page of the browser; and matching the title and the text content of the bookmarked web page against the search keyword, wherein if the title and the text content of the bookmarked web page match the search keyword, it is determined that the search keyword matches the bookmarked web page, and if the title and the text content of the bookmarked web page do not match the search keyword, it is determined that the search keyword does not match the bookmarked web page.
The content of a bookmarked web page may be obtained by accessing the page. Alternatively, the text content of each bookmarked web page of the browser may be stored in advance in a local database or another storage area, and then read from that database or storage area. The text content of a bookmarked web page may be the full text of the page, or keywords extracted from the full text. Because the title of a bookmarked web page sometimes does not represent its content, or the keyword of the content that the user cares about may not be included in the title, matching the search keyword against the title alone may fail to retrieve the bookmarked web page to be accessed, and the user may still fail to retrieve it even after repeated searches with different search keywords. Matching both the title and the text content of the bookmarked web page against the search keyword avoids this problem. Specifically, the title of the bookmarked web page may first be matched against the search keyword. If the title matches, the content of the page need not be matched against the search keyword; if the title does not match, the content of the page is then matched against the search keyword. This method increases the probability that a bookmarked web page matches the search keyword, and further improves the accuracy of finding the bookmarked web page to be accessed by search keyword.
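The title-first matching order described above can be sketched in Python as follows. This is only an illustrative sketch: the `Bookmark` record and the `matches` and `search` functions are hypothetical names, and the choices of case-insensitive substring matching and of treating a hit on any one keyword as a match are assumptions, since the description does not fix either.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Hypothetical record for one bookmarked page."""
    url: str
    title: str
    text: str  # text content obtained earlier for this page


def matches(bookmark: Bookmark, keywords: list[str]) -> bool:
    """Check the title first; consult the text content only if the title misses."""
    lowered = [k.lower() for k in keywords if k]
    if not lowered:
        return False
    title = bookmark.title.lower()
    if any(k in title for k in lowered):
        return True  # title matched, so the content is not examined
    text = bookmark.text.lower()
    return any(k in text for k in lowered)


def search(bookmarks: list[Bookmark], keywords: list[str]) -> list[str]:
    """Return the addresses of all bookmarks that match the keywords."""
    return [b.url for b in bookmarks if matches(b, keywords)]
```

Checking the title first keeps the common case cheap, because titles are short and the longer text content is scanned only when the title does not already decide the match.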
Preferably, in order to improve the efficiency of obtaining the title and the text content of the bookmarked web pages of the browser, before the title and the text content of a bookmarked web page are matched against the search keyword, the method further comprises: obtaining the text content of the bookmarked web page of the browser; obtaining the URL and the title of the bookmarked web page of the browser; and storing the text content, the URL, and the title of the bookmarked web page of the browser.
The text content, the URL, and the title of each bookmarked web page of the browser are obtained in advance, before the bookmarked web pages are searched, and are stored in a local storage area such as a local database. Specifically, in the process of storing the text content, URL, and title of a bookmarked web page, these items may be associated with one another, that is, a correspondence is established among the text content, URL, and title that belong to the same bookmarked web page. With this method, when the user searches the bookmarked web pages of the browser, the text content and title of each bookmarked web page can be obtained quickly and matched against the search keyword, and if a bookmarked web page matching the search keyword exists, its address can be obtained quickly, which improves search efficiency.
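One way to keep the text content, URL, and title of the same bookmarked page associated in a local database is a single table keyed by URL. The sketch below uses Python's standard `sqlite3` module; the table name, column names, and helper functions are hypothetical, and the use of SQLite itself is an assumption, since the description only mentions "a local database".

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical local bookmark store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, title TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def save_bookmark(conn, url, title, text):
    """Store URL, title, and text content of one page in the same row."""
    conn.execute(
        "INSERT OR REPLACE INTO bookmarks (url, title, text) VALUES (?, ?, ?)",
        (url, title, text),
    )
    conn.commit()


def find_urls(conn, keyword):
    """Return addresses whose title or text contains the keyword.

    Note: '%' and '_' inside the keyword act as LIKE wildcards here;
    a fuller version would escape them.
    """
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT url FROM bookmarks WHERE title LIKE ? OR text LIKE ? ORDER BY url",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

Because the three items live in one row, a match on either the title or the text content yields the address directly, without a second lookup.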
Optionally, obtaining the text content of the bookmarked web page of the browser comprises: obtaining the address of the bookmarked web page of the browser; accessing the bookmarked web page according to its address; and crawling text content from the bookmarked web page while accessing it, to obtain the text content of the bookmarked web page of the browser.
The URL and the title of each bookmarked web page of the browser are already stored in the bookmarks folder of the browser. Specifically, the address of a bookmarked web page, that is, its Uniform Resource Locator (URL), may be obtained by calling the Application Programming Interface (API) that the browser provides for obtaining the addresses of bookmarked web pages. The bookmarked web page can be accessed through its address, and text content is crawled from the page while it is being accessed, to obtain the text content of the bookmarked web page of the browser. Specifically, the text content may be crawled from the bookmarked web page by a web crawler. A web crawler is a program or script that automatically crawls information on the network according to set rules; for example, a web crawler may be set to crawl only the text content of a web page, or only the pictures on a web page, and so on. In this embodiment of the present invention, the web crawler crawls only the text content of the bookmarked web page. Preferably, in order to improve the efficiency of crawling the text content of a bookmarked web page, crawling text content from the bookmarked web page while accessing it, to obtain the text content of the bookmarked web page of the browser, comprises: filtering out the HyperText Markup Language (HTML) tags of the bookmarked web page of the browser; and crawling text content from the bookmarked web page from which the HTML tags have been filtered out, to obtain the text content of the bookmarked web page of the browser.
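A minimal sketch of the access-and-crawl step, using only the Python standard library, might look like the following. The names `fetch_html`, `crawl_text`, and `_TextCollector` are hypothetical, and the sketch uses the standard `html.parser` module to drop markup rather than a dedicated crawler framework; skipping `script` and `style` content is an added assumption. The fetch function is passed in as a parameter so that the text extraction can be exercised without network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextCollector(HTMLParser):
    """Collects visible text and ignores script and style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def fetch_html(url, timeout=10):
    """Access the page at the bookmark address and return its HTML."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl_text(url, fetch=fetch_html):
    """Return only the text content of the page at the given address."""
    parser = _TextCollector()
    parser.feed(fetch(url))
    parser.close()
    return " ".join(parser.parts)
```

In use, `crawl_text` would be called once per address returned by the browser's bookmark API, and its result stored alongside the URL and title.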
A HyperText Markup Language (HTML) tag is the smallest unit in HTML. The display format of a web page can be set through HTML tags; for example, the display positions of the title, the keywords, and the content of a web page are set through HTML tags. Specifically, after the web page is requested from the server through the address of the bookmarked web page, the content returned by the server may be matched against a preset regular expression to filter out the HTML tags of the bookmarked web page. A regular expression uses a single string to describe and match a set of strings that conform to a certain syntactic rule. For example, a regular expression for matching Chinese postal codes is "[1-9]\d{5}(?!\d)". If the string to be matched is "Chinabeijing100081haidian", this regular expression quickly matches the characters "100081", which represent the postal code in the string, and the other characters are filtered out.
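The postal code example above can be reproduced with Python's `re` module, and the same mechanism can be applied to tag filtering. In this sketch the `TAG` pattern and the `strip_tags` helper are assumptions added for illustration, since the description does not give the preset expression used for filtering; a pattern this simple does not handle comments or script content.

```python
import re

# Postal code pattern from the description: six digits, the first non-zero,
# not followed by another digit.
POSTCODE = re.compile(r"[1-9]\d{5}(?!\d)")

# Assumed, deliberately simple pattern for an HTML tag.
TAG = re.compile(r"<[^>]+>")


def find_postcode(s):
    """Return the first postal code found in s, or None."""
    m = POSTCODE.search(s)
    return m.group(0) if m else None


def strip_tags(html):
    """Replace every tag with a space, then collapse runs of whitespace."""
    return re.sub(r"\s+", " ", TAG.sub(" ", html)).strip()


print(find_postcode("Chinabeijing100081haidian"))  # prints 100081
print(strip_tags("<p>Hello <b>world</b></p>"))     # prints Hello world
```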
Preferably, after text content is crawled from the bookmarked web page while accessing it and the text content of the bookmarked web page of the browser is obtained, the method further comprises: extracting keywords from the text content of the bookmarked web page to obtain the keywords of the bookmarked web page; and storing the keywords, the URL, and the title of the bookmarked web page. Matching the title and the text content of the bookmarked web page against the search keyword then comprises: matching the keywords and the title of the bookmarked web page against the search keyword.
The keywords of a bookmarked web page may be words that occur relatively often in the text content of the page, or words located near the beginning of the text content, for example a summary of the text content. Specifically, this embodiment is described using the words that occur relatively often in the text content of a bookmarked web page as the keywords of that page. After the text content of the bookmarked web page is obtained, word segmentation may be performed on it, that is, the text content is divided into individual words. Stop words, which are words without substantive meaning such as modal particles and conjunctions, may first be filtered out. The words remaining after filtering form a word set, and the repeated words in this set and their numbers of occurrences are counted. If the number of occurrences of a repeated word is greater than a preset threshold, that word is taken as a keyword of the bookmarked web page. After the keywords of the bookmarked web page of the browser are obtained, a correspondence among the keywords, URL, and title of the bookmarked web page may likewise be established when they are stored. Because the text content of a bookmarked web page may be long, matching the search keyword against the full text content is time-consuming; moreover, too many wrong matching results may occur, that is, bookmarked web pages that match the search keyword but are not the pages the user needs to access. Matching the search keyword against the keywords extracted from the text content of the bookmarked web page can improve both the efficiency of matching and the accuracy of the matching results.
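The frequency-based keyword extraction described above can be sketched as follows. The function name, the small stop-word list, the default threshold, and the regular-expression tokenizer are all assumptions for illustration; in particular, splitting on letters and digits only suits space-delimited text, and Chinese text would need a real word segmenter in its place.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list.
STOP_WORDS = frozenset({"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"})


def extract_keywords(text, threshold=1, stop_words=STOP_WORDS):
    """Return words whose occurrence count is greater than the threshold."""
    # Segment the text into words (assumed tokenizer for space-delimited text).
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Filter out stop words, then count how often each remaining word occurs.
    counts = Counter(w for w in words if w not in stop_words)
    return sorted(w for w, n in counts.items() if n > threshold)


sample = "The crawler fetches the page and the crawler stores the page text"
print(extract_keywords(sample))  # prints ['crawler', 'page']
```

Raising the threshold shrinks the keyword list, which makes matching faster but increases the chance that a word the user remembers is not among the stored keywords.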
Step S106: output the address of the matched bookmarked web page.
Through the above steps, the address of the bookmarked web page of the browser that matches the search keyword is obtained, and the address of this matched bookmarked web page is output for the user to view.
As can be seen from the above description, the present invention achieves the following technical effects:
In the embodiment of the present invention, a search keyword is received, the search keyword is matched against the bookmarked web pages of the browser to obtain the address of a matched bookmarked web page, and the address of the matched bookmarked web page is output, so that the bookmarked web page to be accessed is found among the bookmarked web pages of the browser by searching. Compared with the prior art, in which the user opens the bookmarked web pages of the browser one by one to search, this improves the efficiency of finding the target web page among the bookmarked web pages of the browser and solves the problem in the related art of low efficiency when searching for a target web page among them.
It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here.
According to another aspect of the embodiments of the present invention, a device for processing bookmarked web pages of a browser is provided. The device may be used to execute the method for processing bookmarked web pages of a browser of the embodiments of the present invention, and the method of the embodiments of the present invention may also be executed by the device of the embodiments of the present invention.
Fig. 2 is a schematic diagram of the device for processing bookmarked web pages of a browser according to an embodiment of the present invention. As shown in Fig. 2, the device comprises: a receiving unit 10, a matching unit 20, and an output unit 30.
The receiving unit 10 is configured to receive a search keyword, wherein the search keyword is used to search the bookmarked web pages of the browser for a web page to be browsed.
The search keyword may be any keyword used to search the bookmarked web pages of the browser for a web page to be browsed; it may be a single keyword or multiple keywords. Specifically, a search box may be provided in the bookmark area of the browser, and the search keyword entered by the user is received through this search box.
The matching unit 20 is configured to match the search keyword against the bookmarked web pages of the browser to obtain the address of a matched bookmarked web page.
The bookmarked web pages of a browser are usually located in the bookmarks folder of the browser, and the bookmarks folder of an existing browser stores the address and the title of each bookmarked web page. Matching the search keyword against the bookmarked web pages of the browser may consist of matching the search keyword against the titles of the bookmarked web pages: if the search keyword appears in the title of a bookmarked web page, that page is relevant to the web page the user needs to access.
The output unit 30 is configured to output the address of the matched bookmarked web page.
After the address of the bookmarked web page of the browser that matches the search keyword is obtained, the address of this matched bookmarked web page is output for the user to view.
In the embodiment of the present invention, the receiving unit 10 receives the search keyword, the matching unit 20 matches the search keyword against the bookmarked web pages of the browser to obtain the address of a matched bookmarked web page, and the output unit 30 outputs the address of the matched bookmarked web page. The embodiment of the present invention finds the bookmarked web page to be accessed among the bookmarked web pages of the browser by searching. Compared with the prior art, in which the user opens the bookmarked web pages of the browser one by one to search, this improves the efficiency of finding the target web page among the bookmarked web pages of the browser and solves the problem in the related art of low efficiency when searching for a target web page among them.
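As a rough software illustration of how the three units cooperate, each unit can be modelled as a small class and composed into one device object. All class and method names below are hypothetical, and the sketch matches against titles only, with whitespace-separated keywords and case-insensitive substring matching as assumptions.

```python
class ReceivingUnit:
    """Stands in for receiving unit 10: turns search-box input into keywords."""

    def receive(self, raw_input: str) -> list[str]:
        return raw_input.split()


class MatchingUnit:
    """Stands in for matching unit 20: matches keywords against bookmark titles."""

    def __init__(self, bookmarks):
        self._bookmarks = bookmarks  # list of (url, title) pairs

    def match(self, keywords):
        lowered = [k.lower() for k in keywords]
        return [
            url
            for url, title in self._bookmarks
            if any(k in title.lower() for k in lowered)
        ]


class OutputUnit:
    """Stands in for output unit 30: formats the matched addresses for display."""

    def output(self, urls):
        return "\n".join(urls)


class BookmarkSearchDevice:
    """Composes the three units in the order receive, match, output."""

    def __init__(self, bookmarks):
        self.receiving_unit = ReceivingUnit()
        self.matching_unit = MatchingUnit(bookmarks)
        self.output_unit = OutputUnit()

    def handle(self, raw_input):
        keywords = self.receiving_unit.receive(raw_input)
        urls = self.matching_unit.match(keywords)
        return self.output_unit.output(urls)
```

Keeping the units separate means the matching unit can later be swapped for one that also consults stored text content without changing how input is received or results are output.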
Preferably, the matching unit 20 comprises: a first acquisition module, configured to obtain the title and the text content of a bookmarked web page of the browser; and a matching module, configured to match the title and the text content of the bookmarked web page against the search keyword, wherein if the title and the text content of the bookmarked web page match the search keyword, it is determined that the search keyword matches the bookmarked web page, and if the title and the text content of the bookmarked web page do not match the search keyword, it is determined that the search keyword does not match the bookmarked web page.
Preferably, the device also comprises: a first acquiring unit, configured to obtain the text content of the bookmarked web page of the browser; a second acquiring unit, configured to obtain the URL and the title of the bookmarked web page of the browser; and a storage unit, configured to store the text content, the URL, and the title of the bookmarked web page of the browser.
Preferably, the first acquiring unit comprises: a second acquisition module, configured to obtain the address of the bookmarked web page of the browser; an access module, configured to access the bookmarked web page according to its address; and a crawling module, configured to crawl text content from the bookmarked web page while accessing it, to obtain the text content of the bookmarked web page of the browser.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; or they may each be made into an individual integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

1., for a web page storage disposal route for browser, it is characterized in that, comprising:
Receive search key, wherein, described search key is used for searching from the collection webpage of browser the webpage needing to browse;
Described search key is mated with the collection webpage of described browser, obtains the address of the collection webpage mated; And export the address of collection webpage of described coupling.
2. the web page storage disposal route for browser according to claim 1, is characterized in that, is carried out mating comprising by described search key with the collection webpage of described browser:
Obtain title and the content of text of the collection webpage of described browser; And the title of the collection webpage of described browser and content of text are mated with described search key,
Wherein, if title and the content of text of the collection webpage of described browser mate with described search key, then determine that described search key mates with the collection webpage of described browser, if title and the content of text of the collection webpage of described browser do not mate with described search key, then determine that described search key does not mate with the collection webpage of described browser.
3. The web page storage and processing method for a browser according to claim 1, characterized in that, before the title and text content of the bookmarked web page of the browser are matched against the search keyword, the method further comprises:
obtaining the text content of the bookmarked web page of the browser;
obtaining the URL and title of the bookmarked web page of the browser; and storing the text content, URL, and title of the bookmarked web page of the browser.
4. The web page storage and processing method for a browser according to claim 3, characterized in that obtaining the text content of the bookmarked web page of the browser comprises:
obtaining the address of the bookmarked web page of the browser;
accessing the bookmarked web page according to the address of the bookmarked web page of the browser; and crawling text content from the bookmarked web page while accessing the bookmarked web page, to obtain the text content of the bookmarked web page of the browser.
5. The web page storage and processing method for a browser according to claim 4, characterized in that crawling text content from the bookmarked web page while accessing the bookmarked web page, to obtain the text content of the bookmarked web page of the browser, comprises:
filtering the HyperText Markup Language (HTML) tags of the bookmarked web page of the browser; and crawling text content from the bookmarked web page of the browser whose HTML tags have been filtered, to obtain the text content of the bookmarked web page of the browser.
6. The web page storage and processing method for a browser according to claim 4, characterized in that,
after crawling text content from the bookmarked web page while accessing the bookmarked web page, to obtain the text content of the bookmarked web page of the browser, the method further comprises: extracting keywords from the text content of the bookmarked web page of the browser to obtain the keywords of the bookmarked web page of the browser; and storing the keywords, URL, and title of the bookmarked web page of the browser, wherein matching the title and text content of the bookmarked web page of the browser against the search keyword comprises: matching the keywords and title of the bookmarked web page of the browser against the search keyword.
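The method claims above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `strip_html`, `extract_keywords`, `BookmarkRecord`, `build_store`, and `search` are hypothetical, the `fetch` callable stands in for whatever page access the browser uses, and a bookmark is assumed to match when the search keyword occurs in either its title or its stored text. The claims do not say how keywords are extracted, so the frequency-based extraction below is purely an assumption.

```python
import re
from collections import Counter
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_html(html: str) -> str:
    """Filter out HTML tags and return the remaining text (cf. claim 5)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_keywords(text: str, top_n: int = 5) -> List[str]:
    """Assumed heuristic for claim 6: the most frequent words of 3+ characters."""
    words = [w for w in re.findall(r"\w+", text.casefold()) if len(w) >= 3]
    return [word for word, _ in Counter(words).most_common(top_n)]


@dataclass
class BookmarkRecord:
    url: str
    title: str
    text: str
    keywords: List[str]


def build_store(bookmarks: Dict[str, str],
                fetch: Callable[[str], str]) -> List[BookmarkRecord]:
    """bookmarks maps URL -> title; fetch(url) returns the page's raw HTML."""
    store = []
    for url, title in bookmarks.items():
        text = strip_html(fetch(url))  # access the page and crawl its text
        store.append(BookmarkRecord(url, title, text, extract_keywords(text)))
    return store


def search(store: List[BookmarkRecord], keyword: str,
           use_keywords: bool = False) -> List[str]:
    """Return the addresses of bookmarks that match the search keyword."""
    needle = keyword.casefold()
    hits = []
    for record in store:
        in_title = needle in record.title.casefold()
        if use_keywords:  # claim 6 variant: stored keywords plus title
            in_body = needle in record.keywords
        else:             # claim 2 variant: full text plus title
            in_body = needle in record.text.casefold()
        if in_title or in_body:
            hits.append(record.url)
    return hits
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; a dictionary lookup over saved pages is enough to try it out. Storing only keywords, as in claim 6, trades recall for a smaller store, since a word outside the top keywords no longer matches.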
7. A web page storage and processing device for a browser, characterized by comprising:
a receiving unit, configured to receive a search keyword, wherein the search keyword is used to find, among the bookmarked web pages of the browser, a web page to be browsed;
a matching unit, configured to match the search keyword against the bookmarked web pages of the browser to obtain the address of the matching bookmarked web page; and an output unit, configured to output the address of the matching bookmarked web page.
8. The web page storage and processing device for a browser according to claim 7, characterized in that the matching unit comprises:
a first acquisition module, configured to obtain the title and text content of the bookmarked web page of the browser; and a matching module, configured to match the title and text content of the bookmarked web page of the browser against the search keyword, wherein, if the title and text content of the bookmarked web page of the browser match the search keyword, it is determined that the search keyword matches the bookmarked web page of the browser, and if the title and text content of the bookmarked web page of the browser do not match the search keyword, it is determined that the search keyword does not match the bookmarked web page of the browser.
9. The web page storage and processing device for a browser according to claim 7, characterized in that the device further comprises:
a first acquisition unit, configured to obtain the text content of the bookmarked web page of the browser;
a second acquisition unit, configured to obtain the URL and title of the bookmarked web page of the browser; and a storage unit, configured to store the text content, URL, and title of the bookmarked web page of the browser.
10. The web page storage and processing device for a browser according to claim 9, characterized in that the first acquisition unit comprises:
a second acquisition module, configured to obtain the address of the bookmarked web page of the browser;
an access module, configured to access the bookmarked web page according to the address of the bookmarked web page of the browser; and a crawling module, configured to crawl text content from the bookmarked web page while the bookmarked web page is being accessed, to obtain the text content of the bookmarked web page of the browser.
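The unit structure of the device claims can be sketched in Python as one small class per unit. This is only an illustration under stated assumptions: the class and method names (`ReceivingUnit`, `MatchingUnit`, `OutputUnit`, `BookmarkSearchDevice`, `handle`) are hypothetical, the matching rule is assumed to be a case-insensitive substring test on title or text, and the output unit simply joins the matching addresses into one string because the claims do not specify an output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Bookmark:
    url: str
    title: str
    text: str


class ReceivingUnit:
    """Receives the search keyword and rejects empty input."""

    def receive(self, raw: str) -> str:
        keyword = raw.strip()
        if not keyword:
            raise ValueError("search keyword must not be empty")
        return keyword


class MatchingUnit:
    """Matches the keyword against stored bookmark titles and text."""

    def __init__(self, bookmarks: Iterable[Bookmark]) -> None:
        self._bookmarks = list(bookmarks)

    def match(self, keyword: str) -> List[str]:
        needle = keyword.casefold()
        return [
            b.url
            for b in self._bookmarks
            if needle in b.title.casefold() or needle in b.text.casefold()
        ]


class OutputUnit:
    """Formats the matching addresses, one per line (assumed format)."""

    def output(self, urls: List[str]) -> str:
        return "\n".join(urls)


class BookmarkSearchDevice:
    """Wires the three units together in the order the claims list them."""

    def __init__(self, bookmarks: Iterable[Bookmark]) -> None:
        self.receiver = ReceivingUnit()
        self.matcher = MatchingUnit(bookmarks)
        self.out = OutputUnit()

    def handle(self, raw_query: str) -> str:
        keyword = self.receiver.receive(raw_query)
        return self.out.output(self.matcher.match(keyword))
```

Keeping each unit in its own class mirrors the claim structure, so a unit can be replaced (for example, a matching unit that uses stored keywords instead of full text) without touching the others.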
CN201410742954.XA 2014-12-05 2014-12-05 Web page storage processing method and processing device for browser Active CN104391978B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410742954.XA CN104391978B (en) 2014-12-05 2014-12-05 Web page storage processing method and processing device for browser

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410742954.XA CN104391978B (en) 2014-12-05 2014-12-05 Web page storage processing method and processing device for browser

Publications (2)

Publication Number Publication Date
CN104391978A true CN104391978A (en) 2015-03-04
CN104391978B CN104391978B (en) 2018-05-15

Family

ID=52609882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410742954.XA Active CN104391978B (en) 2014-12-05 2014-12-05 Web page storage processing method and processing device for browser

Country Status (1)

Country Link
CN (1) CN104391978B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426224A (en) * 2015-12-28 2016-03-23 上海银天下科技有限公司 Method and device for opening web pages in application program
CN105740417A (en) * 2016-01-29 2016-07-06 青岛海信移动通信技术股份有限公司 Webpage based target data search method and module, browser and terminal
CN106547821A (en) * 2016-09-29 2017-03-29 广东工业大学 A kind of method in browser according to keyword search related web page
CN107229705A (en) * 2017-05-25 2017-10-03 北京小米移动软件有限公司 Information resources lookup method, device and computer-readable recording medium
CN108491420A (en) * 2018-02-06 2018-09-04 平安科技(深圳)有限公司 Configuration method, application server and the computer readable storage medium of web page crawl
CN109657168A (en) * 2018-11-30 2019-04-19 维沃移动通信有限公司 A kind of collection record display methods and device
CN110020335A (en) * 2017-07-28 2019-07-16 北京搜狗科技发展有限公司 The treating method and apparatus of collection
CN110069667A (en) * 2017-11-03 2019-07-30 北京搜狗科技发展有限公司 A kind of searching method, device and the device for search
CN113268184A (en) * 2021-05-29 2021-08-17 五八到家有限公司 Browser tab switching method and device, electronic equipment and readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010115003A1 (en) * 2009-04-03 2010-10-07 Avichai Flombaum System and method for identifying and retrieving targeted advertisements or other related documents
CN102830894A (en) * 2012-05-11 2012-12-19 北京奇虎科技有限公司 Method and apparatus for bookmarking webpage
CN102982134A (en) * 2012-11-16 2013-03-20 北京奇虎科技有限公司 System enabling recommended web site information to be displayed in browser address bar
CN103246746A (en) * 2013-05-23 2013-08-14 百度在线网络技术(北京)有限公司 Method, device and system for searching information

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426224A (en) * 2015-12-28 2016-03-23 上海银天下科技有限公司 Method and device for opening web pages in application program
CN105740417A (en) * 2016-01-29 2016-07-06 青岛海信移动通信技术股份有限公司 Webpage based target data search method and module, browser and terminal
CN106547821A (en) * 2016-09-29 2017-03-29 广东工业大学 A kind of method in browser according to keyword search related web page
CN107229705A (en) * 2017-05-25 2017-10-03 北京小米移动软件有限公司 Information resources lookup method, device and computer-readable recording medium
CN107229705B (en) * 2017-05-25 2024-05-31 北京小米移动软件有限公司 Information resource searching method, device and computer readable storage medium
CN110020335A (en) * 2017-07-28 2019-07-16 北京搜狗科技发展有限公司 The treating method and apparatus of collection
CN110020335B (en) * 2017-07-28 2022-04-26 北京搜狗科技发展有限公司 Favorite processing method and device
CN110069667A (en) * 2017-11-03 2019-07-30 北京搜狗科技发展有限公司 A kind of searching method, device and the device for search
CN108491420A (en) * 2018-02-06 2018-09-04 平安科技(深圳)有限公司 Configuration method, application server and the computer readable storage medium of web page crawl
CN109657168A (en) * 2018-11-30 2019-04-19 维沃移动通信有限公司 A kind of collection record display methods and device
CN109657168B (en) * 2018-11-30 2021-04-23 维沃移动通信有限公司 Collection record display method and device
CN113268184A (en) * 2021-05-29 2021-08-17 五八到家有限公司 Browser tab switching method and device, electronic equipment and readable medium

Also Published As

Publication number Publication date
CN104391978B (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN104391978A (en) Method and device for storing and processing web pages of browsers
CN102930059B (en) Method for designing focused crawler
US8060538B2 (en) Method and system for creating a concept-object database
US8185530B2 (en) Method and system for web document clustering
CN101908071B (en) Method and device thereof for improving search efficiency of search engine
CN102270331B (en) Network shopping navigating method based on visual search
CN106126648B (en) It is a kind of based on the distributed merchandise news crawler method redo log
CN108052632B (en) Network information acquisition method and system and enterprise information search system
US20070198727A1 (en) Method, apparatus and system for extracting field-specific structured data from the web using sample
CN109376291B (en) Website fingerprint information scanning method and device based on web crawler
CN104516982A (en) Method and system for extracting Web information based on Nutch
CN103744856A (en) Method, device and system for linkage extended search
CN102710795A (en) Hotspot collecting method and device
US20150206101A1 (en) System for determining infringement of copyright based on the text reference point and method thereof
CN105095175A (en) Method and device for obtaining truncated web title
CN106874502A (en) A kind of method of video search, device and terminal
KR100671077B1 (en) Server, Method and System for Providing Information Search Service by Using Sheaf of Pages
US11334592B2 (en) Self-orchestrated system for extraction, analysis, and presentation of entity data
CN114443928B (en) Web text data crawler method and system
Klein et al. Evaluating methods to rediscover missing web pages from the web infrastructure
CN104281629A (en) Method and device for extracting picture from webpage and client equipment
CN104778232B (en) Searching result optimizing method and device based on long query
CN114117242A (en) Data query method and device, computer equipment and storage medium
CN106959995A (en) Compatible two-way automatic web page contents acquisition method
CN103605742A (en) Method and device for recognizing network resource entity content page

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and device for storing and processing web pages of browsers

Effective date of registration: 20190531

Granted publication date: 20180515

Pledgee: Shenzhen Black Horse World Investment Consulting Co.,Ltd.

Pledgor: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Registration number: 2019990000503

CP02 Change in the address of a patent holder

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Patentee after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Patentee before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

PP01 Preservation of patent right

Effective date of registration: 20240604

Granted publication date: 20180515