CN103514221A - Web site resource management method and device - Google Patents

Info

Publication number
CN103514221A
Authority
CN
China
Prior art keywords
page
index
web website
resource
index page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210222539.2A
Other languages
Chinese (zh)
Other versions
CN103514221B (en)
Inventor
刘承诚
薛晶晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210222539.2A
Publication of CN103514221A
Application granted
Publication of CN103514221B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a web site resource management method and device. The method comprises the following steps: the number of web sites is checked; if there is one web site, whether that web site has an index page is further checked; if an index page exists, it is optimized to generate a first index page; if no index page exists, a second index page is generated according to the structure of the web site; and if there are two or more web sites, a cross-site index page is built based on semantics. By examining the web sites and building the index page with a different method depending on the number of web sites and on whether an index page exists, the method can mine the organization relationships of site resources thoroughly, raise the degree of aggregation of site resources, and improve how the pages of the sites are presented.

Description

Web site resource management method and device
Technical field
The present invention relates to the field of analyzing and mining the resource organization relationships of web sites, and in particular to a web site resource management method and device.
Background
Converting web sites into apps is becoming increasingly common. Converting a web site into an app requires the resource organization relationships of that site, so the in-site resource organization relationships of the web site must be analyzed and mined to obtain structured resource organization relationship data.
At present, the resource organization relationships of a web site are mined mainly by manual inspection, and no mature technique exists. This has the following shortcomings:
(1) the resource organization relationships of sites are not mined with different methods for different categories of site, so the mining is not comprehensive and the degree of aggregation is low;
(2) there is no fixed mining method, so the resource organization relationships obtained are unclear and disorganized, and are difficult to structure.
Summary of the invention
The present invention aims to solve at least one of the technical problems described above.
To this end, a first object of the present invention is to propose a web site resource management method.
A second object of the present invention is to propose a web site resource management device.
To achieve these objects, a web site resource management method according to an embodiment of the first aspect of the invention comprises the following steps: checking the number of web sites; if the number of web sites is one, further checking whether the web site has an index page; if there is an index page, optimizing the index page to generate a first index page; if there is no index page, generating a second index page according to the structure of the web site; and if the number of web sites is two or more, building a cross-site index page based on semantics.
According to the web site resource management method of the embodiment of the invention, the web sites are examined and an index page is built by a different method in each of three cases, which are distinguished by the number of web sites and by whether an index page exists. This makes it possible to mine the resource organization relationships of the sites thoroughly, raise the degree of aggregation of site resources, and improve how site pages are presented.
To achieve these objects, a web site resource management device according to an embodiment of the second aspect of the invention comprises: a first checking module for checking the number of web sites; a second checking module for checking, when the number of web sites is one, whether the web site has an index page; an optimization module for optimizing the index page to generate a first index page when the web site has an index page; a generation module for generating a second index page according to the structure of the web site when the web site has no index page; and an establishing module for building a cross-site index page based on semantics when the number of web sites is two or more.
According to the web site resource management device of the embodiment of the invention, the web sites are examined and an index page is built by a different method in each of three cases, which are distinguished by the number of web sites and by whether an index page exists. This makes it possible to mine the resource organization relationships of the sites thoroughly, raise the degree of aggregation of site resources, and improve how site pages are presented.
Additional aspects and advantages of the invention will be given in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flowchart of a web site resource management method according to an embodiment of the invention;
Fig. 2 is a flowchart of a web site resource management method according to an embodiment of the invention;
Fig. 3 is a flowchart of a web site resource management method according to an embodiment of the invention;
Fig. 4 is a flowchart of a web site resource management method according to an embodiment of the invention;
Fig. 5 is a structural diagram of a web site resource management device according to an embodiment of the invention;
Fig. 6 is a structural diagram of a web site resource management device according to an embodiment of the invention;
Fig. 7 is a structural diagram of a web site resource management device according to an embodiment of the invention; and
Fig. 8 is a structural diagram of a web site resource management device according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
These and other aspects of the embodiments of the invention will become clear with reference to the following description and drawings. The description and drawings disclose some particular implementations of the embodiments to show some of the ways in which the principles of the embodiments can be carried out, but it should be understood that the scope of the embodiments is not limited by them. On the contrary, the embodiments of the invention include all changes, modifications and equivalents that fall within the spirit and scope of the appended claims.
A web site resource management method according to embodiments of the invention is described below with reference to the drawings.
The web site resource management method comprises the following steps: checking the number of web sites; if the number of web sites is one, further checking whether the web site has an index page; if there is an index page, optimizing the index page to generate a first index page; if there is no index page, generating a second index page according to the structure of the web site; and if the number of web sites is two or more, building a cross-site index page based on semantics.
Fig. 1 is a flowchart of a web site resource management method according to one embodiment of the invention.
As shown in Fig. 1, the web site resource management method according to this embodiment comprises the following steps.
Step S101: check the number of web sites.
Specifically, determine the number of web sites whose structured resource organization relationships are to be obtained.
Step S102: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if only one web site is to be processed, start mining the pages of that web site and check whether there is an index page containing the site's index information.
Step S103: if there is an index page, optimize the index page to generate a first index page.
Specifically, if the web site has an index page, obtain the page information in that index page, optimize the page information to obtain the information needed to build the first index page, and generate the first index page of the web site.
Step S104: if there is no index page, generate a second index page according to the structure of the web site.
Specifically, if the web site has no index page, mine the resource pages of the web site, derive the structure of the web site from the information in the resource pages, and generate the second index page of the web site from that structural information.
Step S105: if the number of web sites is two or more, build a cross-site index page based on semantics.
Specifically, if two or more web sites are to be processed, link these web sites according to the semantic relevance of their resource pages, obtain resource index organization information across the sites, and generate the index page from the information obtained.
According to the web site resource management method of this embodiment, the web sites are examined and an index page is built by a different method in each of three cases, which are distinguished by the number of web sites and by whether an index page exists. This makes it possible to mine the resource organization relationships of the sites thoroughly, raise the degree of aggregation of site resources, and improve how site pages are presented.
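By way of illustration only, the branching of steps S101 to S105 could be sketched as follows. The `Site` class, the `choose_strategy` function and the returned labels are hypothetical names introduced for this sketch; the source does not prescribe any data model or programming interface.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Site:
    """Hypothetical stand-in for a web site under analysis."""
    name: str
    index_page: Optional[str] = None  # None means no index page was found


def choose_strategy(sites: Sequence[Site]) -> str:
    """Return a label for the branch (S103, S104 or S105) that applies."""
    if not sites:
        raise ValueError("at least one web site is required")
    if len(sites) == 1:                      # S102: exactly one site
        if sites[0].index_page is not None:
            return "optimize_existing_index"  # S103: first index page
        return "generate_from_structure"      # S104: second index page
    return "build_cross_site_index"           # S105: two or more sites
```

Keeping the decision separate from the three index-building procedures mirrors the way the method treats the three cases independently.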
Fig. 2 is a flowchart of a web site resource management method according to another embodiment of the invention.
As shown in Fig. 2, the web site resource management method according to this embodiment comprises the following steps.
Step S201: check the number of web sites.
Specifically, determine the number of web sites whose structured resource organization relationships are to be obtained.
Step S202: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if only one web site is to be processed, start mining the pages of that web site and check whether there is an index page containing the site's index information.
Step S203: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, obtain all the information in the index page, analyze it, and delete the non-index information.
Step S204: delete the index entries in the index page that cannot be linked to a resource page in the web site.
Specifically, check the index entries remaining in the index page to see whether each can be linked to the resource page in the web site that it points to, and delete any index entry that cannot be linked to the resource page it points to.
Step S205: extract the valid index entries in the index page to generate the first index page.
Specifically, extract the valid index entries remaining in the index page and merge them onto one page to generate the first index page.
Step S206: if there is no index page, generate a second index page according to the structure of the web site.
Specifically, if the web site has no index page, mine the resource pages of the web site, derive the structure of the web site from the information in the resource pages, and generate the second index page of the web site from that structural information.
Step S207: if the number of web sites is two or more, build a cross-site index page based on semantics.
Specifically, if two or more web sites are to be processed, link these web sites according to the semantic relevance of their resource pages, obtain resource index organization information across the sites, and generate the index page from the information obtained.
In one embodiment of the invention, the non-index information includes advertisements and animations.
According to the web site resource management method of this embodiment, the non-index information and the invalid index entries of the index page are deleted and the index page is generated from the valid index entries. The resource organization relationships of the web site are thereby obtained effectively, the relationships are clear, and the degree of aggregation is high.
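Steps S203 to S205 can be pictured with the minimal sketch below. It assumes that the index page has already been parsed into entries and that each entry carries a `kind` label; the `Entry` class, the label values and the `site_pages` set of known resource-page URLs are assumptions made for this sketch, not details given in the source.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Entry:
    """Hypothetical parsed item from an index page."""
    text: str
    href: str
    kind: str = "link"  # assumed labels: "link", "ad", "animation"


# The embodiment names advertisements and animations as non-index information.
NON_INDEX_KINDS = {"ad", "animation"}


def optimize_index_page(entries: Iterable[Entry], site_pages: Set[str]) -> List[Entry]:
    """Return the entries that would make up the first index page."""
    # S203: delete non-index information.
    index_entries = [e for e in entries if e.kind not in NON_INDEX_KINDS]
    # S204: delete entries that do not link to a resource page in the site.
    valid_entries = [e for e in index_entries if e.href in site_pages]
    # S205: the remaining valid entries are merged onto one page.
    return valid_entries
```

In practice the link check in S204 would probably involve fetching each target; a set lookup is used here only to keep the sketch self-contained.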
Fig. 3 is a flowchart of a web site resource management method according to another embodiment of the invention.
As shown in Fig. 3, the web site resource management method according to this embodiment comprises the following steps.
Step S301: check the number of web sites.
Specifically, determine the number of web sites whose structured resource organization relationships are to be obtained.
Step S302: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if only one web site is to be processed, start mining the pages of that web site and check whether there is an index page containing the site's index information.
Step S303: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, obtain all the information in the index page, analyze it, and delete the non-index information.
Step S304: delete the index entries in the index page that cannot be linked to a resource page in the web site.
Specifically, check the index entries remaining in the index page to see whether each can be linked to the resource page in the web site that it points to, and delete any index entry that cannot be linked to the resource page it points to.
Step S305: extract the valid index entries in the index page to generate the first index page.
Specifically, extract the valid index entries remaining in the index page and merge them onto one page to generate the first index page.
Step S306: if there is no index page, determine whether the resource pages in the web site have titles.
Specifically, if the web site has no index page, mine the resource pages of the web site starting from the home page, obtain the information of those resource pages, and determine whether each resource page has title information.
Step S307: if so, extract the title of the resource page as an index entry.
Specifically, if a resource page has title information, extract the title of that resource page as an index entry.
Step S308: if not, generate summary information for the resource page as an index entry.
Specifically, if a resource page has no title information, generate summary information from the main information the resource page contains and use the summary information as an index entry.
Step S309: generate the second index page from the index entries.
Specifically, merge the index entries obtained for all resource pages onto one page to generate the second index page.
Step S310: if the number of web sites is two or more, build a cross-site index page based on semantics.
Specifically, if two or more web sites are to be processed, link these web sites according to the semantic relevance of their resource pages, obtain resource index organization information across the sites, and generate the index page from the information obtained.
In one embodiment of the invention, the non-index information includes advertisements and animations.
According to the web site resource management method of this embodiment, index entries are generated from the title information or summary information of the resource pages, and the index page is then generated from these index entries. This raises the degree of aggregation of the resource organization relationships and makes the relationships clearer.
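Steps S306 to S309 can be sketched as below. The source does not say how summary information is produced, so the `summarize` helper simply truncates the body text; that choice, the `ResourcePage` class and the `max_words` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResourcePage:
    """Hypothetical resource page reached by mining from the home page."""
    url: str
    body: str
    title: Optional[str] = None


def summarize(text: str, max_words: int = 8) -> str:
    """Placeholder summary: the first few words of the page body (assumption)."""
    words = text.split()
    summary = " ".join(words[:max_words])
    return summary + ("..." if len(words) > max_words else "")


def index_entry_for(page: ResourcePage) -> str:
    # S306/S307: use the title when the page has one.
    if page.title and page.title.strip():
        return page.title.strip()
    # S308: otherwise fall back to generated summary information.
    return summarize(page.body)


def build_second_index(pages: List[ResourcePage]) -> List[Tuple[str, str]]:
    # S309: merge the index entries of all resource pages onto one page.
    return [(index_entry_for(p), p.url) for p in pages]
```

A real implementation would likely replace `summarize` with a proper text summarization step; only the title-or-summary fallback is taken from the description.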
Fig. 4 is a flowchart of a web site resource management method according to another embodiment of the invention.
As shown in Fig. 4, the web site resource management method according to this embodiment comprises the following steps.
Step S401: check the number of web sites.
Specifically, determine the number of web sites whose structured resource organization relationships are to be obtained.
Step S402: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if only one web site is to be processed, start mining the pages of that web site and check whether there is an index page containing the site's index information.
Step S403: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, obtain all the information in the index page, analyze it, and delete the non-index information.
Step S404: delete the index entries in the index page that cannot be linked to a resource page in the web site.
Specifically, check the index entries remaining in the index page to see whether each can be linked to the resource page in the web site that it points to, and delete any index entry that cannot be linked to the resource page it points to.
Step S405: extract the valid index entries in the index page to generate the first index page.
Specifically, extract the valid index entries remaining in the index page and merge them onto one page to generate the first index page.
Step S406: if there is no index page, determine whether the resource pages in the web site have titles.
Specifically, if the web site has no index page, mine the resource pages of the web site starting from the home page, obtain the information of those resource pages, and determine whether each resource page has title information.
Step S407: if so, extract the title of the resource page as an index entry.
Specifically, if a resource page has title information, extract the title of that resource page as an index entry.
Step S408: if not, generate summary information for the resource page as an index entry.
Specifically, if a resource page has no title information, generate summary information from the main information the resource page contains and use the summary information as an index entry.
Step S409: generate the second index page from the index entries.
Specifically, merge the index entries obtained for all resource pages onto one page to generate the second index page.
Step S410: if the number of web sites is two or more, predefine a plurality of templates corresponding to different semantics.
Specifically, if two or more web sites are to be processed, preset, for each semantic, a template corresponding to that semantic.
Step S411: classify the resource pages of the two or more web sites by semantic relevance and organize them into one web site.
Specifically, obtain the information contained in the resource pages of each web site, classify the resource pages of each web site according to the semantic relevance of that information, and organize the semantically related information from the resource pages into one web site.
Step S412: find the first template corresponding to the one web site according to its semantic relevance.
Specifically, search the predefined semantic templates according to the semantic relevance of the resource pages organized in the one web site, and obtain the first template corresponding to the semantics of the one web site.
Step S413: fill the resource pages of the one web site, classified by semantic relevance, into the different sub-columns of the first template.
Specifically, according to the attributes of the keywords, fill the information of each resource page in the one web site into the first template by semantic relevance, adding the corresponding resource page information to each column.
Step S414: build the cross-site index page from the information in the different templates.
Specifically, once the relevant information has been filled into the templates for the different semantics, integrate the keywords and semantics of each template as index entries and build the cross-site index page.
The implementation of steps S410 to S414 is illustrated below.
For example, define a semantic template for book information that contains sub-columns such as basic book information, popular reviews, merchant price comparison, electronic resources and other modules. Suppose the resource pages of each site contain the book "Big Talk Design Patterns"; the information about that book in each resource page is then gathered into one web site as one class. The book information template is then found from the keywords of the information about this book, and the relevant information in the one web site is filled into each sub-column of the template according to the template's sub-column modules. Key information in the template is then extracted as index entries. Building a number of pages in this way and extracting their index entries makes it possible to integrate the index entries and build the cross-site index page.
In one embodiment of the invention, the different semantics include novel titles, news headlines, video titles, product names and the like.
In one embodiment of the invention, the non-index information includes advertisements and animations.
According to the web site resource management method of this embodiment, the information of a plurality of sites is classified, organized and filled into templates from which the index entries are generated. This raises the degree of aggregation of the resource organization relationships and makes the relationships clearer.
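The template-filling flow of steps S410 to S414 and the book example could be sketched as follows. The sketch assumes that semantic classification has already been performed, so each page arrives tagged with a semantic type, a keyword and a target sub-column; how those tags are derived is not specified in the source. The `TaggedPage` class, the `TEMPLATES` table and all column names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# S410: predefined templates keyed by semantic type (illustrative names only).
TEMPLATES: Dict[str, List[str]] = {
    "book": ["basic_info", "reviews", "price_comparison", "e_resources"],
}


@dataclass
class TaggedPage:
    """Hypothetical resource page with precomputed semantic tags."""
    site: str
    url: str
    semantic_type: str  # e.g. "book"
    keyword: str        # e.g. the book title
    column: str         # sub-column of the template this page belongs to


def build_cross_site_index(pages: List[TaggedPage]) -> Dict[str, Dict[str, List[str]]]:
    # S411: gather semantically related pages from all sites into one group.
    groups = defaultdict(list)
    for page in pages:
        groups[(page.semantic_type, page.keyword)].append(page)

    index: Dict[str, Dict[str, List[str]]] = {}
    for (semantic_type, keyword), members in groups.items():
        # S412: look up the template that matches the group's semantics.
        columns = TEMPLATES.get(semantic_type)
        if columns is None:
            continue  # no template predefined for this semantic type
        # S413: fill each page into its sub-column of the template.
        filled: Dict[str, List[str]] = {column: [] for column in columns}
        for page in members:
            if page.column in filled:
                filled[page.column].append(page.url)
        # S414: the keyword serves as the index entry for the filled template.
        index[keyword] = filled
    return index
```

Keying the result by keyword alone is a simplification: two groups with the same keyword but different semantic types would overwrite each other, so a fuller version might key on both.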
A web site resource management device according to embodiments of the invention is described below with reference to the drawings.
The web site resource management device comprises: a first checking module for checking the number of web sites; a second checking module for checking, when the number of web sites is one, whether the web site has an index page; an optimization module for optimizing the index page to generate a first index page when the web site has an index page; a generation module for generating a second index page according to the structure of the web site when the web site has no index page; and an establishing module for building a cross-site index page based on semantics when the number of web sites is two or more.
Fig. 5 is a structural diagram of a web site resource management device according to one embodiment of the invention.
As shown in Fig. 5, the web site resource management device according to this embodiment comprises: a first checking module 110, a second checking module 120, an optimization module 130, a generation module 140 and an establishing module 150.
Specifically, the first checking module 110 checks the number of web sites; the second checking module 120 checks, when the number of web sites is one, whether the web site has an index page; the optimization module 130 optimizes the index page to generate a first index page when the web site has an index page; the generation module 140 generates a second index page according to the structure of the web site when the web site has no index page; and the establishing module 150 builds a cross-site index page based on semantics when the number of web sites is two or more.
More specifically, the first checking module 110 determines the number of web sites whose structured resource organization relationships are to be obtained. When the first checking module 110 finds that only one web site is to be processed, the second checking module 120 starts mining the pages of that web site and checks whether there is an index page containing the site's index information. When the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information needed to build the first index page, and generates the first index page of the web site. When the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, derives the structure of the web site from the information in the resource pages, and generates the second index page of the web site from that structural information. When the first checking module 110 finds that two or more web sites are to be processed, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across the sites, and generates the index page from the information obtained.
According to the web site resource management device of this embodiment, the two checking modules examine the web sites, and three different modules build the index page by different methods in the three cases distinguished by the number of web sites and by whether an index page exists. This makes it possible to mine the resource organization relationships of the sites thoroughly, raise the degree of aggregation of site resources, and improve how site pages are presented.
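One way to picture how modules 110 to 150 cooperate is to model each module as a callable that the device wires together, as in the sketch below. The class name, field names and signatures are hypothetical and chosen only for this illustration; the source describes the modules functionally and does not fix any interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Index = List[str]  # simplified stand-in for a generated index page


@dataclass
class ResourceManagementDevice:
    count_sites: Callable[[List[str]], int]          # first checking module 110
    find_index_page: Callable[[str], Optional[str]]  # second checking module 120
    optimize: Callable[[str], Index]                 # optimization module 130
    generate: Callable[[str], Index]                 # generation module 140
    build_cross_site: Callable[[List[str]], Index]   # establishing module 150

    def run(self, sites: List[str]) -> Index:
        count = self.count_sites(sites)
        if count == 1:
            page = self.find_index_page(sites[0])
            if page is not None:
                return self.optimize(page)   # first index page
            return self.generate(sites[0])   # second index page
        if count >= 2:
            return self.build_cross_site(sites)  # cross-site index page
        raise ValueError("no web site to process")
```

Passing the modules in as callables keeps each one replaceable, which fits the later embodiments in which the optimization and generation modules are refined into smaller units.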
Fig. 6 is the structural representation of the web site resource management devices of another embodiment of the present invention.
As shown in Figure 6, the web site resource management devices according to the embodiment of the present invention, comprising: the first checking module 110, the second checking module 120, optimize module 130, generation module 140 and set up module 150, wherein optimizes module 130 and comprises delete cells 131 and the first extracting unit 132.
Particularly, the first checking module 110 is for checking the number of web website; The second checking module 120, in the situation that the number of web website is one, checks whether web website has index page; Optimize module 130 in the situation that web website has index page, index page is optimized to generate the first index page; Generation module 140 is not in the situation that web website has index page, according to structural generation second index page of web website; And to set up module 150 be in plural situation for the number at web website, based on semanteme, set up cross-site index page.Wherein delete cells 131 is for deleting the index entry that can not be connected to the resource page in web website in the non-index information of index page and index page; The first extracting unit 132 for effective index entry of extracting index page to generate the first index page.
More specifically, the first checking module 110 detects the number of web sites for which a structured resource organization relationship is to be obtained. If the first checking module 110 finds one such web site, the second checking module 120 starts mining the pages of that web site and checks whether there is an index page containing the index information of the web site. If the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information required to build the first index page, and generates the first index page of the web site. If the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, obtains the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information. If the first checking module 110 finds two or more such web sites, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across the sites, and generates an index page according to the obtained resource index organization information.
When the optimization module 130 optimizes the page information and generates the first index page of the web site, the deleting unit 131 obtains all the information of the index page, analyzes it, and deletes the non-index information. At the same time, it checks the index entries remaining in the index page information to see whether each can be connected to a resource page in the web site it points to, and deletes any index entry that cannot. The first extracting unit 132 then extracts the valid index information remaining in the index page and integrates it onto one page to generate the first index page.
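The module-level control flow described above (count the web sites, check for an index page, then optimize, generate, or establish) can be sketched roughly as follows. This is only an illustration under assumptions: the `Site` class, the `choose_strategy` function, and the returned strategy names are hypothetical and do not appear in the patent text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical stand-in for a web site (not defined in the patent)."""
    name: str
    pages: List[str] = field(default_factory=list)
    index_page: Optional[str] = None  # None means the site has no index page


def choose_strategy(sites: List[Site]) -> str:
    """Return which module would handle the given web sites."""
    # First checking module 110: check the number of web sites.
    if len(sites) >= 2:
        # Establishing module 150: cross-site index page based on semantics.
        return "establish_cross_site_index"
    if len(sites) == 1:
        # Second checking module 120: does the single site have an index page?
        if sites[0].index_page is not None:
            # Optimization module 130: clean the existing index page.
            return "optimize_index_page"
        # Generation module 140: build an index page from the site structure.
        return "generate_index_page"
    raise ValueError("at least one web site is required")
```

Returning a strategy name rather than calling the modules directly keeps the sketch short; an actual device would dispatch to the corresponding module.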
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management device of this embodiment of the present invention, the deleting unit deletes the non-index information and invalid index entries of the index page, and the extracting unit generates an index page from the valid index entries, so that the resource organization relationship of the web site can be obtained effectively, with clear relationships and a higher degree of aggregation.
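As a rough sketch of how the optimization of an existing index page might look, the example below filters out non-index information, drops index entries that cannot reach a resource page, and gathers the valid entries. The `IndexItem` class, the `kind` values, and the `is_reachable` callback are assumptions made for illustration; the patent does not specify how reachability is tested or how page elements are represented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class IndexItem:
    """Hypothetical representation of one element found on an index page."""
    kind: str       # assumed values: "link", "advertisement", "animation"
    label: str
    url: str = ""


def optimize_index_page(
    items: List[IndexItem],
    is_reachable: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Return the (label, url) entries of the first index page."""
    # Deleting unit 131, part 1: delete non-index information
    # (advertisements and animations in the described embodiment).
    links = [item for item in items if item.kind == "link"]
    # Deleting unit 131, part 2: delete index entries that cannot be
    # connected to a resource page. The check is injected by the caller.
    valid = [item for item in links if is_reachable(item.url)]
    # First extracting unit 132: integrate the valid entries onto one page.
    return [(item.label, item.url) for item in valid]
```

Passing `is_reachable` as a parameter keeps the sketch free of network access; a real deployment would presumably supply a function that actually requests the target page.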
Fig. 7 is a schematic structural diagram of a web site resource management device according to another embodiment of the present invention.
As shown in Fig. 7, the web site resource management device according to this embodiment of the present invention includes a first checking module 110, a second checking module 120, an optimization module 130, a generation module 140, and an establishing module 150. The optimization module 130 includes a deleting unit 131 and a first extracting unit 132, and the generation module 140 includes a judging unit 141, a second extracting unit 142, and a generating unit 143.
Specifically, the first checking module 110 checks the number of web sites. The second checking module 120 checks whether the web site has an index page when the number of web sites is one. The optimization module 130 optimizes the index page to generate a first index page when the web site has an index page. The generation module 140 generates a second index page according to the structure of the web site when the web site has no index page. The establishing module 150 establishes a cross-site index page based on semantics when the number of web sites is two or more. Within the optimization module 130, the deleting unit 131 deletes the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site, and the first extracting unit 132 extracts the valid index entries in the index page to generate the first index page. Within the generation module 140, the judging unit 141 judges whether a resource page in the web site has a title; the second extracting unit 142 extracts the title of the resource page as an index entry when the resource page has a title; and the generating unit 143 generates summary information of the resource page as an index entry when the resource page has no title, and generates the second index page according to the index entries.
More specifically, the first checking module 110 detects the number of web sites for which a structured resource organization relationship is to be obtained. If the first checking module 110 finds one such web site, the second checking module 120 starts mining the pages of that web site and checks whether there is an index page containing the index information of the web site. If the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information required to build the first index page, and generates the first index page of the web site. If the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, obtains the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information. If the first checking module 110 finds two or more such web sites, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across the sites, and generates an index page according to the obtained resource index organization information.
When the optimization module 130 optimizes the page information and generates the first index page of the web site, the deleting unit 131 obtains all the information of the index page, analyzes it, and deletes the non-index information. At the same time, it checks the index entries remaining in the index page information to see whether each can be connected to a resource page in the web site it points to, and deletes any index entry that cannot. The first extracting unit 132 then extracts the valid index information remaining in the index page and integrates it onto one page to generate the first index page.
When the web site has no index page and the generation module 140 generates the second index page according to the structure of the web site, the judging unit 141 mines the resource pages of the web site starting from the home page, obtains the information of those resource pages, and judges whether each resource page has title information. If a resource page has title information, the second extracting unit 142 extracts the title of that resource page as an index entry. If a resource page has no title information, the generating unit 143 generates summary information from the main information contained in the resource page and uses this summary information as the index entry. The index entries obtained for all resource pages are then integrated onto one page to generate the second index page.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management device of this embodiment of the present invention, the generation module generates index entries from the title information or summary information of the resource pages and then generates an index page from these index entries, which improves the degree of aggregation of the resource organization relationship and the clarity of the relationships.
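The title-or-summary rule used by the generation module can be sketched as below. The `ResourcePage` class and the `summarize` helper are hypothetical; in particular, taking the first few words of the page body is only a placeholder, since the patent does not say how the summary information is produced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResourcePage:
    """Hypothetical resource page with an optional title."""
    url: str
    body: str
    title: Optional[str] = None


def summarize(text: str, max_words: int = 8) -> str:
    """Placeholder summary: the first max_words words of the page body."""
    words = text.split()
    summary = " ".join(words[:max_words])
    return summary + ("..." if len(words) > max_words else "")


def build_second_index_page(pages: List[ResourcePage]) -> List[Tuple[str, str]]:
    """Return one (index entry, url) pair per resource page."""
    entries = []
    for page in pages:
        # Judging unit 141: does the resource page have a title?
        if page.title and page.title.strip():
            # Second extracting unit 142: use the title as the index entry.
            label = page.title.strip()
        else:
            # Generating unit 143: use generated summary information instead.
            label = summarize(page.body)
        entries.append((label, page.url))
    # All index entries are integrated onto one page.
    return entries
```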
Fig. 7 is a schematic structural diagram of a web site resource management device according to another embodiment of the present invention.
As shown in Fig. 7, the web site resource management device according to this embodiment of the present invention includes a first checking module 110, a second checking module 120, an optimization module 130, a generation module 140, and an establishing module 150. The optimization module 130 includes a deleting unit 131 and a first extracting unit 132; the generation module 140 includes a judging unit 141, a second extracting unit 142, and a generating unit 143; and the establishing module 150 includes a defining unit 151, a classifying unit 152, a retrieving unit 153, a filling unit 154, and an establishing unit 155.
Specifically, the first checking module 110 checks the number of web sites. The second checking module 120 checks whether the web site has an index page when the number of web sites is one. The optimization module 130 optimizes the index page to generate a first index page when the web site has an index page. The generation module 140 generates a second index page according to the structure of the web site when the web site has no index page. The establishing module 150 establishes a cross-site index page based on semantics when the number of web sites is two or more. Within the optimization module 130, the deleting unit 131 deletes the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site, and the first extracting unit 132 extracts the valid index entries in the index page to generate the first index page. Within the generation module 140, the judging unit 141 judges whether a resource page in the web site has a title; the second extracting unit 142 extracts the title of the resource page as an index entry when the resource page has a title; and the generating unit 143 generates summary information of the resource page as an index entry when the resource page has no title, and generates the second index page according to the index entries. Within the establishing module 150, the defining unit 151 predefines a plurality of templates corresponding to different semantics; the classifying unit 152 classifies the resource pages in the two or more web sites according to semantic relevance and organizes them into a first web site; the retrieving unit 153 finds the template corresponding to the first web site according to the semantic relevance of the first web site; the filling unit 154 fills the resource pages in the first web site, classified by semantic relevance, into the different sub-columns of the corresponding template; and the establishing unit 155 establishes the cross-site index page according to the information in the different templates.
More specifically, the first checking module 110 detects the number of web sites for which a structured resource organization relationship is to be obtained. If the first checking module 110 finds one such web site, the second checking module 120 starts mining the pages of that web site and checks whether there is an index page containing the index information of the web site. If the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information required to build the first index page, and generates the first index page of the web site. If the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, obtains the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information. If the first checking module 110 finds two or more such web sites, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across the sites, and generates an index page according to the obtained resource index organization information.
When the optimization module 130 optimizes the page information and generates the first index page of the web site, the deleting unit 131 obtains all the information of the index page, analyzes it, and deletes the non-index information. At the same time, it checks the index entries remaining in the index page information to see whether each can be connected to a resource page in the web site it points to, and deletes any index entry that cannot. The first extracting unit 132 then extracts the valid index information remaining in the index page and integrates it onto one page to generate the first index page.
When the web site has no index page and the generation module 140 generates the second index page according to the structure of the web site, the judging unit 141 mines the resource pages of the web site starting from the home page, obtains the information of those resource pages, and judges whether each resource page has title information. If a resource page has title information, the second extracting unit 142 extracts the title of that resource page as an index entry. If a resource page has no title information, the generating unit 143 generates summary information from the main information contained in the resource page and uses this summary information as the index entry. The index entries obtained for all resource pages are then integrated onto one page to generate the second index page.
When the establishing module 150 establishes the cross-site index page based on semantics, the defining unit 151 predefines a plurality of templates corresponding to different semantics. The classifying unit 152 then obtains the information contained in the resource pages of each web site, classifies the resource pages of each web site according to the semantic relevance of that information, and organizes the semantically related information in the resource pages into a first web site. The retrieving unit 153 then searches the predefined semantic templates according to the semantic relevance of the resource pages organized in the first web site and obtains a first template corresponding to the semantics of the first web site. The filling unit 154 then fills the information of each resource page in the first web site into the first template according to the attributes of the keywords and the semantic relevance, adding the corresponding resource page information to each column. Finally, once the relevant information has been filled into the different semantic templates, the establishing unit 155 integrates the keywords and semantics of each template as index entries and establishes the cross-site index page.
The specific implementation process of the establishing module 150 is illustrated below.
For example, a semantic template for book information is defined, which contains sub-columns such as basic book information, popular reviews, merchant price comparison, electronic resources, and other modules. Suppose the resource pages of each site include the book "Big Talk Design Mode"; the information related to "Big Talk Design Mode" in each resource page is then integrated into a first web site as one class. Next, the book information template is found according to the keywords of the information about this book, and the relevant information in the first web site is filled into each sub-column of the template according to the sub-column modules in the template. The key information in the template is then extracted as index entries. After a plurality of pages have been built in this way and their index entries extracted, the index entries can be integrated to establish the cross-site index page.
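The book example above can be sketched as follows. This is a simplified illustration under stated assumptions: the `ResourcePage` fields, the `TEMPLATES` dictionary, and the `build_cross_site_index` function are hypothetical, and the semantic type, topic, and target sub-column of each page are assumed to have been determined already, because the patent does not describe how semantic relevance is computed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourcePage:
    """Hypothetical resource page with pre-computed semantic labels."""
    site: str
    topic: str      # e.g. a book title
    semantic: str   # semantic type used to pick a template, e.g. "book"
    column: str     # sub-column of the template this page belongs to
    url: str


# Defining unit 151: templates predefined per semantic type (assumed layout).
TEMPLATES: Dict[str, List[str]] = {
    "book": [
        "basic information",
        "popular reviews",
        "price comparison",
        "electronic resources",
    ],
}


def build_cross_site_index(pages: List[ResourcePage]) -> Dict[str, Dict[str, List[str]]]:
    # Classifying unit 152: group pages from all sites by semantic type and topic.
    grouped = defaultdict(list)
    for page in pages:
        grouped[(page.semantic, page.topic)].append(page)

    index: Dict[str, Dict[str, List[str]]] = {}
    for (semantic, topic), group in grouped.items():
        # Retrieving unit 153: find the template matching the semantics.
        columns = TEMPLATES.get(semantic)
        if columns is None:
            continue  # no predefined template for this semantic type
        # Filling unit 154: fill each page into its sub-column.
        filled: Dict[str, List[str]] = {column: [] for column in columns}
        for page in group:
            if page.column in filled:
                filled[page.column].append(page.url)
        # Establishing unit 155: use the topic keyword as the index entry.
        index[topic] = filled
    return index
```

Keying the result by topic means each index entry of the cross-site index page points to a filled template that aggregates pages from several sites.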
In one embodiment of the present invention, the different semantics include novel titles, news titles, video names, product names, and the like.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management device of this embodiment of the present invention, the information of a plurality of sites is classified, organized, and filled into templates to generate index entries, which improves the degree of aggregation of the resource organization relationship and the clarity of the relationships.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the present invention, the scope of which is defined by the claims and their equivalents.

Claims (12)

1. A web site resource management method, characterized by comprising the following steps:
Checking the number of web sites;
If the number of web sites is one, further checking whether the web site has an index page;
If there is an index page, optimizing the index page to generate a first index page;
If there is no index page, generating a second index page according to the structure of the web site; and
If the number of web sites is two or more, establishing a cross-site index page based on semantics.
2. The web site resource management method according to claim 1, characterized in that the step of optimizing the index page to generate the first index page comprises:
Deleting the non-index information in the index page;
Deleting the index entries in the index page that cannot be connected to a resource page in the web site; and
Extracting the valid index entries in the index page to generate the first index page.
3. The web site resource management method according to claim 1 or 2, characterized in that the step of generating the second index page according to the structure of the web site comprises:
Judging whether a resource page in the web site has a title;
If so, extracting the title of the resource page as an index entry;
If not, generating summary information of the resource page as an index entry; and
Generating the second index page according to the index entries.
4. The web site resource management method according to claim 1 or 2, characterized in that the step of establishing the cross-site index page based on semantics comprises:
Predefining a plurality of templates corresponding to different semantics;
Classifying the resource pages in the two or more web sites according to semantic relevance and organizing them into a first web site;
Finding a first template corresponding to the first web site according to the semantic relevance of the first web site;
Filling the resource pages in the first web site, classified by semantic relevance, into the different sub-columns of the first template; and
Establishing the cross-site index page according to the information in the different templates.
5. The web site resource management method according to claim 4, characterized in that the different semantics include novel titles, news titles, video names, and product names.
6. The web site resource management method according to claim 1 or 2, characterized in that the non-index information includes advertisements and animations.
7. A web site resource management device, characterized by comprising:
A first checking module, the first checking module being configured to check the number of web sites;
A second checking module, the second checking module being configured to check whether the web site has an index page when the number of web sites is one;
An optimization module, the optimization module being configured to optimize the index page to generate a first index page when the web site has an index page;
A generation module, the generation module being configured to generate a second index page according to the structure of the web site when the web site has no index page; and
An establishing module, the establishing module being configured to establish a cross-site index page based on semantics when the number of web sites is two or more.
8. The web site resource management device according to claim 7, characterized in that the optimization module comprises:
A deleting unit, the deleting unit being configured to delete the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site; and
A first extracting unit, the first extracting unit being configured to extract the valid index entries in the index page to generate the first index page.
9. The web site resource management device according to claim 7 or 8, characterized in that the generation module comprises:
A judging unit, the judging unit being configured to judge whether a resource page in the web site has a title;
A second extracting unit, the second extracting unit being configured to extract the title of the resource page as an index entry when the resource page in the web site has a title; and
A generating unit, the generating unit being configured to generate summary information of the resource page as an index entry when the resource page in the web site has no title, and to generate the second index page according to the index entries.
10. The web site resource management device according to claim 7 or 8, characterized in that the establishing module comprises:
A defining unit, the defining unit being configured to predefine a plurality of templates corresponding to different semantics;
A classifying unit, the classifying unit being configured to classify the resource pages in the two or more web sites according to semantic relevance and organize them into a first web site;
A retrieving unit, the retrieving unit being configured to find the template corresponding to the first web site according to the semantic relevance of the first web site;
A filling unit, the filling unit being configured to fill the resource pages in the first web site, classified by semantic relevance, into the different sub-columns of the corresponding template; and
An establishing unit, the establishing unit being configured to establish the cross-site index page according to the information in the different templates.
11. The web site resource management device according to claim 10, characterized in that the different semantics include novel titles, news titles, and video names.
12. The web site resource management device according to claim 7 or 8, characterized in that the non-index information includes advertisements and animations.
CN201210222539.2A 2012-06-28 2012-06-28 A kind of web site resource management method and device Active CN103514221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210222539.2A CN103514221B (en) 2012-06-28 2012-06-28 A kind of web site resource management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210222539.2A CN103514221B (en) 2012-06-28 2012-06-28 A kind of web site resource management method and device

Publications (2)

Publication Number Publication Date
CN103514221A true CN103514221A (en) 2014-01-15
CN103514221B CN103514221B (en) 2016-12-28

Family

ID=49896954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210222539.2A Active CN103514221B (en) 2012-06-28 2012-06-28 A kind of web site resource management method and device

Country Status (1)

Country Link
CN (1) CN103514221B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001077886A1 (en) * 2000-04-10 2001-10-18 Blueskyfrog Pty Ltd A method of filtering the contents of a virtual page
CN1732459A (en) * 2002-11-01 2006-02-08 Lg电子株式会社 Web content transcoding system and method for small display device
US20070143283A1 (en) * 2005-12-09 2007-06-21 Stephan Spencer Method of optimizing search engine rankings through a proxy website
CN101097578A (en) * 2007-06-07 2008-01-02 北京金山软件有限公司 Network resource searching method and system
US20080275877A1 (en) * 2007-05-04 2008-11-06 International Business Machines Corporation Method and system for variable keyword processing based on content dates on a web page
CN101887422A (en) * 2009-05-13 2010-11-17 北京博越世纪科技有限公司 Technique for keeping synchronous update of data of web site and wap site

Also Published As

Publication number Publication date
CN103514221B (en) 2016-12-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant