CN103514221A - Web site resource management method and device - Google Patents
- Publication number
- CN103514221A (CN201210222539.2A)
- Authority
- CN
- China
- Prior art keywords
- page
- index
- web site
- resource
- index page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention provides a web site resource management method and device. The method comprises the following steps: checking the number of web sites; if there is one web site, further checking whether the web site has an index page; if an index page exists, optimizing the index page to generate a first index page; if no index page exists, generating a second index page according to the structure of the web site; and if there are two or more web sites, building a cross-site index page based on semantics. By detecting the web sites and building index pages with different methods depending on the number of web sites and on whether an index page exists, the method can fully mine the organization relationships of site resources, improve the degree of aggregation of the site resources, and improve the display effect of the site's pages.
Description
Technical field
The present invention relates to the field of analyzing and mining web site resource organization relationships, and in particular to a web site resource management method and device.
Background
Converting web sites into apps is becoming increasingly common. Converting a web site into an app requires the resource organization relationships of that site, so the intra-site resource organization relationships of the web site must be analyzed and mined to obtain structured resource organization relationship data.
At present, the resource organization relationships of a web site are mined mainly by manual inspection, and no mature technique exists. This approach has the following shortcomings:
(1) different mining methods are not applied to different categories of sites, so the mining is not comprehensive and the degree of aggregation is low;
(2) there is no fixed mining method, so the resulting resource organization relationships are unclear and disorganized, and are difficult to structure.
Summary of the invention
The present invention aims to solve at least one of the technical problems described above.
To this end, a first object of the present invention is to propose a web site resource management method.
A second object of the present invention is to propose a web site resource management device.
To achieve these objects, the web site resource management method according to an embodiment of the first aspect of the present invention comprises the following steps: checking the number of web sites; if the number of web sites is one, further checking whether the web site has an index page; if there is an index page, optimizing the index page to generate a first index page; if there is no index page, generating a second index page according to the structure of the web site; and if the number of web sites is two or more, building a cross-site index page based on semantics.
According to the web site resource management method of the embodiment of the present invention, the web sites are detected and index pages are built with different methods for three different situations, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the resource organization relationships of the sites, improve the degree of aggregation of site resources, and improve the display effect of site pages.
To achieve the above objects, the web site resource management device according to an embodiment of the second aspect of the present invention comprises: a first checking module, configured to check the number of web sites; a second checking module, configured to check whether the web site has an index page when the number of web sites is one; an optimization module, configured to optimize the index page to generate a first index page when the web site has an index page; a generation module, configured to generate a second index page according to the structure of the web site when the web site has no index page; and a building module, configured to build a cross-site index page based on semantics when the number of web sites is two or more.
According to the web site resource management device of the embodiment of the present invention, the web sites are detected and index pages are built with different methods for three different situations, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the resource organization relationships of the sites, improve the degree of aggregation of site resources, and improve the display effect of site pages.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a web site resource management method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a web site resource management method according to an embodiment of the present invention;
Fig. 3 is a flowchart of a web site resource management method according to an embodiment of the present invention;
Fig. 4 is a flowchart of a web site resource management method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention; and
Fig. 8 is a schematic structural diagram of a web site resource management device according to an embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and shall not be construed as limiting it.
These and other aspects of the embodiments of the present invention will become clear with reference to the following description and drawings. The description and drawings specifically disclose some particular implementations of the embodiments to show some of the ways in which the principles of the embodiments may be practiced, but it should be understood that the scope of the embodiments is not limited thereby. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
The web site resource management method according to embodiments of the present invention is described below with reference to the drawings.
A web site resource management method comprises the following steps: checking the number of web sites; if the number of web sites is one, further checking whether the web site has an index page; if there is an index page, optimizing the index page to generate a first index page; if there is no index page, generating a second index page according to the structure of the web site; and if the number of web sites is two or more, building a cross-site index page based on semantics.
Fig. 1 is a flowchart of a web site resource management method according to one embodiment of the present invention.
As shown in Fig. 1, the web site resource management method according to the embodiment of the present invention comprises the following steps:
Step S101: check the number of web sites.
Specifically, the number of web sites whose structured resource organization relationships need to be obtained is detected.
Step S102: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if it is found that there is one web site whose structured resource organization relationships are to be obtained, the pages of that web site are mined to check whether there is an index page containing the index information of the web site.
Step S103: if there is an index page, optimize the index page to generate a first index page.
Specifically, if the web site has an index page, the page information in the index page is obtained and optimized to yield the information needed to build the first index page, and the first index page of the web site is generated.
Step S104: if there is no index page, generate a second index page according to the structure of the web site.
Specifically, if the web site has no index page, the resource pages of the web site are mined, the structure of the web site is obtained from the information of the resource pages, and the second index page of the web site is generated according to that structural information.
Step S105: if the number of web sites is two or more, build a cross-site index page based on semantics.
Specifically, if it is found that there are two or more web sites whose structured resource organization relationships are to be obtained, the web sites are linked according to the semantic relevance of their resource pages, resource index organization information is acquired across the sites, and an index page is generated from the acquired resource index organization information.
According to the web site resource management method of the embodiment of the present invention, the web sites are detected and index pages are built with different methods for three different situations, distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the resource organization relationships of the sites, improve the degree of aggregation of site resources, and improve the display effect of site pages.
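The branching in steps S101 to S105 can be illustrated with a minimal Python sketch. The `Site` record, its fields, and the strategy labels returned by `choose_strategy` are hypothetical names introduced only for illustration; the patent does not prescribe any data structure or naming.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    """Hypothetical stand-in for a crawled web site (not defined in the patent)."""
    url: str
    index_page: Optional[str] = None          # URL of an existing index page, if any
    resource_pages: List[str] = field(default_factory=list)


def choose_strategy(sites: List[Site]) -> str:
    """Return which of the three index-building branches applies."""
    # S101: check the number of web sites.
    if not sites:
        raise ValueError("at least one web site is required")
    if len(sites) == 1:
        site = sites[0]
        # S102: a single site, so check whether it has an index page.
        if site.index_page is not None:
            return "optimize_existing_index"   # S103: first index page
        return "generate_from_structure"       # S104: second index page
    # S105: two or more sites, so build a cross-site index page.
    return "build_cross_site_index"
```

For example, `choose_strategy([Site("http://a.example")])` selects the structure-based branch because the single site carries no index page.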
Fig. 2 is a flowchart of a web site resource management method according to another embodiment of the present invention.
As shown in Fig. 2, the web site resource management method according to the embodiment of the present invention comprises the following steps.
Step S201: check the number of web sites.
Specifically, the number of web sites whose structured resource organization relationships need to be obtained is detected.
Step S202: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if it is found that there is one web site whose structured resource organization relationships are to be obtained, the pages of that web site are mined to check whether there is an index page containing the index information of the web site.
Step S203: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, all of the information in the index page is obtained and analyzed, and the non-index information in it is deleted.
Step S204: delete the index entries in the index page that cannot link to a resource page in the web site.
Specifically, the index entries remaining in the index page information are checked to see whether each can link to a resource page in the web site it points to; index entries that cannot link to a resource page in the web site they point to are deleted.
Step S205: extract the valid index entries in the index page to generate the first index page.
Specifically, the valid index information remaining in the index page is extracted and consolidated onto one page to generate the first index page.
Step S206: if there is no index page, generate a second index page according to the structure of the web site.
Specifically, if the web site has no index page, the resource pages of the web site are mined, the structure of the web site is obtained from the information of the resource pages, and the second index page of the web site is generated according to that structural information.
Step S207: if the number of web sites is two or more, build a cross-site index page based on semantics.
Specifically, if it is found that there are two or more web sites whose structured resource organization relationships are to be obtained, the web sites are linked according to the semantic relevance of their resource pages, resource index organization information is acquired across the sites, and an index page is generated from the acquired resource index organization information.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management method of the embodiment of the present invention, the non-index information and invalid index entries of the index page are deleted and the index page is generated from the valid index entries, so that the resource organization relationships of the web site can be obtained effectively, with clear relationships and a higher degree of aggregation.
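The optimization of an existing index page (steps S203 to S205) might be sketched as follows. This is only an illustration under stated assumptions: the `Entry` record, the `kind` labels, and the precomputed `reachable` set of resource-page URLs are hypothetical, and the patent does not say how advertisements are recognized or how link reachability is tested.

```python
from dataclasses import dataclass
from typing import List, Set
from urllib.parse import urlparse


@dataclass(frozen=True)
class Entry:
    """Hypothetical item found on an index page."""
    text: str
    url: str
    kind: str = "link"   # assumed labels: "link", "ad", "animation"


# Assumption: non-index information has already been labeled upstream.
NON_INDEX_KINDS = {"ad", "animation"}


def optimize_index(entries: List[Entry], site_host: str, reachable: Set[str]) -> List[Entry]:
    """Return the valid index entries that make up the first index page."""
    # S203: delete non-index information such as advertisements and animations.
    kept = [e for e in entries if e.kind not in NON_INDEX_KINDS]
    # S204: delete entries that cannot link to a resource page inside the site.
    kept = [
        e for e in kept
        if urlparse(e.url).netloc == site_host and e.url in reachable
    ]
    # S205: the remaining valid entries are consolidated onto one page.
    return kept
```

In practice the `reachable` set would come from a crawl of the site; passing it in keeps the sketch free of network access.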
Fig. 3 is a flowchart of a web site resource management method according to another embodiment of the present invention.
As shown in Fig. 3, the web site resource management method according to the embodiment of the present invention comprises the following steps.
Step S301: check the number of web sites.
Specifically, the number of web sites whose structured resource organization relationships need to be obtained is detected.
Step S302: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if it is found that there is one web site whose structured resource organization relationships are to be obtained, the pages of that web site are mined to check whether there is an index page containing the index information of the web site.
Step S303: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, all of the information in the index page is obtained and analyzed, and the non-index information in it is deleted.
Step S304: delete the index entries in the index page that cannot link to a resource page in the web site.
Specifically, the index entries remaining in the index page information are checked to see whether each can link to a resource page in the web site it points to; index entries that cannot link to a resource page in the web site they point to are deleted.
Step S305: extract the valid index entries in the index page to generate the first index page.
Specifically, the valid index information remaining in the index page is extracted and consolidated onto one page to generate the first index page.
Step S306: if there is no index page, determine whether the resource pages in the web site have titles.
Specifically, if the web site has no index page, the resource pages of the web site are mined starting from the home page, the information of the resource pages is obtained, and it is determined whether each resource page has title information.
Step S307: if so, extract the title of the resource page as an index entry.
Specifically, if a resource page has title information, the title in the resource page is extracted as an index entry.
Step S308: if not, generate summary information of the resource page as an index entry.
Specifically, if a resource page has no title information, summary information is generated from the main information contained in the resource page and used as the index entry.
Step S309: generate the second index page according to the index entries.
Specifically, the index entries obtained for all resource pages are consolidated onto one page to generate the second index page.
Step S310: if the number of web sites is two or more, build a cross-site index page based on semantics.
Specifically, if it is found that there are two or more web sites whose structured resource organization relationships are to be obtained, the web sites are linked according to the semantic relevance of their resource pages, resource index organization information is acquired across the sites, and an index page is generated from the acquired resource index organization information.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management method of the embodiment of the present invention, index entries are generated from the title information or summary information of the resource pages and the index page is then generated from these index entries, which improves the degree of aggregation of the resource organization relationships and the clarity of the relationships.
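Steps S306 to S309 amount to a title-or-summary fallback per resource page. The sketch below assumes a hypothetical `ResourcePage` record and uses simple truncation of the body text as the summary; the patent only says the summary is generated from the main information of the page, so the truncation rule and the `max_len` parameter are illustrative assumptions.

```python
from typing import List, NamedTuple, Optional, Tuple


class ResourcePage(NamedTuple):
    """Hypothetical record for a mined resource page."""
    url: str
    title: Optional[str]
    body: str


def index_entry(page: ResourcePage, max_len: int = 40) -> str:
    """Return the index entry text for one resource page."""
    # S306/S307: use the title when the page has one.
    if page.title and page.title.strip():
        return page.title.strip()
    # S308: otherwise fall back to a summary. Truncating the whitespace-normalized
    # body is an assumed stand-in for a real summarization step.
    summary = " ".join(page.body.split())
    if len(summary) > max_len:
        summary = summary[:max_len].rstrip() + "..."
    return summary


def build_second_index(pages: List[ResourcePage]) -> List[Tuple[str, str]]:
    """S309: consolidate one (entry text, URL) pair per resource page."""
    return [(index_entry(p), p.url) for p in pages]
```

Returning (text, URL) pairs keeps the rendering of the second index page, for example as HTML links, separate from the selection of entry text.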
Fig. 4 is a flowchart of a web site resource management method according to another embodiment of the present invention.
As shown in Fig. 4, the web site resource management method according to the embodiment of the present invention comprises the following steps.
Step S401: check the number of web sites.
Specifically, the number of web sites whose structured resource organization relationships need to be obtained is detected.
Step S402: if the number of web sites is one, further check whether the web site has an index page.
Specifically, if it is found that there is one web site whose structured resource organization relationships are to be obtained, the pages of that web site are mined to check whether there is an index page containing the index information of the web site.
Step S403: if there is an index page, delete the non-index information in the index page.
Specifically, if the web site has an index page, all of the information in the index page is obtained and analyzed, and the non-index information in it is deleted.
Step S404: delete the index entries in the index page that cannot link to a resource page in the web site.
Specifically, the index entries remaining in the index page information are checked to see whether each can link to a resource page in the web site it points to; index entries that cannot link to a resource page in the web site they point to are deleted.
Step S405: extract the valid index entries in the index page to generate the first index page.
Specifically, the valid index information remaining in the index page is extracted and consolidated onto one page to generate the first index page.
Step S406: if there is no index page, determine whether the resource pages in the web site have titles.
Specifically, if the web site has no index page, the resource pages of the web site are mined starting from the home page, the information of the resource pages is obtained, and it is determined whether each resource page has title information.
Step S407: if so, extract the title of the resource page as an index entry.
Specifically, if a resource page has title information, the title in the resource page is extracted as an index entry.
Step S408: if not, generate summary information of the resource page as an index entry.
Specifically, if a resource page has no title information, summary information is generated from the main information contained in the resource page and used as the index entry.
Step S409: generate the second index page according to the index entries.
Specifically, the index entries obtained for all resource pages are consolidated onto one page to generate the second index page.
Step S410: if the number of web sites is two or more, predefine a plurality of templates corresponding to different semantics.
Specifically, if it is found that there are two or more web sites whose structured resource organization relationships are to be obtained, templates corresponding to the relevant semantics are preset according to semantic relevance.
Step S411: classify the resource pages in the two or more web sites according to semantic relevance and organize them into one web site.
Specifically, the information contained in the resource pages of each web site is obtained, the resource pages of each web site are classified according to the semantic relevance of that information, and the semantically related information in the resource pages is organized into one web site.
Step S412: find the first template corresponding to that web site according to its semantic relevance.
Specifically, according to the semantic relevance of the resource pages organized into that web site, the predefined semantic templates are searched to obtain the first template corresponding to the semantics of that web site.
Step S413: fill the resource pages in that web site, classified by semantic relevance, into the different subsections of the first template.
Specifically, according to the attributes of the keywords, the information of each resource page in that web site is filled into the first template by semantic relevance, with the corresponding resource page information added to each subsection.
Step S414: build the cross-site index page according to the information in the different templates.
Specifically, using the relevant information filled into the different semantic templates, the keywords and semantics of each template are consolidated as index entries to build the cross-site index page.
The specific implementation of steps S410 to S414 is illustrated below.
For example, a semantic template for book information is defined, containing subsections such as basic book information, popular reviews, merchant price comparison, electronic resources, and other modules. Suppose the resource pages of each site include the book "Big Talk Design Patterns". The information about "Big Talk Design Patterns" in each resource page is then consolidated into one web site as one class. Next, the book information template is found using the keywords of the information about this book, and the relevant information in that web site is filled into each subsection of the template. Key information in the template is then extracted as an index entry. Multiple pages are built and their index entries extracted in this way, and the index entries can then be consolidated to build the cross-site index page.
In one embodiment of the present invention, the different semantics include novel titles, news headlines, video titles, product names, and the like.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management method of the embodiment of the present invention, the information of multiple sites is classified, organized, and filled into templates to generate index entries, which improves the degree of aggregation of the resource organization relationships and the clarity of the relationships.
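The template-based flow of steps S410 to S414 can be sketched roughly as below. The sketch assumes that each piece of page information already carries a semantic category, a topic, and a target subsection; how those labels are derived is not specified in the patent, so the `Snippet` record, the `TEMPLATES` table, and both function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Snippet(NamedTuple):
    """Hypothetical piece of information taken from one resource page."""
    site: str
    category: str   # semantic type, e.g. "book"
    topic: str      # semantic key, e.g. a book title
    section: str    # template subsection the text belongs to
    text: str


# S410: predefined templates, one list of subsections per semantic type (assumed names).
TEMPLATES: Dict[str, List[str]] = {
    "book": ["basic information", "popular reviews", "price comparison", "electronic resources"],
}

Filled = Dict[Tuple[str, str], Dict[str, List[str]]]


def fill_templates(snippets: List[Snippet]) -> Filled:
    """S411 to S413: group snippets by (category, topic) and fill the matching template."""
    filled: Filled = {}
    for s in snippets:
        sections = TEMPLATES.get(s.category)            # S412: look up the template
        if sections is None or s.section not in sections:
            continue                                     # no matching template or subsection
        page = filled.setdefault((s.category, s.topic), {name: [] for name in sections})
        page[s.section].append(f"{s.site}: {s.text}")   # S413: fill the subsection
    return filled


def cross_site_index(filled: Filled) -> List[str]:
    """S414: one index entry per filled template, built from its category and topic."""
    return sorted(f"{category}: {topic}" for category, topic in filled)
```

Keying the filled templates by (category, topic) means information about the same book from several sites lands on one page, which is what allows a single cross-site index entry to cover all of them.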
The web site resource management device according to embodiments of the present invention is described below with reference to the drawings.
A web site resource management device comprises: a first checking module, configured to check the number of web sites; a second checking module, configured to check whether the web site has an index page when the number of web sites is one; an optimization module, configured to optimize the index page to generate a first index page when the web site has an index page; a generation module, configured to generate a second index page according to the structure of the web site when the web site has no index page; and a building module, configured to build a cross-site index page based on semantics when the number of web sites is two or more.
Fig. 5 is a schematic structural diagram of a web site resource management device according to one embodiment of the present invention.
As shown in Fig. 5, the web site resource management device according to the embodiment of the present invention comprises: a first checking module 110, a second checking module 120, an optimization module 130, a generation module 140, and a building module 150.
Specifically, the first checking module 110 is configured to check the number of web sites; the second checking module 120 is configured to check whether the web site has an index page when the number of web sites is one; the optimization module 130 is configured to optimize the index page to generate a first index page when the web site has an index page; the generation module 140 is configured to generate a second index page according to the structure of the web site when the web site has no index page; and the building module 150 is configured to build a cross-site index page based on semantics when the number of web sites is two or more.
More specifically, the first checking module 110 detects the number of web sites whose structured resource organization relationships need to be obtained; when the first checking module 110 finds that there is one such web site, the second checking module 120 mines the pages of that web site to check whether there is an index page containing the index information of the web site; when the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to yield the information needed to build the first index page, and generates the first index page of the web site; when the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, obtains the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information; and when the first checking module 110 finds that there are two or more such web sites, the building module 150 links the web sites according to the semantic relevance of their resource pages, acquires resource index organization information across the sites, and generates an index page from the acquired resource index organization information.
According to the web site resource management device of the embodiment of the present invention, two checking modules detect the web sites, and three different modules build index pages with different methods for the three situations distinguished by the number of web sites and by whether an index page exists. This makes it possible to fully mine the resource organization relationships of the sites, improve the degree of aggregation of site resources, and improve the display effect of site pages.
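One way to picture the five-module device is as a small class whose modules are injected as callables. This is a hedged sketch only: the class name, field names, and callable signatures are assumptions, and the comments map each field to the reference numeral used in the description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResourceManagementDevice:
    """Hypothetical composition of modules 110 to 150; sites are plain URL strings here."""
    count_sites: Callable[[List[str]], int]        # first checking module 110
    has_index_page: Callable[[str], bool]          # second checking module 120
    optimize: Callable[[str], str]                 # optimization module 130
    generate: Callable[[str], str]                 # generation module 140
    build_cross_site: Callable[[List[str]], str]   # building module 150

    def run(self, sites: List[str]) -> str:
        """Route the sites to exactly one of the three index-building modules."""
        n = self.count_sites(sites)
        if n == 1:
            site = sites[0]
            if self.has_index_page(site):
                return self.optimize(site)
            return self.generate(site)
        if n >= 2:
            return self.build_cross_site(sites)
        raise ValueError("no web sites given")
```

Injecting the modules as callables lets each one be replaced or tested in isolation, which mirrors the way the description assigns a single responsibility to each module.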
Fig. 6 is the structural representation of the web site resource management devices of another embodiment of the present invention.
As shown in Figure 6, the web site resource management devices according to the embodiment of the present invention, comprising: the first checking module 110, the second checking module 120, optimize module 130, generation module 140 and set up module 150, wherein optimizes module 130 and comprises delete cells 131 and the first extracting unit 132.
Specifically, the first checking module 110 is configured to check the number of web sites. The second checking module 120 is configured to check, when the number of web sites is one, whether the web site has an index page. The optimization module 130 is configured to optimize the index page to generate a first index page when the web site has an index page. The generation module 140 is configured to generate a second index page according to the structure of the web site when the web site has no index page. The establishing module 150 is configured to establish a cross-site index page based on semantics when the number of web sites is two or more. The deletion unit 131 is configured to delete the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site; the first extraction unit 132 is configured to extract the valid index entries in the index page to generate the first index page.
More specifically, the first checking module 110 detects the number of web sites for which a structured resource organization relationship is to be obtained. When the first checking module 110 finds that there is one such web site, the second checking module 120 begins mining the pages of that web site and checks whether there is an index page containing the index information of the web site. When the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information needed to build the first index page, and generates the first index page of the web site. When the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, derives the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information. When the first checking module 110 finds that there are two or more such web sites, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across sites, and generates an index page according to the obtained information. When the optimization module 130 optimizes the page information to generate the first index page, the deletion unit 131 obtains all the information of the index page, analyzes it, and deletes the non-index information in it; it also checks the remaining index entries to see whether each can be connected to a resource page in the web site it points to, and deletes any index entry that cannot. The first extraction unit 132 then extracts the valid index information remaining in the index page and merges it onto one page to generate the first index page.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management device of this embodiment of the present invention, the deletion unit deletes the non-index information and the invalid index entries of the index page, and the extraction unit generates an index page from the valid index entries. The resource organization relationship of the web site can thus be obtained effectively, with clear relationships and a higher degree of aggregation.
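The optimization path described above (delete non-index information, delete index entries that lead nowhere, keep the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `PageItem` model, the `kind` labels and the injected `is_reachable` callback are all hypothetical names introduced here, and in practice reachability would probably be decided by requesting the linked page.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PageItem:
    """One element found on an existing index page (hypothetical model)."""
    kind: str      # e.g. "link", "advertisement", "animation"
    text: str
    url: str = ""


def optimize_index_page(
    items: Iterable[PageItem],
    is_reachable: Callable[[str], bool],
) -> List[PageItem]:
    """Return the entries that would make up the first index page."""
    first_index: List[PageItem] = []
    for item in items:
        # Deletion, part 1: drop non-index information (ads, animations, ...).
        if item.kind != "link":
            continue
        # Deletion, part 2: drop entries that cannot reach a resource page.
        if not is_reachable(item.url):
            continue
        # Extraction: whatever survives is a valid index entry.
        first_index.append(item)
    return first_index
```

Passing reachability in as a callback keeps the sketch free of network access and makes the two deletion rules easy to exercise separately.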
Fig. 7 is a schematic structural diagram of a web site resource management device according to another embodiment of the present invention.
As shown in Fig. 7, the web site resource management device according to this embodiment of the present invention comprises a first checking module 110, a second checking module 120, an optimization module 130, a generation module 140 and an establishing module 150. The optimization module 130 comprises a deletion unit 131 and a first extraction unit 132; the generation module 140 comprises a judging unit 141, a second extraction unit 142 and a generation unit 143.
Specifically, the first checking module 110 is configured to check the number of web sites. The second checking module 120 is configured to check, when the number of web sites is one, whether the web site has an index page. The optimization module 130 is configured to optimize the index page to generate a first index page when the web site has an index page. The generation module 140 is configured to generate a second index page according to the structure of the web site when the web site has no index page. The establishing module 150 is configured to establish a cross-site index page based on semantics when the number of web sites is two or more. The deletion unit 131 is configured to delete the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site; the first extraction unit 132 is configured to extract the valid index entries in the index page to generate the first index page. The judging unit 141 is configured to judge whether a resource page in the web site has a title; the second extraction unit 142 is configured to extract the title of the resource page as an index entry when the resource page has a title; and the generation unit 143 is configured to generate summary information of the resource page as an index entry when the resource page has no title, and to generate the second index page according to the index entries.
More specifically, the first checking module 110 detects the number of web sites for which a structured resource organization relationship is to be obtained. When the first checking module 110 finds that there is one such web site, the second checking module 120 begins mining the pages of that web site and checks whether there is an index page containing the index information of the web site. When the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information needed to build the first index page, and generates the first index page of the web site. When the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, derives the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information. When the first checking module 110 finds that there are two or more such web sites, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across sites, and generates an index page according to the obtained information. When the optimization module 130 optimizes the page information to generate the first index page, the deletion unit 131 obtains all the information of the index page, analyzes it, and deletes the non-index information in it; it also checks the remaining index entries to see whether each can be connected to a resource page in the web site it points to, and deletes any index entry that cannot. The first extraction unit 132 then extracts the valid index information remaining in the index page and merges it onto one page to generate the first index page. When the generation module 140 generates the second index page according to the structure of the web site because the web site has no index page, the judging unit 141 mines the resource pages of the web site starting from the home page, obtains the information of those resource pages, and judges whether each resource page has title information. If a resource page has title information, the second extraction unit 142 extracts the title of that resource page as an index entry. If a resource page has no title information, the generation unit 143 generates summary information from the main information contained in the resource page and uses the summary information as the index entry. The index entries obtained for all resource pages are merged onto one page to generate the second index page.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management device of this embodiment of the present invention, the generation module generates index entries from the title information or summary information of the resource pages and then generates an index page from these index entries, which improves the degree of aggregation of the resource organization relationship and the clarity of the relationships.
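The title-or-summary rule used to build the second index page can be sketched as below. The source does not say how the summary is produced, so `summarize` is only a placeholder assumption (it takes the first few words of the page body); `ResourcePage` and `build_second_index` are likewise hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ResourcePage:
    """A mined resource page (hypothetical model)."""
    url: str
    title: str   # empty string when the page has no title
    body: str


def summarize(body: str, max_words: int = 8) -> str:
    """Placeholder summary: the first few words of the body (an assumption)."""
    words = body.split()
    summary = " ".join(words[:max_words])
    return summary + ("..." if len(words) > max_words else "")


def build_second_index(pages: Iterable[ResourcePage]) -> List[Tuple[str, str]]:
    """Return (index entry, URL) pairs merged into one index listing."""
    index: List[Tuple[str, str]] = []
    for page in pages:
        title = page.title.strip()
        # Judging step: use the title if present, otherwise fall back to a summary.
        entry = title if title else summarize(page.body)
        index.append((entry, page.url))
    return index
```

Any real summarizer could replace `summarize` without changing the branching, which is the only part the description actually specifies.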
Fig. 7 is a schematic structural diagram of a web site resource management device according to another embodiment of the present invention.
As shown in Fig. 7, the web site resource management device according to this embodiment of the present invention comprises a first checking module 110, a second checking module 120, an optimization module 130, a generation module 140 and an establishing module 150. The optimization module 130 comprises a deletion unit 131 and a first extraction unit 132; the generation module 140 comprises a judging unit 141, a second extraction unit 142 and a generation unit 143; and the establishing module 150 comprises a definition unit 151, a classification unit 152, a retrieval unit 153, a filling unit 154 and an establishing unit 155.
Specifically, the first checking module 110 is configured to check the number of web sites. The second checking module 120 is configured to check, when the number of web sites is one, whether the web site has an index page. The optimization module 130 is configured to optimize the index page to generate a first index page when the web site has an index page. The generation module 140 is configured to generate a second index page according to the structure of the web site when the web site has no index page. The establishing module 150 is configured to establish a cross-site index page based on semantics when the number of web sites is two or more. In the optimization module 130, the deletion unit 131 is configured to delete the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site; the first extraction unit 132 is configured to extract the valid index entries in the index page to generate the first index page. In the generation module 140, the judging unit 141 is configured to judge whether a resource page in the web site has a title; the second extraction unit 142 is configured to extract the title of the resource page as an index entry when the resource page has a title; and the generation unit 143 is configured to generate summary information of the resource page as an index entry when the resource page has no title, and to generate the second index page according to the index entries. In the establishing module 150, the definition unit 151 is configured to predefine a plurality of templates corresponding to different semantics; the classification unit 152 is configured to classify the resource pages in the two or more web sites according to semantic relevance and organize them into a first web site; the retrieval unit 153 is configured to find, according to the semantic relevance of the first web site, a template corresponding to it; the filling unit 154 is configured to fill the resource pages classified by semantic relevance in the first web site into the different columns of the corresponding template; and the establishing unit 155 is configured to establish the cross-site index page according to the information in the different templates.
More specifically, the first checking module 110 detects the number of web sites for which a structured resource organization relationship is to be obtained. When the first checking module 110 finds that there is one such web site, the second checking module 120 begins mining the pages of that web site and checks whether there is an index page containing the index information of the web site. When the second checking module 120 finds that the web site has an index page, the optimization module 130 obtains the page information in the index page, optimizes it to obtain the information needed to build the first index page, and generates the first index page of the web site. When the second checking module 120 finds that the web site has no index page, the generation module 140 mines the resource pages of the web site, derives the structure of the web site from the information of the resource pages, and generates the second index page of the web site according to that structural information. When the first checking module 110 finds that there are two or more such web sites, the establishing module 150 links these web sites according to the semantic relevance of their resource pages, obtains resource index organization information across sites, and generates an index page according to the obtained information. When the optimization module 130 optimizes the page information to generate the first index page, the deletion unit 131 obtains all the information of the index page, analyzes it, and deletes the non-index information in it; it also checks the remaining index entries to see whether each can be connected to a resource page in the web site it points to, and deletes any index entry that cannot. The first extraction unit 132 then extracts the valid index information remaining in the index page and merges it onto one page to generate the first index page. When the generation module 140 generates the second index page according to the structure of the web site because the web site has no index page, the judging unit 141 mines the resource pages of the web site starting from the home page, obtains the information of those resource pages, and judges whether each resource page has title information. If a resource page has title information, the second extraction unit 142 extracts the title of that resource page as an index entry. If a resource page has no title information, the generation unit 143 generates summary information from the main information contained in the resource page and uses the summary information as the index entry. The index entries obtained for all resource pages are merged onto one page to generate the second index page. When the establishing module 150 establishes the cross-site index page based on semantics, the definition unit 151 predefines a plurality of templates corresponding to different semantics. The classification unit 152 then obtains the information contained in the resource pages of each web site, classifies the resource pages of each web site according to the semantic relevance of that information, and organizes the semantically related information of the resource pages into a first web site. The retrieval unit 153 then searches the predefined semantic templates according to the semantic relevance of the resource pages organized in the first web site and obtains a first template corresponding to the semantics of the first web site. The filling unit 154 then fills the information of each resource page in the first web site into the first template according to keyword attributes and semantic relevance, adding the corresponding resource page information to each column. Finally, after the relevant information has been filled into the different semantic templates, the establishing unit 155 integrates the keywords and semantics of each template as index entries and establishes the cross-site index page.
A specific implementation of the establishing module 150 is illustrated below.
For example, a semantic template for book information is defined, containing sub-columns such as basic book information, popular reviews, merchant price comparison, electronic resources and other modules. Suppose the resource pages of each site contain the book Big Talk Design Patterns; the information about Big Talk Design Patterns in each resource page is then merged as one class into a first web site. The book information template is then found by looking up keywords of the information about this book, and the relevant information in the first web site is filled into the sub-columns of the template according to the template's sub-column modules. Key information in the template is then extracted as index entries. After a plurality of pages have been built and their index entries extracted in this way, the index entries can be integrated to establish the cross-site index page.
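The book example can be sketched in code as follows. This is an illustrative sketch only: it assumes that each resource has already been labelled with a subject and a target sub-column (the patent leaves the semantic analysis itself unspecified), and the names `Resource`, `BOOK_TEMPLATE`, `build_cross_site_index` and `semantic_type_of` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Sequence


class Resource(NamedTuple):
    """A resource page from some site, already labelled (an assumption)."""
    site: str
    subject: str   # semantic key, e.g. a book title
    column: str    # sub-column of the template this page belongs to
    url: str


# Predefined template: the sub-columns named in the book example.
BOOK_TEMPLATE = (
    "basic information",
    "popular reviews",
    "price comparison",
    "electronic resources",
)


def build_cross_site_index(
    resources: Iterable[Resource],
    templates: Dict[str, Sequence[str]],
    semantic_type_of: Callable[[str], str],
) -> Dict[str, Dict[str, List[str]]]:
    """Group resources by subject, then fill each group into its template."""
    # Classification: collect same-subject pages from all sites together.
    grouped: Dict[str, List[Resource]] = defaultdict(list)
    for resource in resources:
        grouped[resource.subject].append(resource)

    index: Dict[str, Dict[str, List[str]]] = {}
    for subject, group in grouped.items():
        # Retrieval: look up the template that matches this subject's semantics.
        template = templates[semantic_type_of(subject)]
        # Filling: place each page URL in its sub-column; unknown columns are skipped.
        filled: Dict[str, List[str]] = {column: [] for column in template}
        for resource in group:
            if resource.column in filled:
                filled[resource.column].append(resource.url)
        # The subject itself serves as the index entry.
        index[subject] = filled
    return index
```

The returned mapping uses each subject as the index entry and lists, per sub-column, the pages contributed by the different sites.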
In one embodiment of the present invention, the different semantics include novel titles, news titles, video names, product names and the like.
In one embodiment of the present invention, the non-index information includes advertisements and animations.
According to the web site resource management device of this embodiment of the present invention, the information of a plurality of web sites is classified, organized and filled into templates to generate index entries, which improves the degree of aggregation of the resource organization relationship and the clarity of the relationships.
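Taken together, the modules described in the embodiments form a simple dispatch on the number of web sites and on whether an index page exists. The sketch below shows only that control flow. The four callbacks stand in for the second checking module, the optimization module, the generation module and the establishing module, and their names and signatures are assumptions. The multi-site branch is read here as "two or more" sites.

```python
from typing import Callable, Sequence, TypeVar

Site = TypeVar("Site")
Index = TypeVar("Index")


def manage_site_resources(
    sites: Sequence[Site],
    has_index_page: Callable[[Site], bool],
    optimize: Callable[[Site], Index],
    generate: Callable[[Site], Index],
    build_cross_site: Callable[[Sequence[Site]], Index],
) -> Index:
    """Pick the index-building strategy from the number of sites."""
    # First check: how many web sites are there?
    if not sites:
        raise ValueError("at least one web site is required")
    if len(sites) == 1:
        site = sites[0]
        # Second check: does the single site already have an index page?
        if has_index_page(site):
            return optimize(site)      # yields the first index page
        return generate(site)          # yields the second index page
    # Two or more sites: build a cross-site index based on semantics.
    return build_cross_site(sites)
```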
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.
Claims (12)
1. A web site resource management method, characterized by comprising the following steps:
checking the number of web sites;
if the number of web sites is one, further checking whether the web site has an index page;
if there is an index page, optimizing the index page to generate a first index page;
if there is no index page, generating a second index page according to the structure of the web site; and
if the number of web sites is two or more, establishing a cross-site index page based on semantics.
2. The web site resource management method according to claim 1, characterized in that the step of optimizing the index page to generate the first index page comprises:
deleting the non-index information in the index page;
deleting the index entries in the index page that cannot be connected to a resource page in the web site; and
extracting the valid index entries in the index page to generate the first index page.
3. The web site resource management method according to claim 1 or 2, characterized in that the step of generating the second index page according to the structure of the web site comprises:
judging whether a resource page in the web site has a title;
if so, extracting the title of the resource page as an index entry;
if not, generating summary information of the resource page as an index entry; and
generating the second index page according to the index entries.
4. The web site resource management method according to claim 1 or 2, characterized in that the step of establishing the cross-site index page based on semantics comprises:
predefining a plurality of templates corresponding to different semantics;
classifying the resource pages in the two or more web sites according to semantic relevance and organizing them into a first web site;
finding, according to the semantic relevance of the first web site, a first template corresponding to the first web site;
filling the resource pages classified by semantic relevance in the first web site into the different sub-columns of the first template; and
establishing the cross-site index page according to the information in the different templates.
5. The web site resource management method according to claim 4, characterized in that the different semantics include novel titles, news titles, video names and product names.
6. The web site resource management method according to claim 1 or 2, characterized in that the non-index information includes advertisements and animations.
7. A web site resource management device, characterized by comprising:
a first checking module, configured to check the number of web sites;
a second checking module, configured to check, when the number of web sites is one, whether the web site has an index page;
an optimization module, configured to optimize the index page to generate a first index page when the web site has an index page;
a generation module, configured to generate a second index page according to the structure of the web site when the web site has no index page; and
an establishing module, configured to establish a cross-site index page based on semantics when the number of web sites is two or more.
8. The web site resource management device according to claim 7, characterized in that the optimization module comprises:
a deletion unit, configured to delete the non-index information in the index page and the index entries in the index page that cannot be connected to a resource page in the web site; and
a first extraction unit, configured to extract the valid index entries in the index page to generate the first index page.
9. The web site resource management device according to claim 7 or 8, characterized in that the generation module comprises:
a judging unit, configured to judge whether a resource page in the web site has a title;
a second extraction unit, configured to extract the title of the resource page as an index entry when the resource page in the web site has a title; and
a generation unit, configured to generate summary information of the resource page as an index entry when the resource page in the web site has no title, and to generate the second index page according to the index entries.
10. The web site resource management device according to claim 7 or 8, characterized in that the establishing module comprises:
a definition unit, configured to predefine a plurality of templates corresponding to different semantics;
a classification unit, configured to classify the resource pages in the two or more web sites according to semantic relevance and organize them into a first web site;
a retrieval unit, configured to find, according to the semantic relevance of the first web site, a template corresponding to it;
a filling unit, configured to fill the resource pages classified by semantic relevance in the first web site into the different columns of the corresponding template; and
an establishing unit, configured to establish the cross-site index page according to the information in the different templates.
11. The web site resource management device according to claim 10, characterized in that the different semantics include novel titles, news titles and video names.
12. The web site resource management device according to claim 7 or 8, characterized in that the non-index information includes advertisements and animations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210222539.2A CN103514221B (en) | 2012-06-28 | 2012-06-28 | A kind of web site resource management method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103514221A true CN103514221A (en) | 2014-01-15 |
CN103514221B CN103514221B (en) | 2016-12-28 |
Family
ID=49896954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210222539.2A Active CN103514221B (en) | 2012-06-28 | 2012-06-28 | A kind of web site resource management method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103514221B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001077886A1 (en) * | 2000-04-10 | 2001-10-18 | Blueskyfrog Pty Ltd | A method of filtering the contents of a virtual page |
CN1732459A (en) * | 2002-11-01 | 2006-02-08 | Lg电子株式会社 | Web content transcoding system and method for small display device |
US20070143283A1 (en) * | 2005-12-09 | 2007-06-21 | Stephan Spencer | Method of optimizing search engine rankings through a proxy website |
CN101097578A (en) * | 2007-06-07 | 2008-01-02 | 北京金山软件有限公司 | Network resource searching method and system |
US20080275877A1 (en) * | 2007-05-04 | 2008-11-06 | International Business Machines Corporation | Method and system for variable keyword processing based on content dates on a web page |
CN101887422A (en) * | 2009-05-13 | 2010-11-17 | 北京博越世纪科技有限公司 | Technique for keeping synchronous update of data of web site and wap site |
Also Published As
Publication number | Publication date |
---|---|
CN103514221B (en) | 2016-12-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |