Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
An embodiment of the present invention provides a WEB information processing method which, as shown in Figure 1, comprises the following steps:
101. Obtain pending WEB information, the pending WEB information containing target information from information resources corresponding to one or more uniform resource locators (URLs).
In this embodiment, the pending WEB information may be collected directly by a WEB information acquisition device, or may be data that the WEB information acquisition device receives from a third party. The pending WEB information may be news information, forum information and the like from one or more websites.
102. Sort the pending WEB information according to a preset first rule.
After the pending WEB information is obtained, it is sorted according to the preset first rule. The first rule can be set as required: for example, sorting by the visit count of the pending WEB information, by its order of publication, or by the rate of change of its visit count.
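As an illustration only, the choice among the three example orderings could be sketched in Python as below. The `WebItem` fields, the `FIRST_RULES` table and the `sort_items` helper are hypothetical names introduced for this sketch, and the descending order and the handling of a zero baseline are assumptions rather than part of the described method.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WebItem:
    # Hypothetical record for one piece of pending WEB information.
    title: str
    url: str
    visits_today: int
    visits_yesterday: int
    published: datetime


def growth_rate(item: WebItem) -> float:
    # Relative change of the visit count. A zero baseline is treated as 1
    # to avoid division by zero (an assumption of this sketch).
    return (item.visits_today - item.visits_yesterday) / max(item.visits_yesterday, 1)


# One key function per example ordering named in the text.
FIRST_RULES = {
    "visits": lambda item: item.visits_today,
    "published": lambda item: item.published,
    "growth": growth_rate,
}


def sort_items(items, rule="visits"):
    # Largest value first: most visited, most recent, or fastest rising.
    return sorted(items, key=FIRST_RULES[rule], reverse=True)
```

Keeping the rules in a table means a further ordering can be added without touching the sorting step itself.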
The WEB information processing method provided by this embodiment sorts the obtained WEB information, which comes from information resources corresponding to one or more URLs, according to a preset rule. A user can thus obtain WEB information from one or more web services at the same time, with that WEB information ordered according to the preset sorting rule.
An embodiment of the present invention provides another WEB information processing method which, as shown in Figure 2, comprises the following steps:
201. Obtain pending WEB information: target information is obtained by metasearch from resources corresponding to one or more preset URLs, and the obtained target information is used as the pending WEB information. The pending WEB information contains target information from information resources corresponding to one or more URLs.
In this embodiment, a metasearch system can be used to obtain the target information from the resources corresponding to the one or more preset URLs. The target information can be any information that is desired, such as news information or forum information. The resources corresponding to the one or more URLs can be the various information resources of websites.
The WEB information processing method provided by the embodiment of the present invention is described below, taking as an example the processing of news information obtained from one or more predefined websites:
First, the webService data source interface and the list of websites accessed by the metasearch are configured; then an Internet news monitoring thread is started to scan the data source at regular intervals and wait to receive Internet news information. As shown in Figure 3, in this embodiment step 201 can comprise the following steps:
301. Query whether the target information monitoring thread has obtained target information.
The Internet news information monitoring thread scans the news information data source interface at regular intervals, for example once every T minutes, where T can be set as required, and queries whether news information has been obtained. When the Internet news information monitoring thread has obtained news information, step 302 is performed; otherwise, step 301 is repeated.
302. Read the obtained target information. In this embodiment, the news information provided by the data source interface is read.
303. Judge, according to a third rule, whether the obtained target information is target information within the ranking scope.
The third rule can be a preset filtering rule; for example, it can be preset that obtained files in formats such as picture, audio and video data are filtered out.
In this embodiment, it is judged whether the obtained news information matches the third rule, that is, the filtering rule. The filtering rule can be a group of channels set in advance by the user. If the obtained news information belongs to a channel in this group, it does not need to take part in the ranking and is discarded. If the obtained news information does not belong to any channel in this group, step 304 is entered.
This makes the ranking more targeted and more accurate.
304. When the obtained target information is target information within the ranking scope, update the attribute values of the target information and store the obtained target information.
In this embodiment, the attributes of news information can include: title, url (uniform resource locator), today's visit count, visit count at the same time yesterday, summary, ranking order, similar-document identifier, date, author, source, and rate of change of the visit count.
The historical data of the news information is read from the database according to the value of the url attribute, and attribute values such as yesterday's visit count are updated according to the attribute values of the news information in the historical data.
According to the data of the news information obtained by the Internet news information monitoring thread, the summary attribute value, the similar-document attribute value and the like of the news information are updated.
305. Sort the obtained news information according to a preset rule.
The preset rule in this step can be, for example, the order of the visit counts of the news information or the order of its publication times.
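As a minimal sketch under stated assumptions, steps 301 to 305 might be arranged in Python as follows. The `fetch` callable stands in for the data source interface, `history` stands in for the historical data read from the database (mapping a url to yesterday's visit count), and the dictionary keys `url`, `channel` and `visits_today` are hypothetical; none of these names come from the described embodiment.

```python
import time


def scan_once(fetch, excluded_channels, history, stored):
    # Steps 301-302: query the data source and read whatever it returned.
    for item in fetch():
        # Step 303: items from the user-configured channels are outside
        # the ranking scope and are discarded.
        if item["channel"] in excluded_channels:
            continue
        # Step 304: update attribute values from the historical data, then store.
        item["visits_yesterday"] = history.get(item["url"], 0)
        stored[item["url"]] = item
    # Step 305: sort the stored items, here by today's visit count.
    return sorted(stored.values(), key=lambda it: it["visits_today"], reverse=True)


def monitor(fetch, excluded_channels, history, stored,
            interval_minutes, max_scans, sleep=time.sleep):
    # Scan the data source once every T minutes. max_scans bounds the loop
    # so the sketch terminates; a real monitoring thread would run until stopped.
    ranking = []
    for _ in range(max_scans):
        ranking = scan_once(fetch, excluded_channels, history, stored)
        sleep(interval_minutes * 60)
    return ranking
```

Passing `sleep` as a parameter lets the loop be exercised without waiting; an empty result from `fetch` simply leaves the stored items unchanged until the next scan.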
202. Obtain valid target information from the pending information according to a preset second rule.
The second rule can be set as required; for example, it can be preset that target information obtained from the url addresses http: // 0001, http: // 00012 and http: // 0003 is filtered out.
In this embodiment, the received Internet news information is filtered according to the preset filtering rule: invalid data covered by the preset filtering rule is discarded, and valid news information is retained.
As shown in Figure 4, in this embodiment step 202 can comprise the following steps:
401. Read the target information from the pending information, and convert the URL corresponding to the target information into a predetermined format.
In this embodiment, a standard format is set for the url attribute value of news information, for example the http: // form; the news information is read and its url attribute value is converted into the predetermined format.
402. Judge whether the converted URL matches the second rule. When the converted URL matches the second rule, the target information is discarded and the flow ends. When the converted URL does not match the second rule, step 403 is performed.
In this embodiment, it is judged whether the converted url attribute value of the news information matches one or more url formats in the preset second rule. If it does, the news information is discarded; otherwise, step 403 is performed.
403. If the target information is valid news information, obtain and store the news information.
This step filters out invalid news information, making the ranking results of the obtained news information more targeted and more accurate.
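Steps 401 to 403 can be illustrated with the following Python sketch, offered under several assumptions: the predetermined format is taken to be a lower-cased `http://` URL, the second rule is modelled as a list of URL prefixes, and `to_standard_url`, `filter_valid` and the `blocked.example` host are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def to_standard_url(raw):
    # Step 401: convert a url attribute value to the assumed standard form:
    # http scheme, lower-case host, non-empty path, no fragment.
    raw = raw.strip()
    if "://" not in raw:
        raw = "http://" + raw
    parts = urlsplit(raw)
    return urlunsplit(("http", parts.netloc.lower(), parts.path or "/", parts.query, ""))


def filter_valid(items, blocked_prefixes):
    valid = []
    for item in items:
        url = to_standard_url(item["url"])
        # Step 402: a converted URL that matches the second rule is discarded.
        if any(url.startswith(prefix) for prefix in blocked_prefixes):
            continue
        # Step 403: the remaining item is valid and is kept.
        valid.append({**item, "url": url})
    return valid
```

Converting every URL before matching means the rule list only has to be written in one form, for example `["http://blocked.example/"]`.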
203. Update the attribute values of the valid target information obtained through steps 201 and 202, and store the target information. As one implementation, this step can comprise:
Look up the attribute values of the target information that need to be updated, and update them.
In this embodiment, the attributes of the obtained news information that need to be updated are looked up. If the news information already exists in the database, attribute values such as yesterday's visit count, the summary and the similar-document identifier are updated; if the news information does not exist in the database, the similar-document identifier attribute value of the news information is updated.
The updated news information is uploaded to the database.
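The branch between news already in the database and news not yet in it could look roughly like the Python sketch below. It assumes that a plain dictionary stands in for the database, that the stored record dates from the previous day, and that the similar-document identifier is a hash of the normalized summary; `similar_doc_id` and `update_and_store` are hypothetical helpers, and the embodiment does not say how the identifier is actually computed.

```python
import hashlib


def similar_doc_id(text):
    # Hypothetical identifier: hash of the lower-cased, whitespace-normalized text.
    normalized = " ".join(text.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]


def update_and_store(item, database):
    record = dict(item)
    # Both branches refresh the similar-document identifier.
    record["similar_id"] = similar_doc_id(item["summary"])
    old = database.get(item["url"])
    if old is not None:
        # Existing news: carry the stored count over as yesterday's count
        # (assumes the stored record is from the previous day).
        record["visits_yesterday"] = old["visits_today"]
        base = max(record["visits_yesterday"], 1)
        record["visit_change_rate"] = (
            record["visits_today"] - record["visits_yesterday"]
        ) / base
    # Upload the updated record to the stand-in database.
    database[item["url"]] = record
    return record
```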
204. To make the sorting of the pending WEB information more targeted and more accurate, delete the repeated target information with identical content from the pending WEB information, and then sort the pending WEB information according to the preset first rule.
As shown in Figure 5, in this embodiment step 204 can comprise the following steps:
501. Search the pending information for target information with identical content and obtain it. In this embodiment, news information with identical content is searched for and obtained from the pending news information.
502. Merge the obtained target information with identical content into one piece of target information. In this embodiment, the obtained news items with identical content are merged into one news item. Alternatively, one of the news items with identical content can be retained and the others deleted.
503. Sort the pending information according to the preset first rule. In this embodiment, after the duplicate content in the obtained news information has been deleted, the pending news information is sorted according to the predetermined first rule.
The first rule in this step can be set as required: for example, sorting by the visit count of the news information, by the earliest publication time of the news information, or by the rate of change of the visit count of the news information.
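Steps 501 to 503 might be sketched in Python as follows. The merge policy shown (summing the visit counts of duplicates and keeping the earliest publication time) is one assumption among several possible ones, identical content is detected by exact match after whitespace normalization, and the keys `content`, `visits` and `published` are hypothetical.

```python
def merge_duplicates(items):
    merged = {}
    for item in items:
        # Step 501: items whose whitespace-normalized content is equal
        # are treated as having identical content.
        key = " ".join(item["content"].split())
        if key not in merged:
            merged[key] = dict(item)  # copy so the input is not modified
        else:
            # Step 502: fold the duplicate into the item already kept.
            kept = merged[key]
            kept["visits"] += item["visits"]
            kept["published"] = min(kept["published"], item["published"])
    return list(merged.values())


def dedupe_and_rank(items):
    # Step 503: sort the merged items, here by visit count, largest first.
    return sorted(merge_duplicates(items), key=lambda it: it["visits"], reverse=True)
```

The alternative mentioned in step 502, retaining one item and deleting the others, corresponds to dropping the two lines that update `kept`.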
205. Upload the sorted news information to the database.
206. Display the sorted news information according to different sorting modes.
To meet the needs of different users, the sorted news information is displayed according to a particular sorting mode, for example a ranking of the fastest risers of the day or a ranking of individual news items of the day.
The WEB information processing method and device provided by the embodiments of the present invention rank the news information obtained from one or more websites by sorting the obtained news information corresponding to one or more URLs according to a preset rule. A user can thus obtain news information from one or more web services at the same time, with that news information ordered according to the preset sorting rule.
The WEB information processing method described in this embodiment collects Internet news information as required, analyzes its content, and uploads each analyzed data item to the database. With the WEB information processing method of the present invention, the news information displayed independently on each website is collected together, analyzed and processed, and sorted according to sorting rules such as newest and hottest. This solves the problem of comprehensively ranking news across a user-specified range of websites and helps current hot news to be found in time.
An embodiment of the present invention provides a WEB information processing device which, as shown in Figure 6, comprises a first acquiring unit 61 and a sorting unit 62.
The first acquiring unit 61 obtains pending information, the pending information containing target information from information resources corresponding to one or more URLs; the sorting unit 62 sorts the pending information according to a preset first rule.
As one implementation of this embodiment, the pending WEB information may be collected directly by a WEB information acquisition device, or may be data that the WEB information acquisition device receives from a third party. The pending WEB information may be news information, forum information and the like from one or more websites.
After the pending WEB information is obtained, it is sorted according to the preset first rule. The first rule can be set as required: for example, sorting by the visit count of the pending WEB information, by its order of publication, or by the rate of change of its visit count.
The WEB information processing device provided by this embodiment sorts the obtained WEB information, which comes from information resources corresponding to one or more URLs, according to a preset rule. A user can thus obtain WEB information from one or more web services at the same time, with that WEB information ordered according to the preset sorting rule.
An embodiment of the present invention provides another WEB information processing device which, as shown in Figure 7, comprises: a first acquiring unit 71, a second acquiring unit 72, an updating unit 73 and a sorting unit 74.
The first acquiring unit 71 comprises: a first query module 711, a reading module 712, a judging module 713 and a first updating module 714.
The second acquiring unit 72 comprises: a converting unit 721, a judging unit 722 and a deleting unit 723.
The updating unit 73 comprises: a second query module 731 and a second updating module 732.
The sorting unit 74 comprises: an acquiring module 741, a merging module 742 and a sorting module 743.
The first acquiring unit 71 obtains target information by metasearch from resources corresponding to one or more preset URLs. Specifically: the first query module 711 queries whether the target information monitoring thread has obtained the target information; when the target information monitoring thread has obtained the target information, the reading module 712 reads the obtained target information; the judging module 713 judges, according to a third rule, whether the obtained target information is target information within the ranking scope; when the obtained target information is target information within the ranking scope, the first updating module 714 updates the attribute values of the target information and stores the obtained target information.
Taking the ranking of news information obtained from one or more websites as an example, in this embodiment the first acquiring unit 71 can use a metasearch system to obtain the target information from the resources corresponding to the one or more preset URLs. The target information can be any information that is desired, such as news information or forum information. The resources corresponding to the one or more URLs can be the various information resources of websites.
The third rule can be a preset filtering rule; for example, it can be preset that obtained files in formats such as picture, audio and video data are filtered out.
In this embodiment, it is judged whether the obtained news information matches the third rule, that is, the filtering rule. The filtering rule can be a group of channels set in advance by the user. If the obtained news information belongs to a channel in this group, it does not need to take part in the ranking and is discarded.
The second acquiring unit 72 obtains valid target information from the pending information according to a preset second rule. Specifically: the converting unit 721 reads the target information from the pending information and converts the URL corresponding to the target information into a predetermined format; the judging unit 722 judges whether the converted URL matches the second rule; the deleting unit 723 deletes the target information when the converted URL matches the second rule.
The second rule can be set as required; for example, it can be preset that target information obtained from the url addresses http: // 00011, http: // 00012 and http: // 00013 is filtered out.
In this embodiment, it is judged whether the converted url attribute value of the news information matches one or more url formats in the preset second rule. If it does, the news information is discarded.
The updating unit 73 updates the attribute values of the target information. Specifically: the second query module 731 looks up the attribute values of the target information that need to be updated; the second updating module 732 updates the attribute values of the target information that need to be updated.
Taking the ranking of news information obtained from one or more websites as an example, in this embodiment the attributes of news information can include: title, url (uniform resource locator), today's visit count, visit count at the same time yesterday, summary, ranking order, similar-document identifier, date, author, source, and rate of change of the visit count.
The historical data of the news information is read from the database according to the value of the url attribute, and attribute values such as yesterday's visit count are updated according to the attribute values of the news information in the historical data.
According to the data of the news information obtained by the Internet news information monitoring thread, the summary attribute value, the similar-document attribute value and the like of the news information are updated.
The sorting unit 74 sorts the pending information according to the preset first rule. Specifically: the acquiring module 741 searches the pending information for target information with identical content and obtains it; the merging module 742 merges the obtained target information with identical content into one piece of target information; the sorting module 743 sorts the pending information according to the preset first rule.
Taking the ranking of news information obtained from one or more websites as an example, the first rule can be set as required: for example, sorting by the visit count of the news information, by the earliest publication time of the news information, or by the rate of change of the visit count of the news information.
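To show how the four units of Figure 7 could cooperate in a software implementation, a minimal Python sketch follows. The class name `WebInfoProcessor` and its constructor parameters are hypothetical; each unit is modelled as a plain callable supplied by the caller, which is only one possible decomposition and is not prescribed by the embodiment.

```python
class WebInfoProcessor:
    # Each argument plays the role of one unit of the device:
    #   acquire      -> first acquiring unit 71 (returns a list of items)
    #   select_valid -> second acquiring unit 72 (drops invalid items)
    #   update       -> updating unit 73 (returns one item with updated attributes)
    #   rank         -> sorting unit 74 (merges duplicates and sorts)
    def __init__(self, acquire, select_valid, update, rank):
        self.acquire = acquire
        self.select_valid = select_valid
        self.update = update
        self.rank = rank

    def run(self):
        items = self.acquire()
        items = self.select_valid(items)
        items = [self.update(item) for item in items]
        return self.rank(items)
```

Because the units are injected, any one of them can be replaced, for example to change the filtering rule, without modifying the others.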
The WEB information processing device provided by this embodiment sorts the obtained WEB information, which comes from information resources corresponding to one or more URLs, according to a preset rule. A user can thus obtain WEB information from one or more web services at the same time, with that WEB information ordered according to the preset sorting rule.
From the above description of the embodiments, a person skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, hard disk or optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.
The above is only a specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.