CN102298621A - System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree - Google Patents


Info

Publication number
CN102298621A
Authority
CN
China
Prior art keywords
pagefocus
webpage
search
content
browser
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110228853
Other languages
Chinese (zh)
Other versions
CN102298621B (en
Inventor
王东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN 201110228853 priority Critical patent/CN102298621B/en
Publication of CN102298621A publication Critical patent/CN102298621A/en
Application granted granted Critical
Publication of CN102298621B publication Critical patent/CN102298621B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method and a system for aggregating and displaying same-source information in a search engine based on a focus degree. The method comprises the following steps: a search engine finds all target websites that match the query as the original search results; same-source results among the original search results are aggregated into one title search result according to factors such as content quality, the account information of purchasers of the title search result right, and service quality; only the title search result is displayed to the inquirer, and the full search results are expanded only on request. The system uses a statistics server that cooperates with a web browser to convert all of the user's operations on a page into a focus degree score, PageFocus, and to send that score back to the statistics server as a measure of content quality, so that the search engine can use it to select the "title search result" and to rank the displayed results. The invention further relates to a method that automatically identifies the user's state and provides a suitable page style and content.

Description

System for obtaining the web page user attention score PageFocus, based on an attention-based method for aggregating and displaying same-source information in a search engine
Technical field
The present invention relates to computer network technology, and in particular to search engine technology in which computers provide search services on the Internet or on an enterprise intranet. The invention also relates to a system for obtaining the attention that users pay to a web page, and to a device and method for adapting the style and content of a website.
Background art
A large number of "web pages or network services from the same (or a similar) source" currently exist on the Internet, for example: 1 articles, opinions, and information pages written by the same person or organization and copied in large numbers; 2 news reports obtained through interviews (or released) by the same person or organization and copied in large numbers; 3 posts made by the same person or organization on BBS forums and reposted elsewhere; 5 multimedia files produced by the same person or organization in different data formats and compression ratios; 6 executable programs, data, and design documents produced by the same person or organization; 7 information content produced in other ways and widely copied. Current search engines list these "web pages or network services from the same (or a similar) source" one by one in their results, where they take up a great deal of space with identical content and make the results inconvenient to browse.
Current search engines and web page ranking services measure the popularity of a web page only by click traffic and by the time users stay on the page. The main approaches are: 1) Search engines: the popularity of a page is calculated from clicks on search results, for example ***, Baidu. 2) Alexa-style website rankings: toolbar software embedded in the browser sends the user's clicks on hyperlinks and the time spent on each page back to a server (the parameters include the current page address and the time the page was opened), but no other assessment method is included. For the working principle of Alexa, see:
http://www.singtaonet.com/it/it_sp/t20051110_43674.html
http://www.people.com.cn/GB/it/8219/41552/41597/3109586.html
Current websites can be divided into the following classes:
Class 1: websites (for example, news websites) whose pages have the same style and content for every user at a given moment.
Class 2: websites (for example, the news website of ***) that can show different styles and content according to the user's settings.
Neither class, however, can provide different display styles and content according to the user's state in real time.
Summary of the invention
To remedy the above deficiencies, the invention provides a search method that aggregates search results which have identical content, and therefore the same use value to the searcher, into a single record, the title search result, together with a device and method for expanding the other results on request. To prevent the target server of a "title search result" from being overloaded and paralyzed by frequent clicks, the invention also provides a device and method that automatically distribute clicks on the "title search result" across the other search result targets. The invention further provides a system in which a web browser cooperating with a statistics server on the network converts all of the user's operations into a score for the web page and sends it back to the statistics server as a measure of the attention paid to that page, so that it can serve as a ranking method and tool for a search engine. The invention further provides a method that uses whatever information can be obtained to judge the user's environment and state, and that provides different display styles and content to users in different states at the same moment, on the same website, and even on the same page.
To achieve the above objects, a search method in which a search engine aggregates and displays sites carrying same-source information comprises the following steps:
(1) the inquirer accesses the search engine through a Web browser or application software and enters the keywords to be queried;
(2) the search engine finds all target sites that meet the conditions as the original search results;
(3) the "same-source (homologous) information processing module" queries the account information of buyers of the right to "become the title search result" and, combining this with other judgment rules, selects from the original search results the object to be used as the "title search result";
(4) the search engine Web server or application server shows only the selected "title search result" to the inquirer as the search result, and provides with it a "button" meaning "expand to view details or other information";
(5) when the inquirer presses the corresponding "button", the search engine shows the inquirer the original search results found in step (2).
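The display side of steps (4) and (5) can be pictured with a minimal Python sketch. It assumes each original result already carries a hypothetical "group" key naming its same-source group, and it simply takes the first member of each group as the title result; the patent's actual selection rule is described later in the text, so this is only an illustration of the collapse-and-expand idea.

```python
from collections import OrderedDict


def aggregate_results(original_results):
    """Collapse same-source results into one title result per group.

    Each result is a dict with a hypothetical "group" key that identifies
    its same-source group; how groups are detected is outside this sketch.
    """
    groups = OrderedDict()
    for result in original_results:
        groups.setdefault(result["group"], []).append(result)
    # Placeholder choice: the first member stands in as the title result.
    return [
        {"title": members[0], "expanded": members}
        for members in groups.values()
    ]


original = [
    {"url": "http://a.example/news", "group": "story-1"},
    {"url": "http://b.example/copy", "group": "story-1"},
    {"url": "http://c.example/other", "group": "story-2"},
]
page = aggregate_results(original)
```

Only the "title" entries would be rendered at first; the "expanded" list corresponds to what the button in step (5) reveals.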
" homologous information processing module " has a plurality of " (the corresponding information kind) homologous information processing module " to form, for example: " with the source web page processing module ", " homology multimedia processing module ", " homology picture processing module ", " homology document process module ", " homology software processing module ", " with source data or database processing module ", " homology GIS message processing module ", " with the value network service processing module ", " with being worth the business information processing module " etc.
The "same-source information processing module" works in the following steps:
(1) the "information category judgment module" first determines the type of the information received by the web searcher;
(2) information of the same type is sent together to the same-source information processing module for that information type;
(3) after processing by that module, the search information is filed into either the "non-same-source result information database" or the "same-source result information database" for that information type;
(4) the system publishes the "non-same-source result information database" and the "same-source result information database" on the Web server for inquirers to query. As an alternative implementation, the inquiry service can also be provided to inquirers directly from these two databases through dynamic web pages.
The "same-source web page processing module" processes web page information in the following steps:
(1) when the "search engine search component" receives a keyword to be queried, the "decider for whether search results have been published on the Web server" first determines whether the keyword has recently been queried by someone else. If it has, and the results have been published on the "search engine search results Web server", the search results are returned directly, with web pages from the same source aggregated into a single search result. After clicking the "same-source web pages" button, the inquirer can see on the "search engine search results Web server" another results page containing all the search results, which completes the query;
(2) if the "search engine search component" receives a keyword and the decider determines that the keyword has not recently been queried by anyone else and that no corresponding query results have been published on the "search engine search results Web server", then:
A. the "web page searcher" is started to search the "non-same-source web page results database" and the "same-source web page results database" for web page addresses that match the search keyword, and to obtain the content of those pages;
B. if the "web page searcher" finds no matching web page address in either database, the result "no matching web pages" is returned to the inquirer, and the search keyword is added to the next round of update tasks for the two databases. If matching web page addresses are found during the update, each is filed into the "non-same-source web page results database" or the "same-source web page results database" according to whether it has same-source pages, so that the next person who searches for the same keyword will find results;
(3) the "web page content separator" breaks the web page content and hyperlink targets found into types such as multimedia, pictures, text, and hyperlinks;
(4) the content deciders each produce a judgment result:
A. the "multimedia content decider" produces the "same multimedia file score SMS (Same Media Score)" of the target web page;
B. the "picture content decider" produces the "same picture score SPS (Same Photo Score)" of the target web page;
C. the "text content decider" produces the "same text score STS (Same Text Score)" of the target web page;
D. the "link content decider" produces the "same hyperlink score SHS (Same Hyperlinks Score)" of the target web page;
(5) the "multimedia judgment weight SMP", "picture judgment weight SPP", "text judgment weight STP", and "link judgment weight SHP" are obtained from the "same-source web page decision rule base" and multiplied respectively by the SMS, SPS, STS, and SHS generated in step (4);
(6) the products obtained in step (5) are added to give the "same-source score SSS (Same Source Score)" of the web page: SSS=(SMS*SMP)+(SPS*SPP)+(STS*STP)+(SHS*SHP);
(7) it is determined whether the SSS of this web page exceeds the threshold. If it does, the page is judged to be a "same-source web page" with respect to the other pages; if it does not, the page is judged to be a "non-same-source web page";
(8) the "non-same-source web pages" produced in step (7) are stored in the "non-same-source web page results database" by the "non-same-source web page processing module"; the "same-source web pages" produced in step (7) are stored in the "same-source web page results database" by the "same-source web page processing module";
(9) the "search results page publisher" generates static Web pages of the search results from the contents of the "same-source web page results database" and the "non-same-source web page results database" and publishes them to the "search engine search results Web server", from which they are presented to the inquiring user through the browser;
(10) as an alternative to step (9), the results can also be presented directly to the inquiring user through the browser by a "dynamic web page Web server".
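The weighted sum of steps (5) to (7) is small enough to write out. The Python sketch below uses the symbol names from the text (SMS, SPS, STS, SHS and SMP, SPP, STP, SHP); the numeric values and the threshold are illustrative assumptions, since the text gives none.

```python
def same_source_score(scores, weights):
    """Return SSS = SMS*SMP + SPS*SPP + STS*STP + SHS*SHP."""
    pairs = (("SMS", "SMP"), ("SPS", "SPP"), ("STS", "STP"), ("SHS", "SHP"))
    return sum(scores[s] * weights[w] for s, w in pairs)


def is_same_source(scores, weights, threshold):
    """Step (7): a page is same-source when SSS exceeds the threshold."""
    return same_source_score(scores, weights) > threshold


# Illustrative values only; the patent text gives no concrete numbers.
scores = {"SMS": 0.5, "SPS": 1.0, "STS": 0.8, "SHS": 0.2}
weights = {"SMP": 0.1, "SPP": 0.2, "STP": 0.6, "SHP": 0.1}
sss = same_source_score(scores, weights)
```

With these assumed inputs the text score dominates, which matches the intuition that copied articles are mostly recognized by their wording.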
The "same-source information processing module" may also work in the following steps:
(1) the inquirer's search keyword is received, and software determines from the keyword content and keyword syntax which files or network services need to be searched for;
(2) it is determined whether the content to be searched for has been published on the Web server. If the search target has been published on the "search engine search results Web server", the search results are returned directly, with the files or network service access points that meet the search condition and come from the same source aggregated into one "title search result". After clicking the "same-source files" button, the inquirer can see on the "search engine search results Web server" another page containing all the search results that meet the query condition, which ends the search. If the search target has not been published on the "search engine search results Web server", processing continues from step (3);
(3) the message "no matching results" is returned to the inquirer;
(4) the search keyword is added to the next round of update tasks for the "same-source information index database" and the "non-same-source information index database", and the update process for the two databases is started periodically;
(5) the "same-source information index database" and the "non-same-source information index database" are updated as follows:
A. the searcher finds newly appearing target files, web pages, or service access points, and software enters each access point to obtain the file or network service;
B. the "content decider" determines whether the newly found information has the same content as something in the current "same-source information index database". If yes, it is added to the matching category of the "same-source information index database" as a new element. If no, the "content decider" determines whether it has the same content as something in the current "non-same-source information index database";
C. if yes: a new category is created for the current information and the same-source information already stored in the "non-same-source information index database", and all of it is transferred to the "same-source information index database";
D. if no: a new category is created for the current information and stored in the "non-same-source information index database";
(6) the "search results page publisher" generates static Web pages of the search results from the contents of the "same-source web page results database" and the "non-same-source web page results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to inquirers who come to search;
(7) as an alternative to step (6), the results can also be presented directly to the inquiring user through the browser by a "dynamic web page Web server".
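The update logic in step (5), items B to D, amounts to a three-way filing decision. A minimal Python sketch follows, under stated assumptions: the same-source (homologous) index is modelled as a list of categories (lists), the non-same-source index as a flat list of single items, and `same_content` is a hypothetical stand-in for the content decider.

```python
def file_new_item(item, same_content, same_source_index, non_same_source_index):
    """File a newly found item and report what happened.

    same_source_index: list of categories, each a list of same-source items.
    non_same_source_index: list of items with no known same-source partner.
    same_content: hypothetical predicate standing in for the content decider.
    """
    # B: does the item match an existing same-source category?
    for category in same_source_index:
        if same_content(item, category[0]):
            category.append(item)
            return "joined"
    # C: does it match an item so far considered unique?
    for position, other in enumerate(non_same_source_index):
        if same_content(item, other):
            promoted = non_same_source_index.pop(position)
            same_source_index.append([promoted, item])
            return "promoted"
    # D: nothing matches, so the item is stored as unique.
    non_same_source_index.append(item)
    return "new"


same_source, unique = [], []
compare = lambda a, b: a.lower() == b.lower()
outcomes = [file_new_item(x, compare, same_source, unique) for x in ["A", "a", "A", "b"]]
```

Comparing only against the first member of each category is a simplification that assumes the decider is transitive within a category.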
When the same-source information processing module handles documents, the "same-source information index database" and the "non-same-source information index database" are updated as follows:
A. the "document searcher" finds newly appearing document files, web pages, or link entry points, and software enters each entry point to obtain the document or service;
B. the "text content decider" and the "picture content decider" determine whether the newly found document has the same content as something in the current "same-source document index database". If yes, it is added to the matching category of the "same-source document index database" as a new element. If no, the "document content decider" determines whether it has the same content as something in the current "non-same-source document index database";
C. if yes: a new category is created for the current document and the same-source documents already stored in the "non-same-source document index database", and all of them are transferred to the "same-source document index database". If no: a new category is created for the current document and stored in the "non-same-source document index database";
The content decider modules involved work in the following steps:
(1) receive the "judged objects": multimedia from multiple sources can be received, and the number of judged objects is recorded as InputQuantity;
(2) look up one attribute of the set of "judged objects" taking part in the comparison, and record the number of "judged objects" that have the same value for the current attribute as SameQuantity;
(3) input the "weight" value Power that the current attribute carries in the judgment;
(4) calculate the degree of agreement of all "judged objects" on the current attribute: PSame=SameQuantity*Power;
(5) return to (1) and perform (1) to (4) for the next "attribute" to obtain its PSame, until the PSame values of all attributes have been obtained;
(6) calculate and return the same-content score of the "judged objects": SameMediaPower=(sum of all PSame values)/InputQuantity.
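A minimal Python sketch of this attribute-by-attribute calculation follows. The text does not say how SameQuantity is counted when objects fall into several groups of equal values; the sketch assumes it is the size of the largest group sharing one value. The attribute names and weights in the example are likewise hypothetical.

```python
from collections import Counter


def same_media_power(objects, attribute_weights):
    """Return SameMediaPower = sum(PSame) / InputQuantity.

    objects: list of dicts mapping attribute name to value.
    attribute_weights: dict mapping attribute name to its Power.
    """
    input_quantity = len(objects)
    if input_quantity == 0:
        return 0.0
    total_p_same = 0.0
    for attribute, power in attribute_weights.items():
        counts = Counter(obj[attribute] for obj in objects)
        # Assumption: SameQuantity is the largest group with an equal value.
        same_quantity = max(counts.values())
        total_p_same += same_quantity * power  # PSame for this attribute
    return total_p_same / input_quantity


clips = [
    {"duration": 180, "codec": "mp3"},
    {"duration": 180, "codec": "wav"},
    {"duration": 180, "codec": "mp3"},
]
power = same_media_power(clips, {"duration": 0.6, "codec": 0.4})
```

If the weights sum to 1, the result stays between 0 and 1 and reaches 1 only when every object agrees on every attribute.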
When the content decider module is the text content decider, it works in the following steps:
(1) find the total length SameLenth of the parts of the text contents that share identical words or sentences;
(2) find the length MinLenth of the shortest of the input texts;
(3) return the text similarity value SameTextPower=SameLenth/MinLenth.
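For two texts, the ratio SameTextPower=SameLenth/MinLenth can be sketched in Python as below. The text does not specify how the shared parts are found; this sketch swaps in the standard library's `difflib.SequenceMatcher` and counts matching characters, which is an assumption rather than the patent's method.

```python
from difflib import SequenceMatcher


def same_text_power(text_a, text_b):
    """Return SameLenth / MinLenth for two texts (character based)."""
    min_length = min(len(text_a), len(text_b))  # MinLenth
    if min_length == 0:
        return 0.0
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    # SameLenth: total size of the non-overlapping matching blocks.
    same_length = sum(block.size for block in matcher.get_matching_blocks())
    return same_length / min_length


identical = same_text_power("hello world", "hello world")
partial = same_text_power("the quick brown fox", "the quick red fox")
```

Dividing by the shorter text means an excerpt that is wholly contained in a longer article still scores 1, which suits the goal of spotting copies.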
When the content decider module is the link content decider, it works in the following steps:
(1) receive the "judged objects": the URL addresses of multiple hyperlinks;
(2) compute the similarity of the "judged objects": SameURLPower = the number of target URL addresses that appear on every page pointed to by the judged hyperlinks;
(3) return SameURLPower.
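Counting the target URLs common to every judged page is a set intersection. The Python sketch below assumes the link targets of each page have already been extracted into a mapping; fetching and parsing the pages is outside its scope, and the page names are hypothetical.

```python
def same_url_power(page_links):
    """Return the number of target URLs that occur on every judged page.

    page_links: dict mapping a page URL to the collection of link targets
    found on that page (assumed to be extracted beforehand).
    """
    link_sets = [set(links) for links in page_links.values()]
    if not link_sets:
        return 0
    return len(set.intersection(*link_sets))


pages = {
    "http://p1.example/": {"a", "b", "c"},
    "http://p2.example/": {"b", "c", "d"},
    "http://p3.example/": {"c", "b"},
}
shared = same_url_power(pages)
```

The result is a raw count, as in the text; comparing it with a threshold would require normalizing it, which the text does not describe.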
When the content decider module is the business information content decider, it works in the following steps:
(1) compare whether the business information items being compared concern the same product or service; if not, return "inconsistent"; if so, go to step (2);
(2) determine whether the business information being compared is sensitive to geographic location; if not, return the result "consistent"; if so, go to step (3);
(3) determine whether the suppliers of the business information being compared are in the same city or region; if not, return the result "inconsistent"; if so, return the result "consistent".
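The three checks form a short decision chain. In the Python sketch below, the `Offer` record and its field names are hypothetical, and an offer pair is treated as location sensitive when either offer is marked so, which is an assumption the text leaves open.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Hypothetical record for one piece of business information."""
    product: str
    location_sensitive: bool
    city: str


def same_business_info(a, b):
    """Return True for "consistent", False for "inconsistent"."""
    # Step (1): different products or services are never consistent.
    if a.product != b.product:
        return False
    # Step (2): without location sensitivity the product match is enough.
    if not (a.location_sensitive or b.location_sensitive):
        return True
    # Step (3): location-sensitive offers must share a city or region.
    return a.city == b.city


pizza_here = Offer("pizza delivery", True, "Beijing")
pizza_there = Offer("pizza delivery", True, "Shanghai")
ebook_1 = Offer("e-book", False, "Beijing")
ebook_2 = Offer("e-book", False, "Shanghai")
```

Here the two e-book offers are consistent despite different cities, while the two delivery offers are not.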
The "title search result" is selected as follows:
(1) calculate the probability weight PWn with which each "same-source search result" becomes the "title search result":
PWn=TP*PageFocus/(RespDelay-K)
n: the index of the search result (this is the n-th result)
When (RespDelay-K) is less than or equal to zero, (RespDelay-K) takes the value 1
PageFocus: the web page attention score
RespDelay: the response delay of the web service
K: the service response constant; the suggested value of K is 50 milliseconds (ms)
TP: the title search result weight
(2) sum the probability weights PWn of all original "same-source search results" to obtain the total probability weight PWall;
(3) calculate the probability that each "same-source search result" becomes the "title search result": Pn=PWn/PWall;
(4) as searchers visit, select the "title search result" dynamically and at random according to the probabilities Pn and present it to the searcher.
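A minimal Python sketch of this weighted random selection follows. It uses the formula and the suggested K of 50 ms from the text; the candidate tuples `(TP, PageFocus, RespDelay)` and their values are illustrative assumptions, and `random.choices` is used for the draw in step (4).

```python
import random

K_MS = 50  # service response constant suggested in the text


def probability_weight(tp, page_focus, resp_delay_ms, k_ms=K_MS):
    """PWn = TP * PageFocus / (RespDelay - K), with the denominator clamped."""
    denominator = resp_delay_ms - k_ms
    if denominator <= 0:
        denominator = 1
    return tp * page_focus / denominator


def selection_probabilities(candidates):
    """Pn = PWn / PWall for a list of (TP, PageFocus, RespDelay) tuples."""
    weights = [probability_weight(*c) for c in candidates]
    pw_all = sum(weights)
    return [w / pw_all for w in weights]


def pick_title_result(candidates, rng=random):
    """Draw the index of the title result at random, weighted by PWn."""
    weights = [probability_weight(*c) for c in candidates]
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]


# Illustrative candidates: (TP, PageFocus, RespDelay in ms).
candidates = [(1.0, 100.0, 150.0), (1.0, 100.0, 250.0), (2.0, 50.0, 40.0)]
probabilities = selection_probabilities(candidates)
chosen = pick_title_result(candidates, random.Random(0))
```

Because the draw is repeated on each visit, clicks spread across the same-source targets in proportion to Pn instead of all landing on one server. The sketch assumes at least one candidate has a positive weight; otherwise `selection_probabilities` would divide by zero.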
The probability weight PWn of the "title search result" can also be calculated as:
A. PWn=(TP+PageFocus)/(RespDelay-K), or
B. PWn=(TP+PageFocus)/RespDelay/K, or
C. PWn=TP*PageFocus/RespDelay/K.
The "same-source information processing module":
A. can be embedded in the search engine;
B. can be placed between the "search engine" and the "search engine search results Web server";
C. can also be placed, as a preprocessing module, between the "search engine" and the searched websites.
The button meaning "expand to view details or other information" can be a hyperlink or any kind of software interface control.
A system for obtaining the attention users pay to web pages in search results comprises a PageFocus network server, a PageFocus web browser, and a web page scoring server.
The PageFocus network server comprises a PageFocus browser ID registration server, the PageFocusAccServer web page attention statistics server, a PageFocus browser online upgrade server, and a data encryption and decryption module;
The PageFocus web browser comprises a PageFocus browser ID registration module and an attention score (PageFocus) calculation module.
The system works in the following steps:
(1) each "PageFocus web browser" either has a globally unique ID when it is installed, or actively looks for the "PageFocus browser ID registration server" on the network when in use in order to obtain a globally unique ID;
(2) the "PageFocus web browser" has the functions of a conventional web browser. In addition, it converts the user's operations on the browser and on the web page into an "attention score PageFocus" for the page according to weights, forms a "PageFocus packet", and sends it in encrypted form over a network protocol to the "PageFocusAccServer web page attention statistics server" of the search engine;
(3) after receiving the "PageFocus packets" sent by "PageFocus web browsers" around the world, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" contained in each packet to the total of the corresponding web page;
(4) the "PageFocusAccServer web page attention statistics server" thus holds the "attention score PageFocus" of web pages around the world. This information can be processed in various ways to serve as the basis on which the search engine ranks web pages, as the basis on which the search engine selects the "title search result" among search results with identical content, or directly as a published "web page popularity ranking" service.
The PageFocusAccServer web page attention statistics server can record scores in logarithmic form or in scientific notation.
The PageFocus packet can be formed when the browser fully closes the web page, at regular intervals, or once a certain score has accumulated.
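The server-side accumulation in step (3) and the logarithmic recording option can be sketched in Python as follows. The class name, the dict-based packet, and the choice of log10(1 + total) are assumptions made for illustration; encryption, transport, and browser ID registration are left out entirely.

```python
import math
from collections import defaultdict


class PageFocusAccumulator:
    """Toy stand-in for the attention statistics server."""

    def __init__(self):
        self._totals = defaultdict(float)

    def receive(self, packet):
        """Add the score of one packet to the total of its web page.

        packet: dict with hypothetical keys "browser_id", "url", "page_focus".
        """
        self._totals[packet["url"]] += packet["page_focus"]

    def total(self, url):
        """Return the accumulated PageFocus of a page (0.0 if unseen)."""
        return self._totals.get(url, 0.0)

    def log_score(self, url):
        """Return a logarithmic view of the total; assumes totals >= 0."""
        return math.log10(1.0 + self.total(url))


server = PageFocusAccumulator()
server.receive({"browser_id": "b-1", "url": "http://a.example/", "page_focus": 3.0})
server.receive({"browser_id": "b-2", "url": "http://a.example/", "page_focus": 7.0})
```

A logarithmic view keeps very popular pages from producing unwieldy numbers while preserving their order.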
The attention score PageFocus is formed according to the weights listed in the following table:
[Table of PageFocus weights per user operation, published as images BSA00000554541100081 and BSA00000554541100091; not reproduced in this text]
Note:
1 The weight values in the table are one embodiment; other values may also be used and remain within the scope of the invention.
The text reading speed is calculated as follows:
A. mouse wheel scrolling: text reading speed=(display area width/set width)*number of text lines per scroll/interval between scrolls;
B. keyboard page turning: text reading speed=(display area width/set width)*number of text lines per page turn/interval between page turns;
C. window scroll bar scrolling: text reading speed=(display area width/set width)*number of text lines per scroll/interval between scrolls.
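All three cases share one formula, so a single function covers them. In the Python sketch below the parameter names are assumptions; in particular `set_width` is the reference width the text calls the "set width", whose exact meaning the text does not define further.

```python
def reading_speed(display_width, set_width, lines_moved, interval_seconds):
    """Return (display_width / set_width) * lines_moved / interval_seconds.

    lines_moved: text lines passed in one scroll or page turn.
    interval_seconds: time since the previous scroll or page turn.
    """
    if set_width <= 0 or interval_seconds <= 0:
        raise ValueError("set_width and interval_seconds must be positive")
    return (display_width / set_width) * lines_moved / interval_seconds


# A window twice the reference width, 3 lines per scroll, one scroll every 2 s.
speed = reading_speed(800, 400, 3, 2.0)
```

The width ratio scales the line count so that a wide window, which shows more text per line, counts as faster reading for the same scrolling rhythm.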
The PageFocus packet comprises the fields PageFocus browser ID, web page URL, and web page PageFocus score.
When a web page that has "same-source web pages" takes part in the page ranking provided by the search engine, the sum of the user attention PageFocus scores obtained by all of its "same-source web pages" can be used as the basis for ranking. That is: A. the "title search result" of a group of "same-source web pages" can use the sum of the user attention PageFocus obtained by every page in the group as its ranking basis when it takes part in search result ranking; B. each web page within a group of "same-source web pages" can likewise use the sum of the user attention PageFocus obtained by every page of the group to which it belongs as its ranking basis.
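Ranking by the pooled score of a same-source (homologous) group can be sketched in Python as follows. The group identifiers, URLs, and scores are illustrative assumptions.

```python
def group_scores(groups):
    """Sum the PageFocus of every page in each same-source group.

    groups: dict mapping a group id to a dict of {page URL: PageFocus}.
    """
    return {gid: sum(pages.values()) for gid, pages in groups.items()}


def rank_groups(groups):
    """Return group ids ordered by pooled PageFocus, highest first."""
    scores = group_scores(groups)
    return sorted(scores, key=scores.get, reverse=True)


groups = {
    "story-1": {"http://a.example/": 10.0, "http://b.example/": 5.0},
    "story-2": {"http://c.example/": 12.0},
}
order = rank_groups(groups)
```

Here "story-1" outranks "story-2" even though no single copy of it has the highest score, which is the effect of pooling that the text describes.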
A method for automatically judging the user's state and providing a suitable web page style and content comprises the following steps:
(1) after the "website server cluster entry" receives the user's first request for a page of the website, it obtains the user's IP address from the access protocol or the IP layer protocol;
(2) the IP address is looked up in the "IP address attribute database" to determine whether it is a "workplace IP address" or an "IP address of a personal or leisure place". For a "workplace IP address", go to step (3); for an "IP address of a personal or leisure place", go to step (4);
(3) the geographic location of the "workplace IP address" is obtained, together with the local administrative time of that region. If the region is currently within working hours, the visit is assigned to the "work style server", which provides page services suited to the workplace; otherwise go to step (4);
(4) the visit is assigned to the "personal and leisure style server", which provides page services suited to a personal and leisure state.
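The routing decision of steps (2) to (4) can be sketched in Python as below. The sketch assumes the IP attribute lookup and the conversion to the region's local time have already happened, and it takes working hours to be 9:00 to 18:00 on Monday to Friday, as stated later in the text; the function and label names are hypothetical.

```python
from datetime import datetime, time

WORK_START = time(9, 0)
WORK_END = time(18, 0)


def choose_style_server(ip_kind, local_time):
    """Return "work" or "leisure" for one request.

    ip_kind: "work" or "leisure", the assumed result of the IP attribute lookup.
    local_time: datetime in the local time of the IP address's region.
    """
    if ip_kind == "work":
        is_weekday = local_time.weekday() < 5  # Monday is 0, Friday is 4
        in_hours = WORK_START <= local_time.time() < WORK_END
        if is_weekday and in_hours:
            return "work"
    # Step (4): everything else gets the personal and leisure style.
    return "leisure"


monday_morning = choose_style_server("work", datetime(2024, 1, 1, 10, 0))
saturday_morning = choose_style_server("work", datetime(2024, 1, 6, 10, 0))
```

A workplace address outside working hours falls through to the leisure style, mirroring the "otherwise go to step (4)" branch.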
With the above scheme, search results that have identical content and the same use value to the searcher can be aggregated into one record, the title search result, and the other results can be expanded on request. A device is also provided that prevents the target server of a "title search result" from being overloaded and paralyzed by frequent clicks, by automatically distributing clicks on the "title search result" across the other search result targets. Besides the functions of existing search engines, the invention can also search various "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases", and "information" network services, for example file sharing, FTP services, and P2P services.
A web browser that cooperates with a statistics server on the network converts all of the user's operations into a score for the web page and sends it back to the statistics server as a measure of the attention paid to the page, so that it can serve as a ranking tool for a search engine.
With the website content and style adaptation method, users are served as follows:
1. 9:00 to 18:00 from Monday to Friday is working time; people in a working state need to see a concise, relatively rigorous style and, as far as possible, content related to their work.
2. 18:00 to 9:00 the next morning from Monday to Friday, and all day on Saturday and Sunday, is leisure time; people in a leisure state need to see a lively, relaxed style and content.
3. People in a workplace need to see a concise, relatively rigorous style and, as far as possible, content related to their work.
4. People at home or in a leisure place need to see a lively, relaxed style and content.
5. People in other environments or states need to see a style and content suited to their environment and state at the time.
Brief description of the drawings
Fig. 1 is a structural diagram of the working system of the same-source information search engine aggregated display method;
Fig. 2 is an internal structure diagram of the same-source information processing module;
Fig. 3 is a flow chart of the same-source web page processing module;
Fig. 4 is a flow chart of the same-source multimedia processing module;
Fig. 5 is a flow chart of the same-source picture processing module;
Fig. 6 is a flow chart of the same-source document processing module;
Fig. 7 is a flow chart of the same-source software processing module;
Fig. 8 is a flow chart of the same-source data or database processing module;
Fig. 9 is a flow chart of the same-source GIS information processing module;
Fig. 10 is a flow chart of the same-value network service processing module;
Fig. 11 is a flow chart of the same-value business information processing module;
Fig. 12 is a structural diagram of the system for obtaining web page user attention;
Fig. 13 shows an existing conventional search engine website system without content and style adaptation technology;
Fig. 14 shows the search engine website system of the invention with content and style adaptation technology.
Detailed description of the embodiments
The invention is now described further with reference to the drawings.
Fig. 1 is a structural diagram of the working system of the same-source information search engine aggregated display method. Step 1: the inquirer accesses the search engine through a Web browser or application software and enters the keywords to be queried. Step 2: the search engine finds all target sites that meet the conditions as the "original search results". Step 3: the "same-source information processing module" queries the account information of buyers of the right to "become the title search result" and, combining this with other judgment rules, selects from the "original search results" the object to be used as the "title search result". A. The "same-source information processing module" can be embedded in the search engine; B. it can be placed between the "search engine" and the "search engine search results Web server"; C. it can also be placed, as a preprocessing module, between the "search engine" and the searched websites. Step 4: the search engine Web server or application server shows only the selected "title search result" to the inquirer as the search result, and provides with it a "button" (a hyperlink or any kind of software interface control) meaning "expand to view details or other information". Step 5: only when the inquirer wishes to expand a particular "title search result" and presses its corresponding "button" does the search engine show the inquirer the "original search results" found in Step 2.
Fig. 2 is the internal structure diagram of the homologous information processing module. The "homologous information processing module" is defined as follows:
1) It judges whether several of the information nodes found for a search keyword are repeated sites drawn from one or more common information sources (such sites have the same search value or use value to the user and usually need not all be shown directly). It aggregates these repeated sites into one search result for the user, and presents the other sites of equal value only when the user asks for them.
2) Unlike existing search engines, which concentrate mainly on searching web pages, the module handles not only "Html web pages" but also "multimedia", "documents", "software", "hardware and software source code or design documents", "data or databases" and "information", as well as various network services, for example file sharing, FTP services and P2P services.
The "homologous information processing module" adopts a modular structure. Each module can be developed and deployed step by step as needed, the structure can be extended, and the automatic judgment accuracy of each module can be further improved. The modules include:
1. "Information category judgment module": judges the type of each piece of information and sends information of the same type to the corresponding processing module listed below.
2. "Homologous web page processing module": judges and handles found web pages that belong to the same source and have equal value to the user, for example Html, ASP, JSP and PHP pages and the content of BBS forums.
3. "Homologous multimedia processing module": judges and handles found multimedia files or network services that belong to the same source and have equal value to the user, for example .MP3, .AVI, .WMV, .MPEG, .WAV, .RM and other audio and video files, and the access interfaces of video services based on streaming media technology.
4. "Homologous picture processing module": judges and handles found pictures that belong to the same source or have identical content and have equal value to the user, for example .GIF, .JPG, .BMP and .PNG files.
5. "Homologous document processing module": judges and handles found document files or network services in various formats that belong to the same source, have identical or related content, and have equal value to the user, for example ".Doc", ".Txt", ".Pdf", ".XLS" and ".PPT" files.
6. "Homologous software processing module": judges and handles found computer application software installers that are the same software from the same author; they may be installers for the same or different operating systems and of the same or different versions.
7. "Homologous data or database processing module": judges and handles found data files or database files of known formats that belong to the same source or have identical content and have equal value to the user, for example .DAT, .XLS, .MDF and .DBF files.
8. "Homologous GIS information processing module": judges and handles found digital map files or services that belong to the same source or have identical content and have equal value to the user.
9. "Equal-value network service processing module": judges and handles found network services that belong to the same source or have identical content and have equal value to the user, for example FTP download services for the same file, IPTV services that relay the same TV station, and mail services that each provide 1 GB of capacity.
10. "Equal-value business information processing module": judges and handles found advertisements, published over the network, for commercial products or services that belong to the same source or have identical content, are located in the same geographic or administrative region, and have equal value to the user, for example egg sale information offered in the same block, haircut service information offered in the same block, and telephone communication services usable in the same city.
"Information category judgment module"
The "information category judgment module" collects information, determines its type, and sends it to the corresponding information processing module.
The information sources handled by the "information category judgment module" take three main forms:
(1) Web page form: the information comes from the content of a website's web pages, which may also contain hyperlinks pointing to particular file types, for example: "http://www.008.org.cn/up/the_quiet_american.mp3"
(2) Network service form: the network service entry points provided by various network servers, for example FTP file download services, the seed (torrent) services of various P2P (Peer to Peer) software (for example BT download and eMule download), and NEWS SERVER services. A network service entry point can become known in two ways:
A. Network services that can be found on web pages: the entry point is learned by analyzing web page content.
B. The network service provider submits its network service entry point or content directly to this search engine.
(3) Data or database form: the search engine provides an information entry service directly to the network, and network users submit their own information, which eventually forms data files or databases. When a query is made to this search engine, the information that satisfies the user's requirement is extracted from them.
The type of "web page form" information is determined as follows:
A web page itself can be output directly, as a "web page", to the "homologous web page processing module" for handling. In addition, the "information category judgment module" can parse the "hyperlink" syntax of the page's language (for example Html, Java, JSP, ASP, ASPX or PHP) to obtain the file type that each hyperlink points to, and can distinguish the information type from the file type; see the following table for details:
For example:
1. The web page contains the hyperlink "Http://xxx/xxx/song.mp3": the target can be judged to be "multimedia" type information.
2. The web page contains the hyperlink "Http://xxx/xxx/song.rar": after the target file is fetched and decompressed, it is found to contain only "song.mp3", so the target can still be judged to be "multimedia" type information.
3. The web page contains the hyperlink "Http://xxx/xxx/song.rar": after the target file is fetched and decompressed, the number of files it contains and the names and sizes of every file and directory are found to be identical to the installation disc of a known software product, so the target can be judged to be "software" type information.
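Examples 1 and 2 above can be sketched as a lookup on the file extension of the hyperlink target. The mapping below is an assumption assembled from the extensions named elsewhere in this description (the patent's own table is not reproduced here), and `type_of_name`, `classify_link` and the `archive_members` parameter are hypothetical names. The comparison with a known installation disc in example 3 is not covered.

```python
import posixpath
from urllib.parse import urlparse

# Assumed mapping, built only from extensions mentioned in this description.
# .XLS is listed under both documents and data in the text; it is placed
# under "document" here purely for illustration.
EXTENSION_TYPES = {
    "multimedia": {".mp3", ".avi", ".wmv", ".mpeg", ".wav", ".rm"},
    "picture": {".gif", ".jpg", ".bmp", ".png"},
    "document": {".doc", ".txt", ".pdf", ".xls", ".ppt"},
    "data": {".dat", ".mdf", ".dbf"},
    "archive": {".rar"},
}


def type_of_name(name):
    """Map a file name or URL path to an information type by its extension."""
    ext = posixpath.splitext(name.lower())[1]
    for info_type, extensions in EXTENSION_TYPES.items():
        if ext in extensions:
            return info_type
    return "webpage"  # assumption: anything else is treated as a web page


def classify_link(url, archive_members=None):
    """Classify a hyperlink target; archives are judged by what they contain."""
    info_type = type_of_name(urlparse(url).path)
    if info_type != "archive":
        return info_type
    if not archive_members:
        return "unknown"  # the archive has not been decompressed yet
    member_types = {type_of_name(member) for member in archive_members}
    return member_types.pop() if len(member_types) == 1 else "mixed"
```

For instance, `classify_link("http://xxx/xxx/song.rar", ["song.mp3"])` yields `"multimedia"`, matching example 2, where the list of member names is assumed to come from a separate decompression step.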
The type of "network service form" information is determined as follows:
The 1st step: access the service as an ordinary user to obtain its content.
The 2nd step: classify the obtained content according to the following table.
Figure BSA00000554541100141
The 3rd step: if the content obtained is a compressed file, decompress it and classify its contents according to the 2nd step.
The type of "data or database form" information is determined as follows:
The 1st step: access the data file or database to obtain its content.
The 2nd step: if the information obtained from the data file or database is itself a file, go directly to the 4th step.
The 3rd step: if the information obtained from the data file or database is the location where a file is stored, access that location to obtain the target file.
The 4th step: classify the obtained content according to the following table.
Figure BSA00000554541100142
The 5th step: if the content obtained is a compressed file, decompress it and classify its contents according to the 4th step.
"Homologous web page processing module"
Fig. 3 is the flow chart of the "homologous web page processing module". Main function of the module: web pages that are found for a search keyword and share the same main content are presented to the user as a single "title search result", and the user can view all retrieved pages with that main content through the button meaning "expand". To improve the performance of the system, the following techniques are adopted:
A web page publishing technique is adopted: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which directly answers queries that have been made before. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" stores the results by category in the "non-homologous web page results database" and the "homologous web page results database", and the "search result web page distributor" publishes them periodically to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time.
The processing flow of the "homologous information processing module" is as follows:
The 1st step: when the "search engine search part" receives a keyword to be queried, the "search results already published on the Web server" decision device first judges whether the keyword has recently been queried by someone else. If it has, and the result has been published on the "search engine search results Web server", the search result is returned directly (see the "M1" mark in the figure). In this result, web pages from the same source are aggregated into one search result; after clicking the "same-source web pages" button, the user can see on the "search engine search results Web server" another search result page that contains all the search results. The query process then ends.
The 2nd step: if the decision device judges that the keyword has not recently been queried by anyone else, and no corresponding query result has been published on the "search engine search results Web server", then:
The "web page searcher" is started. It searches the "non-homologous web page results database" and the "homologous web page results database" for web page addresses that match the search keyword, and fetches the content of those pages.
If the "web page searcher" finds no matching web page address in either database, the result "no qualifying web page" is returned to the user, and the search keyword is added to the next round's task of updating the two databases. If qualifying web page addresses are found during the update, each is stored in the "non-homologous web page results database" or the "homologous web page results database" according to whether it has homologous web pages, so that the next time someone searches for the same keyword the result can be found.
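The branching of the 1st and 2nd steps can be sketched as follows. This is a simplified illustration under assumptions: the published result pages and the two result databases are modeled as plain dictionaries, and the names `handle_query`, `published`, `index` and `pending_updates` are hypothetical.

```python
NO_RESULT = "no qualifying web page"


def handle_query(keyword, published, index, pending_updates):
    """Sketch of the 1st and 2nd steps.

    published:       dict keyword -> result page already on the results Web server
    index:           dict keyword -> page addresses from both result databases
    pending_updates: set of keywords to be handled in the next database update
    """
    if keyword in published:
        # 1st step: queried recently and already published, answer directly (M1).
        return published[keyword]
    addresses = index.get(keyword, [])
    if not addresses:
        # 2nd step, nothing found: report it and queue the keyword for the update.
        pending_updates.add(keyword)
        return NO_RESULT
    # 2nd step, addresses found: these go on to the 3rd to 9th steps.
    return addresses


pending = set()
published_pages = {"tea": "<html>published results for tea</html>"}
page_index = {"coffee": ["http://xxx/a", "http://xxx/b"]}
```

With this data, a query for "tea" is answered from the published pages, "coffee" returns addresses for further processing, and any other keyword is added to `pending`.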
The 3rd step: the "web page content separator" decomposes the content of the found web pages and their hyperlink targets into kinds such as multimedia, pictures, text and hyperlinks.
The 4th step: each content decision device produces its judgment result:
A. The "multimedia content decision device" produces the target web page's "identical multimedia file degree SMS" (Same Media Score). Multimedia here includes Flash-type playback services or file services, video/audio files or file services, playback or file services for real-time information such as IPTV, live satellite broadcast, audio-video monitoring, performances and manual answering, and other multimedia services.
B. The "picture content decision device" produces the target web page's "identical picture degree SPS" (Same Photo Score).
C. The "text content decision device" produces the target web page's "identical text degree STS" (Same Text Score).
D. The "link content decision device" produces the target web page's "identical hyperlink degree SHS" (Same Hyperlinks Score).
The 5th step: the "multimedia judgment weight SMP", "picture judgment weight SPP", "text judgment weight STP" and "link judgment weight SHP" are obtained from the "homologous web page decision rule base", and each is multiplied by the corresponding score generated in the 4th step: the "identical multimedia file degree SMS", "identical picture degree SPS", "identical text degree STS" and "identical hyperlink degree SHS".
The 6th step: the products obtained in the 5th step are added to give the web page's "homology degree SSS" (Same Source Score): SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)
The 7th step: judge whether the "homology degree SSS" of the web page exceeds the threshold. If it does, the page is judged to be a "homologous web page" with respect to the other web page; if it does not, the page is judged to be a "non-homologous web page".
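The 5th to 7th steps amount to a weighted sum followed by a threshold test, which can be sketched as below. The function names and dictionary keys are hypothetical, and the numeric scores, weights and threshold are made-up values for illustration; the description does not state their ranges.

```python
KINDS = ("multimedia", "picture", "text", "hyperlink")


def homology_score(scores, weights):
    """6th step: SSS = (SMS*SMP) + (SPS*SPP) + (STS*STP) + (SHS*SHP)."""
    return sum(scores[kind] * weights[kind] for kind in KINDS)


def is_homologous(scores, weights, threshold):
    """7th step: the page is homologous if SSS exceeds the threshold."""
    return homology_score(scores, weights) > threshold


# Illustrative values only (assumed to lie between 0 and 1).
scores = {"multimedia": 0.5, "picture": 1.0, "text": 0.75, "hyperlink": 0.25}   # SMS, SPS, STS, SHS
weights = {"multimedia": 0.25, "picture": 0.25, "text": 0.25, "hyperlink": 0.25}  # SMP, SPP, STP, SHP

sss = homology_score(scores, weights)  # 0.125 + 0.25 + 0.1875 + 0.0625 = 0.625
```

With an assumed threshold of 0.5, this page would be judged a "homologous web page"; with a threshold of 0.7 it would not.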
The 8th step: the "non-homologous web pages" produced in the 7th step are stored in the "non-homologous web page results database" by the "non-homologous web page processing module"; the "homologous web pages" produced in the 7th step are stored in the "homologous web page results database" by the "homologous web page processing module".
The 9th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homologous web page results database" and the "non-homologous web page results database" and publishes them to the "search engine search results Web server", from which they are presented to the querying user through the browser (see the "M2" mark in the figure).
As an alternative implementation of the 9th step, the results can also be presented directly to the querying user through the browser by the "dynamic web page Web server" (see the "M3" mark in the figure).
The "web page content classifier" can be implemented in software: it parses the type of each piece of content directly from the syntax used on web pages, such as the Html, ASP/ASPX, PHP and JSP syntaxes.
"Homologous multimedia processing module"
Fig. 4 is the flow chart of the "homologous multimedia processing module". Multimedia files or services that meet the search condition are all offered to the user by the "homologous multimedia processing module" as hyperlinks in an Html web page. To improve the performance of the system, the following techniques are adopted:
A web page publishing technique is adopted: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which directly answers queries that have been made before. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" stores the results by category in the "non-homologous multimedia index database" and the "homologous multimedia index database", and the "search result web page distributor" publishes them periodically to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time.
The processing flow of the "homologous multimedia processing module" is as follows:
The 1st step: the user's search keyword is received, and software judges from the keyword content and keyword syntax whether what is sought is a multimedia file or service (for example, a keyword containing ".MP3" indicates that .MP3 files are sought, not web pages containing that text).
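One simple way to make the judgment in the 1st step is to look for a known file extension at the end of any word in the keyword. The sketch below assumes exactly that rule; the description does not spell out the "keyword syntax", and `wanted_extension` and the extension list are hypothetical.

```python
# Extensions taken from the multimedia examples in this description.
MULTIMEDIA_EXTENSIONS = (".mp3", ".avi", ".wmv", ".mpeg", ".wav", ".rm")


def wanted_extension(keyword, extensions=MULTIMEDIA_EXTENSIONS):
    """Return the file extension named in the keyword, or None if there is none.

    Assumed rule: a word of the keyword that ends with a known extension
    (for example ".MP3" or "song.mp3") means files of that type are sought.
    """
    for word in keyword.lower().split():
        for ext in extensions:
            if word.endswith(ext):
                return ext
    return None
```

Under this rule `wanted_extension("yesterday .MP3")` gives `".mp3"`, while `wanted_extension("mp3 player")` gives `None` and the query would be treated as an ordinary web page search.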
The 2nd step: judge "has the content to be searched already been published on the Web server?" If the search target has been published on the "search engine search results Web server", the search result is returned directly (see the "M1" mark in the figure). In this result, the access interfaces of the multimedia items that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "same-source files" button, the user can see on the "search engine search results Web server" another web page containing all the search results that meet the query condition. The search process then ends. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: return to the user the result "no qualifying multimedia".
The 4th step: add this search keyword to the next round's task of updating the "homologous multimedia index database" and the "non-homologous multimedia index database"; the update process of the two databases is started periodically.
The 5th step: the update process of the "homologous multimedia index database" and the "non-homologous multimedia index database" is as follows:
A. The "multimedia searcher" searches for newly appearing multimedia files and web pages or service entry points, and software enters each entry point to obtain the file or service.
B. The "multimedia content decision device" judges whether the newly found multimedia content "is the same content" as content in the current "homologous multimedia index database". If "yes", it is added to the matching class of the "homologous multimedia index database" as a new element. If "no", the "multimedia content decision device" judges whether it "is the same content" as content in the current "non-homologous multimedia index database".
C. If "yes": a new class is created for the current multimedia item and the homologous items stored in the "non-homologous multimedia index database", and all of them are transferred to the "homologous multimedia index database". If "no": a new class is created for the current multimedia item, and it is stored in the "non-homologous multimedia index database".
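The update logic of sub-steps B and C can be sketched as follows. This is a simplification under assumptions: the two index databases are modeled as in-memory lists, each non-homologous class is reduced to a single item, and `same_content` is a hypothetical stand-in for the content decision device. The same pattern applies to the picture, document, software and data modules.

```python
def update_indexes(item, homologous, non_homologous, same_content):
    """Place a newly found item into the homologous or non-homologous index.

    homologous:     list of classes, each a list of same-source items
    non_homologous: list of items that so far have no same-source partner
    same_content:   hypothetical function deciding whether two items match
    """
    # B: does the item match an existing class in the homologous index?
    for group in homologous:
        if same_content(item, group[0]):
            group.append(item)
            return
    # C: does it match items that are stored as non-homologous so far?
    matches = [other for other in non_homologous if same_content(item, other)]
    if matches:
        # Yes: create a new class and move all of them to the homologous index.
        non_homologous[:] = [o for o in non_homologous if not same_content(item, o)]
        homologous.append(matches + [item])
    else:
        # No: the item is kept in the non-homologous index.
        non_homologous.append(item)


# Made-up items: (content fingerprint, address); equal fingerprints mean same content.
same = lambda a, b: a[0] == b[0]
hom, non = [], []
for entry in [("A", "u1"), ("B", "u2"), ("A", "u3"), ("A", "u4")]:
    update_indexes(entry, hom, non, same)
```

After these four updates the three "A" items form one homologous class and the single "B" item remains in the non-homologous list.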
The 6th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homologous web page results database" and the "non-homologous web page results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the user making the search (see the "M2" mark in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the querying user through the browser by the "dynamic web page Web server" (see the "M3" mark in the figure).
"Homologous picture processing module"
Fig. 5 is the flow chart of the homologous picture processing module. Picture files or links that meet the search condition are all offered to the user by the "homologous picture processing module" as hyperlinks in an Html web page. To improve the performance of the system, the following techniques are adopted:
A web page publishing technique is adopted: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which directly answers queries that have been made before. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" stores the results by category in the "non-homologous picture index database" and the "homologous picture index database", and the "search result web page distributor" publishes them periodically to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time.
The processing flow of the "homologous picture processing module" is as follows:
The 1st step: the user's search keyword is received, and software judges from the keyword content and keyword syntax whether what is sought is a picture file or link (for example, a keyword containing ".JPG" indicates that .JPG files are sought, not web pages containing that text).
The 2nd step: judge "has the content to be searched already been published on the Web server?" If the search target has been published on the "search engine search results Web server", the search result is returned directly (see the "M1" mark in the figure). In this result, the access interfaces of the pictures that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "same-source files" button, the user can see on the "search engine search results Web server" another web page containing all the search results that meet the query condition. The search process then ends. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: return to the user the result "no qualifying picture".
The 4th step: add this search keyword to the next round's task of updating the "homologous picture index database" and the "non-homologous picture index database"; the update process of the two databases is started periodically.
The 5th step: the update process of the "homologous picture index database" and the "non-homologous picture index database" is as follows:
A. The "picture searcher" searches for newly appearing picture files and web pages or link entry points, and software enters each entry point to obtain the file or service.
B. The "picture content decision device" judges whether the newly found picture content "is the same content" as content in the current "homologous picture index database". If "yes", it is added to the matching class of the "homologous picture index database" as a new element. If "no", the "picture content decision device" judges whether it "is the same content" as content in the current "non-homologous picture index database".
C. If "yes": a new class is created for the current picture and the homologous pictures stored in the "non-homologous picture index database", and all of them are transferred to the "homologous picture index database". If "no": a new class is created for the current picture, and it is stored in the "non-homologous picture index database".
The 6th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homologous web page results database" and the "non-homologous web page results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the user making the search (see the "M2" mark in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the querying user through the browser by the "dynamic web page Web server" (see the "M3" mark in the figure).
"Homologous document processing module"
Fig. 6 is the flow chart of the homologous document processing module. The "homologous document processing module" supports common document formats: ".Txt", ".Doc", ".PPT", ".PDF", ".XLS" and so on. Document files or links that meet the search condition are all offered to the user by the "homologous document processing module" as hyperlinks in an Html web page. To improve the performance of the system, the following techniques are adopted:
A web page publishing technique is adopted: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which directly answers queries that have been made before. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" stores the results by category in the "non-homologous document index database" and the "homologous document index database", and the "search result web page distributor" publishes them periodically to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time.
The processing flow of the "homologous document processing module" is as follows:
The 1st step: the user's search keyword is received, and software judges from the keyword content and keyword syntax whether what is sought is a document file or link (for example, a keyword containing ".PDF" indicates that .PDF files are sought, not web pages containing that text).
The 2nd step: judge "has the content to be searched already been published on the Web server?" If the search target has been published on the "search engine search results Web server", the search result is returned directly (see the "M1" mark in the figure). In this result, the access interfaces of the documents that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "same-source files" button, the user can see on the "search engine search results Web server" another web page containing all the search results that meet the query condition. The search process then ends. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: return to the user the result "no qualifying document".
The 4th step: add this search keyword to the next round's task of updating the "homologous document index database" and the "non-homologous document index database"; the update process of the two databases is started periodically.
The 5th step: the update process of the "homologous document index database" and the "non-homologous document index database" is as follows:
A. The "document searcher" searches for newly appearing document files and web pages or link entry points, and software enters each entry point to obtain the file or service.
B. The "text content decision device" and the "picture content decision device" judge whether the newly found document content "is the same content" as content in the current "homologous document index database". If "yes", it is added to the matching class of the "homologous document index database" as a new element. If "no", the "document content decision device" judges whether it "is the same content" as content in the current "non-homologous document index database".
C. If "yes": a new class is created for the current document and the homologous documents stored in the "non-homologous document index database", and all of them are transferred to the "homologous document index database". If "no": a new class is created for the current document, and it is stored in the "non-homologous document index database".
The 6th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homologous web page results database" and the "non-homologous web page results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the user making the search (see the "M2" mark in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the querying user through the browser by the "dynamic web page Web server" (see the "M3" mark in the figure).
"Homologous software processing module"
Fig. 7 is the flow chart of the homologous software processing module. Software files or links that meet the search condition are all offered to the user by the "homologous software processing module" as hyperlinks in an Html web page. To improve the performance of the system, the following techniques are adopted:
A web page publishing technique is adopted: the "search result web page distributor" publishes search results in advance to the "search engine search results Web server", which directly answers queries that have been made before. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "homologous information processing module" stores the results by category in the "non-homologous software index database" and the "homologous software index database", and the "search result web page distributor" publishes them periodically to the "search engine search results Web server". This avoids repeated computation and reduces the computation waiting time.
The processing flow of the "homologous software processing module" is as follows:
The 1st step: the user's search keyword is received, and software judges from the keyword content and keyword syntax whether what is sought is a software file or link (for example, a keyword containing ".EXE" indicates that .EXE files are sought, not web pages containing that text).
The 2nd step: judge "has the content to be searched already been published on the Web server?" If the search target has been published on the "search engine search results Web server", the search result is returned directly (see the "M1" mark in the figure). In this result, the access interfaces of the software items that meet the search condition and share the same source are aggregated into one "title search result"; after clicking the "same-source files" button, the user can see on the "search engine search results Web server" another web page containing all the search results that meet the query condition. The search process then ends. If the search target has not been published on the "search engine search results Web server", continue from the 3rd step.
The 3rd step: return to the user the result "no qualifying software".
The 4th step: add this search keyword to the next round's task of updating the "homologous software index database" and the "non-homologous software index database"; the update process of the two databases is started periodically.
The 5th step: the update process of the "homologous software index database" and the "non-homologous software index database" is as follows:
A. The "software searcher" searches for newly appearing software files and web pages or link entry points, and software enters each entry point to obtain the file or service.
B. The "software content decision device" judges whether the newly found software content "is the same content" as content in the current "homologous software index database". If "yes", it is added to the matching class of the "homologous software index database" as a new element. If "no", the "software content decision device" judges whether it "is the same content" as content in the current "non-homologous software index database".
C. If "yes": a new class is created for the current software item and the homologous software stored in the "non-homologous software index database", and all of them are transferred to the "homologous software index database". If "no": a new class is created for the current software item, and it is stored in the "non-homologous software index database".
The 6th step: the "search result web page distributor" generates static web pages of the search results from the content of the "homologous web page results database" and the "non-homologous web page results database" and publishes them to the "search engine search results Web server", from which they are presented through the browser to the user making the search (see the "M2" mark in the figure).
As an alternative implementation of the 6th step, the results can also be presented directly to the querying user through the browser by the "dynamic web page Web server" (see the "M3" mark in the figure).
"Homologous data or database processing module"
Fig. 8 is the flow diagram of the "same-source data or database processing module". For data files or links that meet the search condition, the "same-source data processing module" provides them to the querier as hyperlinks in HTML web pages. To substantially improve the performance of the system, we adopt the following techniques:
Web page publishing is used: the "search result web page distributor" publishes search results to the "search engine search results Web server" in advance, so that search requests that have been queried before are answered directly. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "same-source information processing module" stores the results by category in the "non-same-source data index database" and the "same-source data index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "same-source data processing module" is as follows:
Step 1: Receive the querier's search keyword, and have software judge from the keyword content and keyword syntax whether the querier is looking for a data file or a link (for example, a keyword containing ".DBF" indicates that the querier wants .DBF files rather than web pages containing that text).
Step 2: Judge "has the content to be searched for been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see the "M1" mark in the figure). In these results, the access entries of data that meet the search condition and share the same source are aggregated into one "title search result". After clicking the "same-source files" button, the querier can see, on the "search engine search results Web server", another web page that contains all the search results, so that every search result meeting the query condition can be viewed; the search then ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching data" to the querier.
Step 4: Add this search keyword to the next round of update tasks for the "same-source data index database" and the "non-same-source data index database", and periodically start the update process of the two databases.
Step 5: The update process of the "same-source data index database" and the "non-same-source data index database":
A. The "data search device" searches for newly appearing data files, web pages, or link entries, and software follows each entry to obtain the file or service.
B. The "data content decision device" judges whether the newly found data content "belongs to the same content" as the content of the current "same-source data index database". If "yes", it is added as a new element to the matching category of the "same-source data index database". If "no", the "data content decision device" judges whether it "belongs to the same content" as the content of the current "non-same-source data index database".
C. If "yes": "create a new category for the current data and the same-source data stored in the 'non-same-source data index database', and move them all to the 'same-source data index database'". If "no": "create a new category for the current data and store it in the 'non-same-source data index database'".
Step 6: The "search result web page distributor" dynamically generates static web pages of search results from the contents of the "same-source web page result database" and the "non-same-source web page result database" and publishes them to the "search engine search results Web server", which presents them to queriers who search through a browser (see the "M2" mark in the figure).
As an alternative implementation of step 6, the results can also be presented to the querying user directly through the browser by a "dynamic web page Web server" (see the "M3" mark in the figure).
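The classification performed in parts B and C of step 5 can be sketched as below. The sketch assumes that the two index databases are plain in-memory lists and that the content decision device is passed in as a callable `is_same_content`; the function name, the list layout, and the choice to compare a new item only against the first element of each existing category are illustrative assumptions, not details given in the text.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def classify_new_item(
    item: T,
    same_source: list[list[T]],
    non_same_source: list[T],
    is_same_content: Callable[[T, T], bool],
) -> None:
    """Place a newly found item into one of the two index databases."""
    # Part B: does the item match an existing same-source category?
    # Assumption: the first element represents the whole category.
    for category in same_source:
        if is_same_content(item, category[0]):
            category.append(item)
            return

    # Part B, second check: does it match items that so far had no partner?
    matches = [other for other in non_same_source if is_same_content(item, other)]
    if matches:
        # Part C, "yes": build a new category and move it to the
        # same-source database.
        non_same_source[:] = [
            other for other in non_same_source if not is_same_content(item, other)
        ]
        same_source.append(matches + [item])
    else:
        # Part C, "no": the item starts its own category in the
        # non-same-source database.
        non_same_source.append(item)
```

For example, with items modelled as `(name, site)` tuples and `is_same_content` comparing names, the first copy of a file lands in the non-same-source list, and the second copy from another site promotes both into a new same-source category.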
"Same-source GIS information processing module"
Fig. 9 is the flow diagram of the "same-source GIS information processing module". For GIS information files or links that meet the search condition, the "same-source GIS information processing module" provides them to the querier as hyperlinks in HTML web pages. To substantially improve the performance of the system, we adopt the following techniques:
Web page publishing is used: the "search result web page distributor" publishes search results to the "search engine search results Web server" in advance, so that search requests that have been queried before are answered directly. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "same-source information processing module" stores the results by category in the "non-same-source GIS information index database" and the "same-source GIS information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "same-source GIS information processing module" is as follows:
Step 1: Receive the querier's search keyword, and have software judge from the keyword content and keyword syntax whether the querier is looking for a GIS information file or a link (for example, a keyword containing ".JPG" indicates that the querier wants .JPG files rather than web pages containing that text).
Step 2: Judge "has the content to be searched for been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see the "M1" mark in the figure). In these results, the access entries of GIS information that meets the search condition and shares the same source are aggregated into one "title search result". After clicking the "same-source files" button, the querier can see, on the "search engine search results Web server", another web page that contains all the search results, so that every search result meeting the query condition can be viewed; the search then ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching GIS information" to the querier.
Step 4: Add this search keyword to the next round of update tasks for the "same-source GIS information index database" and the "non-same-source GIS information index database", and periodically start the update process of the two databases.
Step 5: The update process of the "same-source GIS information index database" and the "non-same-source GIS information index database":
A. The "GIS information searcher" searches for newly appearing GIS information files, web pages, or link entries, and software follows each entry to obtain the file or service.
B. The "GIS information content decision device" judges whether the newly found GIS information content "belongs to the same content" as the content of the current "same-source GIS information index database". If "yes", it is added as a new element to the matching category of the "same-source GIS information index database". If "no", the "GIS information content decision device" judges whether it "belongs to the same content" as the content of the current "non-same-source GIS information index database".
C. If "yes": "create a new category for the current GIS information and the same-source GIS information stored in the 'non-same-source GIS information index database', and move them all to the 'same-source GIS information index database'". If "no": "create a new category for the current GIS information and store it in the 'non-same-source GIS information index database'".
Step 6: The "search result web page distributor" dynamically generates static web pages of search results from the contents of the "same-source web page result database" and the "non-same-source web page result database" and publishes them to the "search engine search results Web server", which presents them to queriers who search through a browser (see the "M2" mark in the figure).
As an alternative implementation of step 6, the results can also be presented to the querying user directly through the browser by a "dynamic web page Web server" (see the "M3" mark in the figure).
"Same-value network service processing module"
Figure 10 is the flow diagram of the "same-value network service processing module". For network services that meet the search condition, the "same-value network service processing module" provides them to the querier as hyperlinks in HTML web pages. To substantially improve the performance of the system, we adopt the following techniques:
Web page publishing is used: the "search result web page distributor" publishes search results to the "search engine search results Web server" in advance, so that search requests that have been queried before are answered directly. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "same-value information processing module" stores the results by category in the "non-same-value network service index database" and the "same-value network service index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "same-value network service processing module" is as follows:
Step 1: Receive the querier's search keyword, and have software judge from the keyword content and keyword syntax whether the querier is looking for a network service file or a link (for example, a keyword containing ".JPG" indicates that the querier wants .JPG files rather than web pages containing that text).
Step 2: Judge "has the content to be searched for been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see the "M1" mark in the figure). In these results, the access entries of network services that meet the search condition and share the same source are aggregated into one "title search result". After clicking the "same-value files" button, the querier can see, on the "search engine search results Web server", another web page that contains all the search results, so that every search result meeting the query condition can be viewed; the search then ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching network service" to the querier.
Step 4: Add this search keyword to the next round of update tasks for the "same-value network service index database" and the "non-same-value network service index database", and periodically start the update process of the two databases.
Step 5: The update process of the "same-value network service index database" and the "non-same-value network service index database":
A. The "network service search device" searches for newly appearing network service files, web pages, or link entries, and software follows each entry to obtain the file or service.
B. The "network service content decision device" judges whether the newly found network service content "belongs to the same content" as the content of the current "same-value network service index database". If "yes", it is added as a new element to the matching category of the "same-value network service index database". If "no", the "network service content decision device" judges whether it "belongs to the same content" as the content of the current "non-same-value network service index database".
C. If "yes": "create a new category for the current network service and the same-value network services stored in the 'non-same-value network service index database', and move them all to the 'same-value network service index database'". If "no": "create a new category for the current network service and store it in the 'non-same-value network service index database'".
Step 6: The "search result web page distributor" dynamically generates static web pages of search results from the contents of the "same-value web page result database" and the "non-same-value web page result database" and publishes them to the "search engine search results Web server", which presents them to queriers who search through a browser (see the "M2" mark in the figure).
As an alternative implementation of step 6, the results can also be presented to the querying user directly through the browser by a "dynamic web page Web server" (see the "M3" mark in the figure).
"Same-value business information processing module"
Figure 11 is the flow diagram of the "same-value business information processing module". For business information that meets the search condition, the "same-value business information processing module" provides it to the querier as hyperlinks in HTML web pages. To substantially improve the performance of the system, we adopt the following techniques:
Web page publishing is used: the "search result web page distributor" publishes search results to the "search engine search results Web server" in advance, so that search requests that have been queried before are answered directly. This avoids the heavy computation of generating dynamic web pages from the database on every request.
The "same-value information processing module" stores the results by category in the "non-same-value business information index database" and the "same-value business information index database", and the "search result web page distributor" periodically publishes them to the "search engine search results Web server", which avoids repeated computation and reduces the waiting time. The processing flow of the "same-value business information processing module" is as follows:
Step 1: Receive the querier's search keyword, and have software judge from the keyword content and keyword syntax whether the querier is looking for a business information file or a link (for example, a keyword containing ".JPG" indicates that the querier wants .JPG files rather than web pages containing that text).
Step 2: Judge "has the content to be searched for been published on the Web server?" If the search target has been published on the "search engine search results Web server", return the search results directly (see the "M1" mark in the figure). In these results, the access entries of business information that meets the search condition and shares the same source are aggregated into one "title search result". After clicking the "same-value files" button, the querier can see, on the "search engine search results Web server", another web page that contains all the search results, so that every search result meeting the query condition can be viewed; the search then ends. If the search target has not been published on the "search engine search results Web server", continue from step 3.
Step 3: Return the result "no matching business information" to the querier.
Step 4: Add this search keyword to the next round of update tasks for the "same-value business information index database" and the "non-same-value business information index database", and periodically start the update process of the two databases.
Step 5: The update process of the "same-value business information index database" and the "non-same-value business information index database":
A. The "business information searcher" searches for newly appearing business information files, web pages, or link entries, and software follows each entry to obtain the file or service.
B. The "business information content decision device" judges whether the newly found business information content "belongs to the same content" as the content of the current "same-value business information index database". If "yes", it is added as a new element to the matching category of the "same-value business information index database". If "no", the "business information content decision device" judges whether it "belongs to the same content" as the content of the current "non-same-value business information index database".
C. If "yes": "create a new category for the current business information and the same-value business information stored in the 'non-same-value business information index database', and move them all to the 'same-value business information index database'". If "no": "create a new category for the current business information and store it in the 'non-same-value business information index database'".
Step 6: The "search result web page distributor" dynamically generates static web pages of search results from the contents of the "same-value web page result database" and the "non-same-value web page result database" and publishes them to the "search engine search results Web server", which presents them to queriers who search through a browser (see the "M2" mark in the figure).
As an alternative implementation of step 6, the results can also be presented to the querying user directly through the browser by a "dynamic web page Web server" (see the "M3" mark in the figure).
The distinguishing feature of the "same-value business information processing module" is that it can automatically judge, from the characteristics of the goods or services and from the distribution of supply and of queriers, whether multiple business information targets have the same use value to the querier. This judgment serves as the basis for aggregating them into one "title search result" and for ranking the query results.
The content decision devices can be shared across the various "same-source (same-value) information processing modules".
" content decision device " specific implementation
" content of multimedia decision device " specific implementation:
1 Input: multimedia files from multiple sources (for a playback service, either record the stream into a file or obtain the media file information from the playback server).
2 Processing: compare how closely the multimedia contents match.
3 Return: the same-content degree value of the input multimedia: SameMediaPower.
Specific implementation method:
Step 1: Receive the "judged objects": multimedia from multiple sources. Record the number of judged objects: InputQuantity.
Step 2: Look up in the table below an attribute of the "judged objects" that can take part in the comparison, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity=3 for that attribute).
Step 3: Read the "weight" value of the current attribute in the judgment process (found in the table below): Power.
Step 4: Calculate how closely all the "judged objects" match on the current attribute: PSame=SameQuantity*Power.
Step 5: Return to "step 1" and perform "step 1" to "step 4" for the next "attribute" to obtain its PSame, until the PSame values of all attributes have been obtained.
Step 6: Calculate and return the same-content degree value of the "judged objects": SameMediaPower=(sum of all PSame values)/InputQuantity.
Attributes judged for video files or playback services:
Figure BSA00000554541100261
Figure BSA00000554541100271
Note:
1. The invention lies in the method of using a "weight" value to express the importance of each attribute in the comparison, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be "empty (Null)"; when attribute values are "empty", the attributes should not be treated as equal in the calculation.
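The weighted comparison in steps 1 to 6, together with the rule in note 2 about empty values, can be sketched as follows. The attribute names and weights in the usage example are made up for illustration, since the actual table is an image that is not reproduced here. Two further assumptions are labeled in the code: SameQuantity is taken to be the size of the largest group of objects sharing one value, and an attribute on which no two objects agree contributes 0.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def same_content_power(
    objects: Sequence[Mapping[str, Any]],
    weights: Mapping[str, float],
) -> float:
    """Return (sum of PSame over all attributes) / InputQuantity."""
    input_quantity = len(objects)  # step 1
    if input_quantity == 0:
        return 0.0

    total = 0.0
    for attribute, power in weights.items():  # steps 2 to 5, once per attribute
        # Note 2: empty (None) values never count as equal, so drop them.
        values = [obj.get(attribute) for obj in objects]
        counts = Counter(v for v in values if v is not None)
        # Assumption: SameQuantity is the largest group sharing one value.
        same_quantity = max(counts.values(), default=0)
        # Assumption: a "group" of one object means nothing matched.
        if same_quantity < 2:
            same_quantity = 0
        total += same_quantity * power  # PSame = SameQuantity * Power

    return total / input_quantity  # step 6
```

A short usage example with hypothetical attributes: five objects, three of which share a duration (weight 2.0) and all of which share a title (weight 1.0), give PSame values of 6.0 and 5.0, so the returned score is 11.0 / 5.

```python
objects = [
    {"duration": 120, "title": "a"},
    {"duration": 120, "title": "a"},
    {"duration": 120, "title": "a"},
    {"duration": 95, "title": "a"},
    {"duration": None, "title": "a"},
]
score = same_content_power(objects, {"duration": 2.0, "title": 1.0})
```

Attribute values must be hashable for `Counter` to group them.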
Attributes judged for audio files:
Figure BSA00000554541100272
Figure BSA00000554541100281
Note:
1. The invention lies in the method of using a "weight" value to express the importance of each attribute in the comparison, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be "empty (Null)"; when attribute values are "empty", the attributes should not be treated as equal in the calculation.
Attributes judged for Flash files:
Figure BSA00000554541100282
Note:
1. The invention lies in the method of using a "weight" value to express the importance of each attribute in the comparison, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be "empty (Null)"; when attribute values are "empty", the attributes should not be treated as equal in the calculation.
" image content decision device " specific implementation
1 Input: pictures from multiple sources.
2 Processing: compare how closely the picture contents match.
3 Return: the same-content degree value of the input pictures: SamePicPower.
Specific implementation method:
Step 1: Receive the "judged objects": pictures from multiple sources. Record the number of judged objects: InputQuantity.
Step 2: Look up in the table below an attribute of the "judged objects" that can take part in the comparison, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity=3 for that attribute).
Step 3: Read the "weight" value of the current attribute in the judgment process (found in the table below): Power.
Step 4: Calculate how closely all the "judged objects" match on the current attribute: PSame=SameQuantity*Power.
Step 5: Return to "step 1" and perform "step 1" to "step 4" for the next "attribute" to obtain its PSame, until the PSame values of all attributes have been obtained.
Step 6: Calculate and return the same-content degree value of the "judged objects": SamePicPower=(sum of all PSame values)/InputQuantity.
The judgment is based on the various attributes of the picture and on the degree of similarity determined by image recognition software.
Figure BSA00000554541100291
Note:
1. The invention lies in the method of using a "weight" value to express the importance of each attribute in the comparison, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be "empty (Null)"; when attribute values are "empty", the attributes should not be treated as equal in the calculation.
"Text content decision device" specific implementation
The "text content decision device" can be implemented in software:
1 Input: text from multiple sources, as the "judged objects".
2 Processing: compare how closely the text contents match.
3 Return: the consistency degree value between the "judged objects": SameTextPower.
Implementation method:
Step 1: Among the multiple input text contents, find the total length of the parts that consist of identical words or sentences: SameLenth.
Step 2: Among the multiple input text contents, find the length of the shortest input text: MinLenth.
Step 3: Return the text similarity value: SameTextPower=SameLenth/MinLenth.
Among texts matched in this way, the longest text is usually a copy of the same article that is split across fewer pages or that contains many advertisements and external hyperlinks, while the shortest text is usually a copy that is split across more pages or that contains the fewest advertisements and external hyperlinks.
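The ratio SameTextPower=SameLenth/MinLenth can be sketched as follows. The text does not say how "identical words or sentences" are located, so this sketch makes its own assumptions: texts are split into sentences on common terminators, the shared part is the set of sentences present in every input, and lengths are measured in characters over each text's distinct sentences so that the ratio stays between 0 and 1.

```python
import re


def split_sentences(text: str) -> set[str]:
    """Split on sentence terminators and newlines (an assumed segmentation)."""
    parts = re.split(r"[.!?。！？\n]+", text)
    return {part.strip() for part in parts if part.strip()}


def same_text_power(texts: list[str]) -> float:
    """Return SameLenth / MinLenth for two or more texts."""
    sentence_sets = [split_sentences(text) for text in texts]
    if len(sentence_sets) < 2 or any(not s for s in sentence_sets):
        return 0.0
    # Step 1: total length of the sentences found in every input (SameLenth).
    common = set.intersection(*sentence_sets)
    same_length = sum(len(sentence) for sentence in common)
    # Step 2: length of the shortest input, counted the same way (MinLenth).
    min_length = min(sum(len(s) for s in sentences) for sentences in sentence_sets)
    # Step 3: the similarity value.
    return same_length / min_length
```

With this segmentation, an article and a copy of it that only adds an advertisement sentence score 1.0, because every sentence of the shorter text appears in the longer one.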
"Link content decision device" specific implementation
The "link content decision device" can be implemented in software; it is used to compare whether the hyperlinks contained in multiple web pages have common features.
1 Input: the URL addresses of multiple groups of hyperlinks (each group is usually all the hyperlinks obtained from one web page).
2 Processing: calculate how closely the hyperlink URL addresses match between the groups.
3 Return: the number of identical hyperlinks shared by the groups.
Implementation method:
Step 1: Receive the "judged objects": the URL addresses of multiple groups of hyperlinks.
Step 2: Compute the similarity of the "judged objects": SameURLPower=the number of URL addresses that appear in every group of hyperlinks.
Step 3: Return SameURLPower.
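Counting the URL addresses that appear in every group amounts to a set intersection, which can be sketched as follows. The function name is hypothetical, and the sketch compares URL strings exactly; whether the original system normalizes URLs first (case, trailing slashes, and so on) is not stated.

```python
def same_url_power(link_groups: list[list[str]]) -> int:
    """Return the number of URLs that occur in every group of hyperlinks."""
    if not link_groups:
        return 0
    # Start from the first page's links and keep only those that
    # every other page also contains.
    common = set(link_groups[0])
    for group in link_groups[1:]:
        common &= set(group)
    return len(common)
```

Converting each group to a set also means that a URL repeated on one page is counted once.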
"Software content decision device" specific implementation
The "software content decision device" is used to compare whether multiple input software items are the same software.
1 Input: software from multiple sources.
2 Processing: compare how closely the software contents match.
3 Return: the software content match value.
Specific implementation method:
Step 1: Receive the "judged objects": multiple input files or directories. Record the number of judged objects: InputQuantity.
Step 2: Look up in the table below an attribute of the "judged objects" that can be compared, and record the number of "judged objects" that have the same value for the current attribute: SameQuantity (for example, if 3 of 5 judged objects have the same value for an attribute, then SameQuantity=3 for that attribute).
Step 3: Read the "weight" value of the current attribute in the judgment process (found in the table below): Power.
Step 4: Calculate how closely all the "judged objects" match on the current attribute: PSame=SameQuantity*Power.
Step 5: Return to "step 1" and perform "step 1" to "step 4" for the next "attribute" to obtain its PSame, until the PSame values of all attributes have been obtained.
Step 6: Calculate and return the same-content degree value of the "judged objects": SameSoftPower=(sum of all PSame values)/InputQuantity.
Figure BSA00000554541100311
Figure BSA00000554541100321
Note:
1. The invention lies in the method of using a "weight" value to express the importance of each attribute in the comparison, not merely in the specific values listed in the table. The specific "weight" values in the table are only typical values; changing them according to actual needs still falls within the scope of the invention.
2. Depending on actual conditions, some attribute values may be "empty (Null)"; when attribute values are "empty", the attributes should not be treated as equal in the calculation.
"Data or database content decision device" specific implementation
Compare one by one whether the contents of each data record in the different database files are equal, and return whether the database consistency degree value SameDBPower of the compared databases exceeds the threshold.
SameDBPower=(number of records with identical field names and equal values)/(smallest record count for that field among the compared databases).
SameDBPower reflects the ratio of the number of records with identical content to the record count of the database with the fewest records; its value ranges from 0 to 1.
"Data or database content decision device" specific implementation
The following steps can be used for data files:
Step 1: From the data files being compared, pick one file at random as the "comparison standard".
Step 2: Roughly compare the other files with the "comparison standard" for consistency: file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords, and remarks.
Step 3: If they are consistent, judge them "roughly consistent"; this result can be used directly as the output of the "data or database content decision device".
Step 4: If further comparison is needed, perform step 5 on the input files judged "roughly consistent".
Step 5: Fine comparison: compare the file attribute information and every byte of the files one by one. Files whose features are all identical can be judged "fully consistent", as the output of the "data or database content decision device".
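The two-stage comparison for data files (a cheap rough check first, a byte-by-byte check only on request) can be sketched as below. Several choices here are assumptions rather than details from the text: files are modelled as an in-memory `DataFile` with a metadata dictionary, SHA-256 stands in for the unspecified "file checksum", and the first file is used as the "comparison standard" instead of a random one so that the example is reproducible.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical in-memory model of one data file."""

    content: bytes
    # Title, subject, version, author, category, keywords, remarks, ...
    attributes: dict[str, str] = field(default_factory=dict)


def rough_match(a: DataFile, b: DataFile) -> bool:
    # Step 2: size, checksum (SHA-256 is an assumed choice), and attributes.
    return (
        len(a.content) == len(b.content)
        and hashlib.sha256(a.content).digest() == hashlib.sha256(b.content).digest()
        and a.attributes == b.attributes
    )


def fine_match(a: DataFile, b: DataFile) -> bool:
    # Step 5: attributes plus every byte of the file.
    return a.attributes == b.attributes and a.content == b.content


def judge_files(files: list[DataFile], fine: bool = False) -> str:
    # Step 1: the source picks the standard at random; the first file is
    # used here for reproducibility (assumption).
    standard, others = files[0], files[1:]
    if not all(rough_match(standard, other) for other in others):
        return "inconsistent"
    if not fine:
        return "roughly consistent"  # step 3
    if all(fine_match(standard, other) for other in others):
        return "fully consistent"  # step 5
    return "inconsistent"
```

The rough stage is what makes the design economical: most non-matching files differ in size or checksum, so the byte-by-byte pass runs only on candidates that are already very likely to be identical.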
The following steps can be used for database files:
Step 1: Judge from the file name suffix and the file attributes whether the input database files are in the same database format.
Step 2: For database files in the same format, go to step 3; for database files in different formats, go directly to step 4.
Step 3: Roughly compare databases of the same format: file size, file checksum, and file attribute information such as title, subject, version, author, category, keywords, and remarks. If these features do not fully match, output "inconsistent" as the judgment result; for database files that match fully, go to step 4.
Step 4: Fine comparison of the databases (this step allows database files of different kinds to take part in the content comparison). Extract the "database tables" one by one according to the format of each database file and judge whether their "database table" structures are consistent: if they are not, output "inconsistent"; for database files with consistent structures, go to step 5.
Step 5: Compare one by one the contents of every record of the database files being compared. Each time identical record contents are found, add 1 to the counter SameRecNum ("the number of records with identical field names and equal values").
Step 6: Calculate the database consistency degree value: SameDBPower=SameRecNum/(smallest record count for that field among the compared databases). (SameDBPower reflects the ratio of the number of records with identical content to the record count of the database with the fewest records; its value ranges from 0 to 1.)
Step 7: Judge whether SameDBPower exceeds the threshold. If it does, output "consistent" as the judgment result; otherwise, output "inconsistent".
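Steps 5 to 7 can be sketched for the two-database case as follows. The sketch assumes each database table has already been extracted into a list of records, with each record a dictionary from field name to a hashable value. The default threshold of 0.8 is an illustrative assumption; the text gives no threshold for databases.

```python
from collections import Counter


def _record_key(record: dict) -> frozenset:
    # Two records are equal when they have the same field names and values.
    return frozenset(record.items())


def same_db_power(db_a: list[dict], db_b: list[dict]) -> float:
    """Return SameRecNum divided by the smaller record count."""
    if not db_a or not db_b:
        return 0.0
    count_a = Counter(_record_key(record) for record in db_a)
    count_b = Counter(_record_key(record) for record in db_b)
    # Step 5: count records present in both (multiset intersection).
    same_rec_num = sum((count_a & count_b).values())
    # Step 6: normalise by the database with the fewest records.
    return same_rec_num / min(len(db_a), len(db_b))


def judge_databases(db_a: list[dict], db_b: list[dict], threshold: float = 0.8) -> str:
    # Step 7: "consistent" only if the value exceeds the threshold.
    if same_db_power(db_a, db_b) > threshold:
        return "consistent"
    return "inconsistent"
```

Dividing by the smaller record count means that a database which is a complete subset of a larger one scores 1.0.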
" GIS information content decision device "
" GIS information content decision device ", can realize by software:
1. Input: digital maps from multiple sources can be received as the "objects to be judged".
2. Processing: compare how well the coverage areas of the digital maps match.
3. Return: the consistency value SameMapPower (ranging from 0 to 1) between the "objects to be judged".
Implementation method:
The 1st step: open the digital map files being compared according to the digital map format.
The 2nd step: find the longitude and latitude of the northwest and southeast corners of each digital map (another pair of diagonal corners may also be used).
The 3rd step: compare the longitude and latitude differences between the northwest and southeast corners of the digital maps being compared, and calculate the consistency value SameMapPower of the map coverage areas.
Suppose that "Map 1" and "Map 2" take part in the comparison.
Then:
SameMapPower = (area of the overlapping region of the two maps) / (area of the smaller of the two maps).
The 4th step: return the SameMapPower value.
The 5th step: judge whether SameMapPower exceeds the threshold (for example, threshold = 0.8). If it does, the maps are judged to be the same map; if not, they are judged to be different maps.
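As an illustration of the SameMapPower ratio, the following Python sketch treats each map as a latitude/longitude rectangle defined by its northwest and southeast corners. It assumes areas can be measured in square degrees (a planar approximation) and that no map crosses the 180° meridian; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapBounds:
    west: float   # longitude of the northwest corner
    north: float  # latitude of the northwest corner
    east: float   # longitude of the southeast corner
    south: float  # latitude of the southeast corner

    def area(self) -> float:
        """Area in square degrees (planar approximation)."""
        return max(0.0, self.east - self.west) * max(0.0, self.north - self.south)


def same_map_power(a: MapBounds, b: MapBounds) -> float:
    """Overlap area divided by the area of the smaller map, in the range 0 to 1."""
    overlap_w = min(a.east, b.east) - max(a.west, b.west)
    overlap_h = min(a.north, b.north) - max(a.south, b.south)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0  # the rectangles do not overlap
    smaller = min(a.area(), b.area())
    if smaller == 0:
        return 0.0
    return (overlap_w * overlap_h) / smaller


def is_same_map(a: MapBounds, b: MapBounds, threshold: float = 0.8) -> bool:
    """The 5th step, using the example threshold of 0.8 from the text."""
    return same_map_power(a, b) > threshold
```

For example, two 10° × 10° maps shifted by 5° of longitude overlap by half, so SameMapPower is 0.5 and they are judged to be different maps.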
" network service content decision device "
FTP service content judgment by the "network service content decision device":
The 1st step: log in to each service being compared using the corresponding FTP protocol and obtain the files it holds.
The 2nd step: after obtaining the files of the FTP services, first judge from the file name suffixes whether the file types are consistent. If they are not, return "inconsistent" as the output; if they are, go to the 3rd step.
The 3rd step: according to the file type, use the "multimedia content decision device", "image content decision device", "text content decision device", "software content decision device", "data or database content decision device", or "GIS information content decision device" to judge whether the file contents are consistent, and return its judgment result.
Judgment of the mailbox service content provided by e-mail websites:
The mailbox service information provided by e-mail websites is obtained mainly by having software search the web pages of each website and parse, from the web page tags, information such as mailbox size, charging status, and whether the POP protocol is supported.
The 1st step: divide mailbox sizes into grades (for example: 10 MB to 25 MB, 25 MB to 100 MB, 100 MB to 300 MB, 300 MB to 1 GB, 1 GB to 100 GB, and so on), then judge whether the mailboxes being compared fall in the same grade. If "no", return "inconsistent"; if "yes", go to the 2nd step.
The 2nd step: compare whether the "charging status" is consistent. If "no", return "inconsistent"; if "yes", go to the 3rd step.
The 3rd step: compare whether support for the POP protocol is consistent. If "no", return "inconsistent"; if "yes", return "consistent".
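The three mailbox comparison steps can be sketched in Python as follows. The grade boundaries are the example values from the text; treating each upper boundary as exclusive, taking 1 GB as 1024 MB, and the field names are assumptions made for illustration only.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Example grade boundaries in MB: 10, 25, 100, 300, 1 GB, 100 GB.
GRADE_BOUNDS_MB = [10, 25, 100, 300, 1024, 100 * 1024]


@dataclass(frozen=True)
class MailboxService:
    size_mb: float       # mailbox size parsed from the web page
    is_paid: bool        # charging status
    supports_pop: bool   # whether the POP protocol is supported


def size_grade(size_mb: float) -> int:
    """Index of the size grade; a size equal to a boundary falls in the upper grade."""
    return bisect_right(GRADE_BOUNDS_MB, size_mb)


def judge_mailboxes(a: MailboxService, b: MailboxService) -> str:
    if size_grade(a.size_mb) != size_grade(b.size_mb):  # the 1st step
        return "inconsistent"
    if a.is_paid != b.is_paid:                          # the 2nd step
        return "inconsistent"
    if a.supports_pop != b.supports_pop:                # the 3rd step
        return "inconsistent"
    return "consistent"
```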
" business information content decision device "
This decision device judges whether product or service sales information published on web pages is identical and falls within the same physical geographic range, the same administrative geographic range, or the same distance range.
The 1st step: compare whether the business information being compared concerns the same product or service. If "no", return "inconsistent"; if "yes", go to the 2nd step.
The 2nd step: judge whether the business information being compared is sensitive to geographic location (for example, personal consumer goods and services that must be delivered on site, such as ice cream or private tutoring, are sensitive to geographic location). If "no", return the judgment result "consistent"; if "yes", go to the 3rd step.
The 3rd step: judge whether the suppliers of the business information being compared are in the same city or region. If "no", return the judgment result "inconsistent"; if "yes", return the judgment result "consistent".
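A minimal Python sketch of this three-step decision is given below. It assumes each piece of business information has already been reduced to a product name, a location-sensitivity flag, and a supplier city; these field names are hypothetical, and the text does not say how they are extracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessInfo:
    product: str              # normalized product or service name
    location_sensitive: bool  # e.g. ice cream, private tutoring
    city: str                 # city or region of the supplier


def judge_business_info(a: BusinessInfo, b: BusinessInfo) -> str:
    if a.product != b.product:                                # the 1st step
        return "inconsistent"
    if not (a.location_sensitive or b.location_sensitive):    # the 2nd step
        return "consistent"
    return "consistent" if a.city == b.city else "inconsistent"  # the 3rd step
```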
" obtain web page user attention rate subsystem "
Figure 12 shows the structure of the subsystem for obtaining web page user attention rate. The search engine works together with its companion web browser (or with third-party browsers compatible with the communication protocol between the search engine and its companion browser): the web browser collects the user's degree of attention to each web page and reports it to the search engine, which uses it as a basis for ranking search results or for selecting the "title search result". This method and device can also stand apart from the search engine and independently form a web query system that provides a "web page popularity ranking list", which may be operated for a fee or exchanged for other benefits.
This system mainly comprises two parts: the "PageFocus webserver" and the "PageFocus web browser".
Structure of the "PageFocus webserver"
The "PageFocus webserver" obtains, through the "PageFocus web browsers", the degree of attention that users worldwide pay to each web page, and builds a database of the "attention score PageFocus" of each web page as a measure of the page's popularity.
The "PageFocus webserver" consists of the following:
(1) "PageFocus browser ID registration server": assigns a globally unique ID number to each "PageFocus web browser" in use on the network.
(2) "PageFocusAccServer web page attention statistics server": receives the "attention score PageFocus" values for one or more web pages contained in the "PageFocus packets" sent by the "PageFocus web browsers" running around the world. The ID number is used to distinguish different browsing users.
(3) "PageFocus browser online upgrade server": provides an online upgrade service to "PageFocus web browsers" worldwide.
(4) "Data encryption and decryption module": used to transmit encrypted data between the "PageFocus webserver" and the "PageFocus web browser", to prevent attacks and information theft.
Structure of the "PageFocus web browser"
The "PageFocus web browser" reports the current user's degree of attention to a web page to the "PageFocus webserver" over the network.
The "PageFocus web browser" consists of the following:
(1) "Attention score PageFocus calculation module": calculates the user's degree of attention to a web page from the user's operations on the "PageFocus web browser", and forms a "PageFocus packet" to report to the "PageFocusAccServer web page attention statistics server".
(2) "PageFocus browser ID registration module": communicates with the "PageFocus browser ID registration server" to obtain a globally unique ID, which serves as the basis for distinguishing different users.
(3) "PageFocus browser online upgrade module": communicates with the "PageFocus browser online upgrade server" to keep the "PageFocus browser" on the current user's computer at the latest version.
This device comprises the "PageFocus web browser" of the invention, the "PageFocus browser ID registration server", and the "web page score server". The specific implementation method is as follows:
The 1st step: develop a dedicated "PageFocus web browser". Each browser either has a globally unique ID number when installed, or actively seeks out the "PageFocus browser ID registration server" on the network when in use to obtain a globally unique ID number.
The 2nd step: the "PageFocus web browser" has all the functions of a conventional web browser (for example, Microsoft's IE browser).
The 3rd step: the "PageFocus web browser" also converts the user's operations on the browser and on web pages into the "attention score PageFocus" of the web page according to the weights listed in the table below, forms a "PageFocus packet", and transmits it in encrypted form over a network protocol to the "PageFocusAccServer web page attention statistics server" of the search engine.
The 4th step: after receiving a "PageFocus packet" sent by any "PageFocus web browser" worldwide, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" values it contains to the corresponding web pages.
The 5th step: the "attention score PageFocus" of each web page worldwide held on the "PageFocusAccServer web page attention statistics server" can be processed in various ways: as a basis for the search engine to rank web pages, as a basis for the search engine to select the "title search result" among search results with identical content, or published directly as a "web page popularity ranking list" service.
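The 4th and 5th steps (accumulating scores per web page and producing a popularity list) can be sketched in Python as below. The packet fields follow the ones named in claim 5 (browser ID, web page URL, PageFocus score); the class names are hypothetical, and encryption and network transport are deliberately left out of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PageScore:
    url: str
    page_focus: float


@dataclass(frozen=True)
class PageFocusPacket:
    browser_id: str                 # globally unique browser ID
    scores: Tuple[PageScore, ...]   # one or more web page score records


class PageFocusAccumulator:
    """Illustrative stand-in for the attention statistics server."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def receive(self, packet: PageFocusPacket) -> None:
        # The 4th step: add each score to the corresponding web page.
        for score in packet.scores:
            self.totals[score.url] += score.page_focus

    def popularity_ranking(self) -> List[Tuple[str, float]]:
        # The 5th step: one possible use, a popularity ranking list.
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)
```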
Method by which the "PageFocus web browser" calculates the "attention score PageFocus":
Because the "PageFocus web browser" has all the functions of an ordinary browser, it can collect the user's operation behavior while the browser is in use, according to the table below, and score the web page's "attention score PageFocus" according to the "weight" of each behavior. When the browser completely closes the web page, it forms a score record of the "attention score PageFocus" for that page and sends it, in the form of a "PageFocus packet", to the "PageFocusAccServer web page attention statistics server".
Figure BSA00000554541100361
Figure BSA00000554541100371
Notes:
1. Although these scoring criteria can produce misjudgments, statistical accuracy can be obtained from the large number of operations on the network.
2. The specific "weight" values listed in the table are only typical values. The invention lies in scoring pages through the browser; any change to the "weight items" or "weights" falls within the scope of the present invention.
3. Letting users vote on web pages rests on full trust in netizens' social ethics, so the "weight" of a vote is applied to the overall score by multiplication rather than by addition.
4. Because each web page may receive a large number of PageFocus scores, software variables may overflow, so the "PageFocusAccServer web page attention statistics server" may record scores using logarithms or scientific notation.
5. As alternatives within this method, besides forming the "PageFocus packet" when the browser completely closes the web page, any other rule may be used to decide when the "PageFocus packet" is formed, for example at fixed intervals or once a certain score has accumulated. These methods all fall within the scope of the present invention.
6. Detailed calculation of the "reading speed per line of text" in the table:
A. Mouse wheel scrolling: text reading speed = (display area width / set width) × number of text lines per scroll / time interval between scrolls.
B. Keyboard page turning: text reading speed = (display area width / set width) × number of text lines per page turn / time interval between page turns.
C. Window scroll bar scrolling: text reading speed = (display area width / set width) × number of text lines per scroll / time interval between scrolls.
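The three formulas in note 6 share one shape, so a single Python function can illustrate them. The units (pixels for widths, seconds for the interval) and the parameter names are assumptions; the text does not fix them.

```python
def reading_speed(display_width: float, set_width: float,
                  lines_moved: float, interval_seconds: float) -> float:
    """Text reading speed in (width-normalized) lines per second.

    display_width / set_width scales the line count to a reference width,
    so a wider display area counts for proportionally more text per line.
    """
    if set_width <= 0 or interval_seconds <= 0:
        raise ValueError("set_width and interval_seconds must be positive")
    return (display_width / set_width) * lines_moved / interval_seconds
```

For instance, scrolling 3 lines every 1.5 seconds in a display area as wide as the set width gives a speed of 2 lines per second; the same scrolling in an area twice as wide would give 4.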
Method of forming the "PageFocus packet"
Contents of the "PageFocus packet":
Figure BSA00000554541100381
Note: each "PageFocus packet" can contain score records for several web pages. Other attributes can be added to each web page record; for efficiency the table lists only the essential contents, and adding other attributes to the table also falls within the scope of the present invention.
Choosing when to send the "PageFocus packet":
To reduce the bandwidth used by sending "PageFocus packets" and the load placed on the server, one of the following schemes may be adopted:
Send the "PageFocus packet" when a web page is completely closed in the browser.
Send the "PageFocus packet" when the browser itself is completely closed.
The browser keeps "PageFocus packets" on the local computer as a file and sends them once a specific quantity, a specific length, or a specific time period has been reached.
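The third scheme (accumulate locally, then send in batches) might look like the following Python sketch. It buffers records in memory rather than in a file, and the limits of 50 records and 600 seconds, the class name, and the `send` callback are all hypothetical choices for illustration.

```python
import time
from typing import Callable, List, Tuple

ScoreRecord = Tuple[str, float]  # (web page URL, PageFocus score)


class PacketBuffer:
    """Collects score records and sends them in batches."""

    def __init__(self, send: Callable[[List[ScoreRecord]], None],
                 max_records: int = 50, max_age_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock
        self._records: List[ScoreRecord] = []
        self._oldest = 0.0

    def add(self, url: str, page_focus: float) -> None:
        if not self._records:
            self._oldest = self._clock()  # age is measured from the first record
        self._records.append((url, page_focus))
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far, e.g. when the browser closes."""
        if self._records:
            self._send(list(self._records))
            self._records.clear()
```

Calling `flush()` from the browser's shutdown path would cover the second scheme as well, since nothing is left unsent when the browser closes.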
" title search result " selection algorithm
This algorithm is mainly used to decide which of the "same-source search results" among the original search results is selected as the "title search result". The algorithm needs to solve the following problems:
1. Judge the content quality of a web page from network user behavior and web page content; higher-quality pages are shown first.
2. Prevent a search result from bearing excessive click traffic because it has become the "title search result", which could slow the website's processing or even crash it.
3. Prevent a search result from bearing excessive click traffic because it has become the "title search result", which could slow the service response and lower visitors' impression of the experience.
4. Make becoming the "title search result" a right that can be offered to websites that need it; these websites can purchase this right.
5. Every original result among the "same-source search results" has a chance, with a certain probability, of becoming the "title search result".
The "title search result" selection method is as follows: when selecting the "title search result" from the "same-source search results", three factors are considered together: "search result content quality", "weighted value", and "service response delay". That is, results with high content quality are shown first, results with a weighting are shown first, and results with fast network service are shown first. The same principle applies when ordering all "same-source search results", and the "weighted value" can be purchased from the operator of the system of the present invention. The specific implementation of "title search result" selection is as follows:
The 1st step: calculate the probability weight PWn with which each "same-source search result" (the n-th result) becomes the "title search result":
PWn=TP*PageFocus/(RespDelay-K)
Note 1: when (RespDelay-K) is less than or equal to zero, (RespDelay-K) takes the value 1.
Note 2: the variables in the formula have the following meanings:
A. PageFocus (web page attention value): the "PageFocus value" obtained for this search result according to the "method and apparatus for obtaining web page user attention rate" of the present invention.
B. RespDelay (web service response delay): the response delay of this search result when it serves an access by the searcher. (The access experience depends on the website's response delay: the slower the response, the worse the experience.)
C. K (service response constant): a definable constant; 50 milliseconds (ms) is suggested. Service response delays below K are not noticed and do not affect the experience, so they can be ignored.
D. TP (title search result right): a weighting. Anyone can obtain the "TP title search result right" from the operator of the system of the present invention through various exchange conditions.
E. As other implementations of this formula, the following forms may also be used:
a.PWn=(TP+PageFocus)/(RespDelay-K)
b.PWn=(TP+PageFocus)/RespDelay/K
c.PWn=TP*PageFocus/RespDelay/K
The 2nd step: sum the probability weights PWn of all original "same-source search results" to obtain the overall probability weight PWall.
The 3rd step: calculate the probability that each "same-source search result" becomes the "title search result": Pn=PWn/PWall.
The 4th step: each time a searcher makes a visit, dynamically select the "title search result" at random according to the probabilities Pn and present it to the searcher.
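The four steps can be sketched in Python using the main formula PWn=TP*PageFocus/(RespDelay-K) together with Note 1. The suggested K of 50 ms comes from the text; the data class, the function names, and the uniform fallback when every weight is zero are assumptions added for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

K_MS = 50.0  # service response constant suggested in the text


@dataclass(frozen=True)
class SourceResult:
    url: str
    tp: float             # title search result right (weighting)
    page_focus: float     # accumulated PageFocus value
    resp_delay_ms: float  # service response delay in milliseconds


def probability_weight(r: SourceResult, k: float = K_MS) -> float:
    """The 1st step: PWn = TP * PageFocus / (RespDelay - K)."""
    denom = r.resp_delay_ms - k
    if denom <= 0:  # Note 1: a non-positive difference takes the value 1
        denom = 1.0
    return r.tp * r.page_focus / denom


def selection_probabilities(results: Sequence[SourceResult]) -> List[float]:
    """The 2nd and 3rd steps: Pn = PWn / PWall."""
    weights = [probability_weight(r) for r in results]
    pw_all = sum(weights)
    if pw_all <= 0:  # assumed fallback: treat all results equally
        return [1.0 / len(results)] * len(results)
    return [w / pw_all for w in weights]


def pick_title_result(results: Sequence[SourceResult],
                      rng: Optional[random.Random] = None) -> SourceResult:
    """The 4th step: random selection according to Pn on each visit."""
    chooser = rng or random.Random()
    probs = selection_probabilities(results)
    return chooser.choices(list(results), weights=probs, k=1)[0]
```

Because the choice is random on every visit, traffic is spread across the same-source results in proportion to Pn instead of concentrating on a single site.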
Apparatus and method for adapting website content style
The content of the present invention is: use whatever obtainable information helps to judge the user's environment and state, so that users in different working or leisure states see different styles when visiting the same page URL, without any operation, registration, setting, or cookie setup. This includes:
1. Use the user's IP address to judge the country or region the user is in. Combined with the website's own time, the local administrative time at the visitor's location can then be calculated, and from that time it can be judged whether the visitor is in a working state or a leisure state.
2. From the user's IP address, the attribute of that IP address can be looked up: home or workplace. A style and content suited to the user's environment are provided according to where the user is.
3. The user's geographic location can be learned from the IP address, so that when the user queries business information, the suppliers nearest to the user can automatically be listed first.
For example:
At the same moment, different users visiting the same URL on this website see different content:
A. Users in a working state and environment see a serious, concise page that contains no leisure or entertainment information.
B. Users in a leisure state and environment see a lively page that may contain leisure and entertainment information and personal consumer advertising.
The present invention may be applied, in part or in whole, to website systems other than search engines; all such applications fall within the scope of the present invention.
At present, to handle heavy traffic, every large website uses server clusters, and some even set up regional local service subsystems to split user traffic. An important characteristic of current server clusters, however, is that every cluster member provides identical content. As shown in Figure 13, a visiting user passes through the "website server cluster entrance" device and, regardless of any of the user's characteristics, is assigned directly to a cluster member server, all of which hold identical content.
As shown in Figure 14, the device of the present invention partly modifies this structure: after the "website server cluster entrance" receives a visiting user, it judges from the user attribute information sent during the visit, such as the IP address, whether the user is in a working state, and provides an information service of a different style and content accordingly.
Method for automatically judging the user's state and providing an appropriate web page style and content
The 1st step: first divide the server cluster into two classes, "work style" and "personal and leisure style". Whether pages are static or dynamic, whenever the same content is updated on these two classes of servers, the two styles are produced automatically, so that users in different working or leisure states see different styles when visiting the same page URL.
The 2nd step: when the "website server cluster entrance" receives a user's first request for a web page of this website, it first obtains the user's IP address from the access protocol (or from the IP layer protocol).
The 3rd step: look up the IP address in the "IP address attribute database" to find whether it is a "workplace IP address" or an "IP address of a personal or leisure setting". If it is a "workplace IP address", go to the 4th step; if it is an "IP address of a personal or leisure setting", go to the 5th step.
The 4th step: obtain the geographic location of the "workplace IP address" and the administrative time of that geographic area. If the area the IP address belongs to is within working hours (Monday to Friday, 8:00 to 20:00), assign the visit to a "work style server" in the server cluster, which provides page service suited to workplace use; otherwise go to the 5th step.
The 5th step: assign the visit to a "personal and leisure style server" in the server cluster, which provides page service suited to personal and leisure use.
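The routing decision in the 3rd to 5th steps can be sketched in Python as below. The in-memory "IP address attribute database", its example entries (documentation-range addresses), the use of a fixed UTC offset per address as the "administrative time", and the default for unknown addresses are all assumptions for illustration; working hours follow the example in the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

# Hypothetical attribute database: IP -> (attribute, UTC offset in hours).
IP_ATTRIBUTES: Dict[str, Tuple[str, int]] = {
    "203.0.113.10": ("work", 8),
    "198.51.100.7": ("personal", 8),
}


def choose_style(ip: str, now_utc: datetime,
                 ip_db: Dict[str, Tuple[str, int]] = IP_ATTRIBUTES) -> str:
    """Return "work" or "personal_leisure" for a visiting IP address.

    now_utc must be a timezone-aware datetime.
    """
    # Unknown addresses are assumed to be personal or leisure addresses.
    attribute, utc_offset = ip_db.get(ip, ("personal", 0))
    if attribute != "work":                    # the 3rd step
        return "personal_leisure"              # the 5th step
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset)))
    is_weekday = local.weekday() < 5           # Monday to Friday
    in_hours = 8 <= local.hour < 20            # 8:00 to 20:00 local time
    if is_weekday and in_hours:                # the 4th step
        return "work"
    return "personal_leisure"                  # the 5th step
```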

Claims (8)

1. A system for obtaining the web page user attention rate PageFocus, based on an attention-rate-based method for aggregating and displaying same-source information in a search engine, the system comprising a PageFocus webserver and a PageFocus web browser, characterized in that:
(1) the PageFocus webserver comprises a PageFocus browser ID registration server, a PageFocusAccServer web page attention statistics server, a PageFocus browser online upgrade server, and a data encryption and decryption module;
(2) the PageFocus web browser comprises a PageFocus browser ID registration module and an attention score PageFocus calculation module;
The working steps of the system are as follows:
(1) each "PageFocus web browser" either has a globally unique ID number when installed, or actively seeks out the "PageFocus browser ID registration server" on the network when in use to obtain a globally unique ID number;
(2) the "PageFocus web browser" has the functions of a conventional web browser, converts the user's operations on the PageFocus web browser and on web pages, together with web page content features, into the "attention score PageFocus" of the web page according to weights, forms a "PageFocus packet", and transmits it in encrypted form over a network protocol to the "PageFocusAccServer web page attention statistics server" of the search engine;
(3) after receiving a "PageFocus packet" sent by any "PageFocus web browser" worldwide, the "PageFocusAccServer web page attention statistics server" adds the "attention score PageFocus" values contained in the packet to the corresponding web pages;
(4) the "attention score PageFocus" of each web page worldwide held on the "PageFocusAccServer web page attention statistics server" can be processed in various ways: as a basis for the search engine to rank web pages, as a basis for the search engine to select the "title search result" among search results with identical content, or published directly as a "web page popularity ranking list" service.
2. The system according to claim 1, characterized in that the PageFocus packet may be formed when the PageFocus web browser completely closes the web page, may be formed at fixed intervals, or may be formed once a certain score has accumulated, so as to reduce the computing load on the PageFocusAccServer web page attention statistics server.
3. The system according to claim 1, characterized in that the attention score PageFocus is formed according to the weights listed in the following table:
Figure FSA00000554541000021
Figure FSA00000554541000031
4. The system according to claim 3, characterized in that the text reading speed is calculated as follows:
A. Mouse wheel scrolling: text reading speed = (display area width / set width) × number of text lines per scroll / time interval between scrolls;
B. Keyboard page turning: text reading speed = (display area width / set width) × number of text lines per page turn / time interval between page turns;
C. Window scroll bar scrolling: text reading speed = (display area width / set width) × number of text lines per scroll / time interval between scrolls.
5. The system according to claim 3, characterized in that the PageFocus packet comprises the fields PageFocus browser ID, web page URL, and web page PageFocus score.
6. The system according to claim 1, characterized in that, when web pages that have "same-source web pages" take part in the page ranking provided by the search engine, the sum of the user attention PageFocus scores obtained by the individual "same-source web pages" may be used as the ranking basis, that is: A. the "title search result" of the "same-source web pages" may use the sum of the user attention PageFocus obtained by each "same-source web page" as its ranking basis when taking part in the search engine result ranking; B. each web page among the "same-source web pages" may likewise use the sum of the user attention PageFocus obtained by the web pages of the "same-source web pages" it belongs to as its ranking basis when taking part in the search engine result ranking.
7. The system according to claim 1, characterized in that the PageFocus web browser further comprises a PageFocus browser online upgrade module.
8. The system according to any one of claims 1 to 7, characterized in that the system further comprises a web page score server.
CN 201110228853 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree Expired - Fee Related CN102298621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110228853 CN102298621B (en) 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110228853 CN102298621B (en) 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN2006100079057A Division CN101025737B (en) 2006-02-22 2006-02-22 Attention degree based same source information search engine aggregation display method

Publications (2)

Publication Number Publication Date
CN102298621A true CN102298621A (en) 2011-12-28
CN102298621B CN102298621B (en) 2013-11-06

Family

ID=45359035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110228853 Expired - Fee Related CN102298621B (en) 2006-02-22 2006-02-22 System for obtaining page user focus degree PageFocus by method for aggregating and displaying same source information search engine based on focus degree

Country Status (1)

Country Link
CN (1) CN102298621B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246680A (en) * 2012-02-13 2013-08-14 腾讯科技(深圳)有限公司 Method and device for aggregating and displaying webpage contents in browser
CN103631850A (en) * 2012-08-24 2014-03-12 三星电子株式会社 Electronic device and method for automatically storing url by calculating content stay value
CN104750701A (en) * 2013-12-27 2015-07-01 中兴通讯股份有限公司 Search processing method, device and terminal
CN105659235A (en) * 2016-01-08 2016-06-08 马岩 A term searching method for network information and a system thereof
TWI587703B (en) * 2015-09-25 2017-06-11 禾聯碩股份有限公司 Displaying device and method thereof for generating chatting room with selected channels
CN114154027A (en) * 2021-12-06 2022-03-08 深圳市大数据资源管理中心 Non-homologous inconsistent data processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100538547B1 (en) * 1995-12-30 2006-02-28 타임라인 인코퍼레이티드 Data retrieval method and apparatus with multiple source capability
CN1254136A (en) * 1998-11-12 2000-05-24 英业达股份有限公司 Method for inquiring about index multi-media header data and its device

Also Published As

Publication number Publication date
CN102298621B (en) 2013-11-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20131106

Termination date: 20150222

EXPY Termination of patent right or utility model