Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.
The technical solution provided by the present invention evaluates the quality of a web page according to user operation information, so that the evaluation more closely reflects user needs and the accuracy of page quality evaluation is improved.
The present invention provides a method for obtaining the quality of a web page. The method, shown in Figure 1, is performed by a server and comprises the following steps:
S11: extracting user operation information from the user click logs of a search engine;
S12: evaluating, according to the user operation information, the quality of the web page to which that user operation information corresponds.
It should be noted that the user operation information in S11 may specifically comprise user click information or user query information.
The user click information may comprise temporal features and/or click sequence features.
The temporal features may comprise any combination of: the duration of the user's visit to a page; the relative duration of the page (the ratio of that duration to the average duration of all clicks in the query to which it belongs); and the duration as a percentage of the total retrieval time.
The click sequence features may comprise any combination of: the order of the click within the current query; whether the click is the first click of the current query; whether it is the last click of the current query; whether it is the only click of the current query; and whether the pages ranked immediately before and after the current page were clicked.
The user query information may comprise query duration information and/or reading feature information.
The query duration information may comprise the duration of the whole query.
The reading feature information may specifically comprise the number of web pages the user clicked to read in the query and/or the time interval from the start of the query to the first click.
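The features listed above can be illustrated with a minimal Python sketch. The record layout (`Click`, with `url`, `rank`, and `time` fields), the function names, and the rule that a click's duration runs until the next click or the end of the query are all assumptions made for illustration; the specification does not prescribe a log format or a way of measuring duration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Click:
    url: str      # clicked page (hypothetical field name)
    rank: int     # 1-based position in the result list (assumed to be logged)
    time: float   # seconds elapsed since the query was issued


def click_features(clicks: List[Click], query_end: float) -> List[Dict]:
    """Per-click temporal and click sequence features for one query."""
    clicks = sorted(clicks, key=lambda c: c.time)
    # Assumption: a click's duration lasts until the next click or the query end.
    durations = [
        (clicks[i + 1].time if i + 1 < len(clicks) else query_end) - c.time
        for i, c in enumerate(clicks)
    ]
    mean = sum(durations) / len(durations) if durations else 0.0
    clicked_ranks = {c.rank for c in clicks}
    features = []
    for i, (c, d) in enumerate(zip(clicks, durations)):
        features.append({
            "url": c.url,
            "duration": d,
            "relative_duration": d / mean if mean else 0.0,
            "share_of_total": d / query_end if query_end else 0.0,
            "order": i + 1,
            "is_first": i == 0,
            "is_last": i == len(clicks) - 1,
            "is_only": len(clicks) == 1,
            "prev_rank_clicked": (c.rank - 1) in clicked_ranks,
            "next_rank_clicked": (c.rank + 1) in clicked_ranks,
        })
    return features


def query_features(clicks: List[Click], query_end: float) -> Dict[str, Optional[float]]:
    """Query duration and reading features for one query."""
    first = min((c.time for c in clicks), default=None)
    return {
        "query_duration": query_end,
        "pages_read": len({c.url for c in clicks}),
        "time_to_first_click": first,
    }
```

For example, `click_features([Click("a", 1, 5.0), Click("b", 3, 20.0)], query_end=50.0)` yields one feature dictionary per click, ordered by click time.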
Optionally, S11 may specifically be implemented as follows:
aggregating the user click logs of all users of the search engine for the same query term, and, within the user click logs for that query term, aggregating all user operation information for the same web page.
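The two-level aggregation described above (first by query term, then by web page) might be sketched as follows. The record schema with `query`, `url`, and `features` keys is hypothetical; the specification does not define one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def aggregate_logs(records: Iterable[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Group click-log records by query term, then by page URL.

    Each record is assumed to look like
    {"query": str, "url": str, "features": dict} (illustrative schema).
    """
    by_query = defaultdict(lambda: defaultdict(list))
    for record in records:
        by_query[record["query"]][record["url"]].append(record["features"])
    # Convert nested defaultdicts to plain dicts for predictable downstream use.
    return {query: dict(pages) for query, pages in by_query.items()}
```

The result maps each query term to the pages clicked for it, and each page to the list of operation records collected from all users.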
It should be noted that the user operation information specifically comprises user click information and user query information, and that the user query information may, in addition to the information described above, comprise one or more of the following.
The user query information may further comprise any one or any combination of: the proportion of query instances in which the user clicked; the number of query instances without a click; the duration of the query; the average duration of the query; the average time interval between clicks; the average time interval from the query to the first click; the total number of clicks in the query; the number of pages clicked in the query; the average number of clicks per page; the number of clicks on the most-clicked page; and the click entropy of the query.
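Several of the count-based statistics above can be computed from the aggregated logs along the following lines. This is only a sketch: it assumes each query instance is represented as a list of clicked URLs (an empty list meaning no click), and the function and key names are illustrative.

```python
from collections import Counter
from typing import Dict, List


def query_click_stats(instances: List[List[str]]) -> Dict[str, float]:
    """Count-based statistics over all instances of the same query term.

    `instances` holds one list of clicked URLs per query instance
    (hypothetical representation; an empty list means no click).
    """
    n = len(instances)
    with_click = sum(1 for urls in instances if urls)
    per_page = Counter(url for urls in instances for url in urls)
    total_clicks = sum(per_page.values())
    return {
        "clicked_query_ratio": with_click / n if n else 0.0,
        "no_click_queries": n - with_click,
        "total_clicks": total_clicks,
        "pages_clicked": len(per_page),
        "avg_clicks_per_page": total_clicks / len(per_page) if per_page else 0.0,
        "max_page_clicks": max(per_page.values(), default=0),
    }
```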
The click entropy reflects how dispersed the clicks in a query are. A larger click entropy means the clicks are more scattered: either many pages are clicked in the query, or the clicks are spread fairly evenly across the pages. Conversely, a smaller click entropy means the clicks are more concentrated: either few pages are clicked in the query, or a small number of pages receive far more clicks than the others.
The click entropy is calculated with the following formula, where a total of $n$ pages are clicked in the current query and the probability that the $i$-th page is clicked is $p_i$:
$$\mathrm{ClickEntropy} = -\sum_{i=1}^{n} p_i \log p_i$$
During aggregation, the probability $p_i$ that a page is clicked is approximated with the following formula, where $\mathit{click}_i$ denotes the number of times the $i$-th page is clicked:
$$p_i = \frac{\mathit{click}_i}{\sum_{j=1}^{n} \mathit{click}_j}$$
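As a worked illustration, the click entropy can be computed from per-page click counts as sketched below. The sketch assumes the click entropy is the Shannon entropy of the click distribution, with $p_i$ estimated as each page's share of all clicks; the base-2 logarithm and the function name are assumptions, since the text does not state a base.

```python
import math
from typing import Sequence


def click_entropy(click_counts: Sequence[int]) -> float:
    """Shannon entropy (base 2, an assumption) of per-page click counts."""
    total = sum(click_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in click_counts:
        if count > 0:              # pages with no clicks contribute nothing
            p = count / total      # p_i approximated as click_i / sum of clicks
            entropy -= p * math.log2(p)
    return entropy
```

With four pages clicked equally often, `click_entropy([1, 1, 1, 1])` gives 2.0, while a query whose clicks all land on one page gives 0.0, matching the scattered versus concentrated behaviour described above.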
In addition, S12 may specifically be implemented as follows:
using a trained machine learning model to evaluate, according to the user operation information, the quality of the web page to which that user operation information corresponds.
It should be noted that the machine learning model may specifically be a support vector machine (SVM), and may of course be another kind of learner. The trained machine learning model may be one trained in advance. Specifically, it may be trained as follows: a certain number of pages are selected arbitrarily from the web pages and evaluated manually to obtain manually evaluated page quality; all user operation information for the manually evaluated pages is aggregated to obtain an aggregation result; and the manually evaluated quality, the pages, and the aggregation result are used as training samples to train the machine learning model.
The machine learning model may be trained as follows: 2/3 of the sample data (another ratio may also be used) is extracted for training, and the remaining 1/3 is used for evaluation. In practice, the model's predictions may also be checked as follows: after training is complete, the trained model is used to predict the quality of the manually evaluated pages, and the predictions are compared with the manually evaluated quality to assess the model's prediction performance.
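The split-train-evaluate procedure described above can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier stands in for the support vector machine named in the text; it is not the learner the specification describes, and the feature vectors, labels, and function names are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (aggregated feature vector, manual quality label)


def split_samples(samples: Sequence[Sample], train_fraction: float = 2 / 3,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the samples and split them into training and evaluation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


class NearestCentroid:
    """Stand-in learner (NOT an SVM): predicts the label of the closest class mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "NearestCentroid":
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            sums[label] = [a + b for a, b in zip(acc, x)]
            counts[label] += 1
        self.centroids = {l: [v / counts[l] for v in s] for l, s in sums.items()}
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda l: sq_dist(x, self.centroids[l]))
                for x in X]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the manually evaluated quality."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

In use, the model is fitted on the training portion and `accuracy` compares its predictions on the held-out portion with the manual labels. A real deployment would substitute an SVM implementation for `NearestCentroid`.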
It should be noted that S12 may also be implemented in other ways, for example by obtaining the web page quality directly from the level of the user operation information. Other methods may of course be used; the present invention does not limit the specific implementation of this step.
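One way to read "directly from the level of the user operation information" is a simple threshold rule, sketched below. The normalised score in [0, 1], the two thresholds, and the label names are all hypothetical; the specification gives no concrete values or scoring rule.

```python
def quality_from_level(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map an aggregated user-operation score to a quality label.

    `score` is assumed to be normalised to [0, 1]; the thresholds `low`
    and `high` are illustrative values, not taken from the specification.
    """
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"
```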
The technical effect of the technical solution provided by the present invention is explained below through its working principle.
Web page quality is generally divided into three classes. First class: high-quality pages, which are highly relevant to the query keyword; users are very likely to operate on such pages, and the corresponding user operation information is correspondingly high. Second class: medium-quality pages, which are moderately relevant to the query keyword; users are moderately likely to operate on such pages, and the corresponding user operation information is also moderate. Third class: low-quality pages, which have low relevance to the query keyword; users are unlikely to operate on such pages, and the corresponding user operation information is also low. Such pages are generally pages irrelevant to the query, cheating (spam) pages, and the like.
Web page quality is therefore directly related to user operation information, and on this basis the technical solution provided by the present invention proposes a new method for evaluating page quality that improves the accuracy of page quality evaluation. Because the method does not need to consider a page's links when evaluating its quality, techniques that manipulate links to raise a page's apparent quality are ineffective against it and do not affect the accuracy of the evaluation. In addition, the method uses a machine learning model to evaluate web page quality, which improves working efficiency.
The present invention also provides a system for obtaining the quality of a web page. The system, shown in Figure 2, comprises:
an extraction unit 21, configured to extract user operation information from the user click logs of a search engine; and
an operation evaluation unit 22, configured to evaluate, according to the user operation information, the quality of the web page to which that user operation information corresponds.
For the specific forms of the user operation information, refer to the description in the method embodiment; they are not repeated here.
Optionally, the extraction unit 21 may specifically comprise:
a log aggregation module 211, configured to aggregate the user click logs of all users of the search engine for the same query term; and
an information fusion unit 212, configured to aggregate, within the user click logs for the same query term, all user operation information for the same web page.
Optionally, the operation evaluation unit 22 may further be configured to use a machine learning model trained in advance to evaluate, according to the user operation information, the quality of the web page to which that user operation information corresponds.
The system provided by the present invention evaluates the quality of web pages based on user operation information, so its evaluation more closely reflects user needs; it therefore has the advantage of improving the accuracy of web page quality evaluation.
The present invention also provides a server comprising the above system for obtaining the quality of a web page.
It should be noted that the units included in the system of the above embodiment are divided merely according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
In addition, a person of ordinary skill in the art will understand that all or part of the steps of the method in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory (ROM), a magnetic disk, or an optical disc.
In summary, the technical solution provided by the present invention has the advantage of improving the accuracy of web page quality evaluation.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.