CN102486774A - Method and system for acquiring quality of webpage and server - Google Patents

Publication number
CN102486774A
Authority
CN
China
Prior art keywords
user
inquiry
click
page
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201010568551XA
Other languages
Chinese (zh)
Inventor
冯超
贺海军
张锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201010568551XA
Publication of CN102486774A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention is suitable for the field of the Internet and provides a method and a system for acquiring the quality of a webpage and a server. The method comprises the following steps: extracting user operating information from a user click log in a search engine; evaluating the quality of the webpage which corresponds to the user operating information according to the user operating information. The technical scheme provided by the invention has the advantage of improving the accuracy of evaluating the quality of the webpage.

Description

Method, system and server for acquiring the quality of a webpage
Technical field
The invention belongs to the Internet field, and in particular relates to a method, a system and a server for acquiring the quality of a webpage.
Background art
With the development of Internet technology, more and more users obtain information from webpages, for example by entering a search keyword on a search website and querying the webpages that correspond to that keyword. The webpages returned by a search website are not listed in random order; they are usually ranked by webpage quality. One existing method for evaluating webpage quality is PageRank, which analyzes page quality from the hyperlink relationships between pages. Its idea is that if a high-quality page contains a link to another page, the page it points to should also be of good quality. In practice, the method assigns each page an initial score and lets these scores propagate outward along each page's outgoing links; every round of propagation is one round of scoring. The sum of the scores a page receives is its score for that round. This scoring process is repeated until the score of every page stabilizes, and the score at that point is the page's final score. The higher a page's final score, the higher its quality is considered to be.
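The iterative scoring described above can be sketched in Python as follows. This is only an illustration and not part of the application: the damping factor and the even redistribution of score from pages without outgoing links are assumptions taken from the commonly published form of PageRank, and the function name `pagerank` and the `out_links` mapping are hypothetical.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_rounds=1000):
    """Propagate scores along outgoing links until they stabilize.

    out_links maps each page to the list of pages it links to.
    damping and the dangling-page handling are assumptions, not from the text.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    n = len(pages)
    # Every page starts with the same initial score.
    scores = {p: 1.0 / n for p in pages}
    for _ in range(max_rounds):
        # Score held by pages with no outgoing links is shared evenly.
        dangling = sum(scores[p] for p in pages if not out_links.get(p))
        received = {p: dangling / n for p in pages}
        # One round of scoring: each page splits its score over its out-links.
        for page, targets in out_links.items():
            for target in targets:
                received[target] += scores[page] / len(targets)
        new_scores = {p: (1 - damping) / n + damping * received[p] for p in pages}
        change = max(abs(new_scores[p] - scores[p]) for p in pages)
        scores = new_scores
        if change < tol:  # scores have stabilized
            break
    return scores
```

For a small graph such as `{"A": ["C"], "B": ["C"], "C": ["A"]}`, page C, which receives links from both A and B, ends up with the highest score, which is exactly the property that link exchanges and link farms exploit.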
In the prior art, the quality of a page is determined by the quality of the pages that link to it, so any technique that raises the quality of the linking pages also raises the evaluated quality of the page; link exchanges and link farms, for example, do this effectively. However, the quality of the linking pages is not the information the user needs, so evaluating a webpage by the quality of its linking pages gives an inaccurate result.
Summary of the invention
An embodiment of the invention provides a method for acquiring the quality of a webpage, intended to solve the problem of inaccurate page quality evaluation in the prior art.
The embodiment of the invention is realized as follows. The invention provides a method for acquiring the quality of a webpage, the method comprising:
extracting user operation information from the user click logs of a search engine;
evaluating, according to the user operation information, the quality of the webpage corresponding to that user operation information.
An embodiment of the invention also provides a system for acquiring the quality of a webpage, the system comprising:
an extraction unit, configured to extract user operation information from the user click logs of a search engine;
an operation evaluation unit, configured to evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
The invention also provides a server that comprises the above system for acquiring the quality of a webpage.
Compared with the prior art, the embodiments of the invention have the following beneficial effect: the technical scheme evaluates webpage quality through user operation information. Because user operation information reflects user needs more closely, the webpage quality evaluated with this scheme is more accurate.
Description of drawings
Fig. 1 is a flowchart of the method for acquiring the quality of a webpage provided by the invention;
Fig. 2 is a structural diagram of the system for acquiring the quality of a webpage provided by the invention.
Embodiments
To make the purpose, technical scheme and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
The technical scheme provided by the invention evaluates webpage quality according to user operation information, so that the evaluation stays close to user needs and page quality is assessed more accurately.
The invention provides a method for acquiring the quality of a webpage. The method is shown in Fig. 1 and is performed by a server. It comprises the following steps:
S11: extract user operation information from the user click logs of a search engine;
S12: evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
It should be noted that the user operation information in S11 may comprise user click information or user query information.
The user click information may comprise temporal features and/or click sequence features.
The temporal features may comprise any combination of: the duration of a user's visit to a page; the relative duration of the page (the ratio of this duration to the average duration of all clicks in the query the click belongs to); and the percentage of the total retrieval time taken up by the duration.
The click sequence features may comprise any combination of: the order of the click within the current query; whether the click is the first click of the current query; whether it is the last click of the current query; whether it is the only click of the current query; and whether the pages ranked before and after the current page were clicked.
The user query information may comprise query duration information and/or reading characteristic information.
The query duration information may comprise the duration of the whole query.
The reading characteristic information may comprise the number of webpages the user clicked and read in the query and/or the time interval from the start of the query to the first click.
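As a rough illustration of the click and query features listed above, the following Python sketch derives them from a single query session. The log layout is an assumption: the `Click` record, its field names and the function `click_features` are hypothetical, and "the pages ranked before and after the current page" is interpreted here as the immediately adjacent ranks.

```python
from dataclasses import dataclass

@dataclass
class Click:
    url: str
    rank: int     # position of the page in the result list (assumed field)
    start: float  # time the page was opened, in seconds
    end: float    # time the user left the page, in seconds

def click_features(query_start, query_end, clicks):
    """Return (per-click feature rows, query-level features) for one query."""
    clicks = sorted(clicks, key=lambda c: c.start)
    total = query_end - query_start
    dwell = [c.end - c.start for c in clicks]
    mean_dwell = sum(dwell) / len(dwell) if dwell else 0.0
    clicked_ranks = {c.rank for c in clicks}
    rows = []
    for i, c in enumerate(clicks):
        rows.append({
            "url": c.url,
            # Temporal features
            "dwell": dwell[i],
            "relative_dwell": dwell[i] / mean_dwell if mean_dwell else 0.0,
            "dwell_share": dwell[i] / total if total else 0.0,
            # Click sequence features
            "order": i + 1,
            "is_first": i == 0,
            "is_last": i == len(clicks) - 1,
            "is_only": len(clicks) == 1,
            "prev_rank_clicked": (c.rank - 1) in clicked_ranks,
            "next_rank_clicked": (c.rank + 1) in clicked_ranks,
        })
    query = {
        "query_duration": total,
        "pages_clicked": len({c.url for c in clicks}),
        "time_to_first_click": clicks[0].start - query_start if clicks else None,
    }
    return rows, query
```

Each row describes one click, so the same page can contribute several rows across different queries and users before any aggregation takes place.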
Optionally, S11 may be implemented as follows:
Aggregate the click logs of all users of the search engine for the same query term, and, within the click logs for that query term, aggregate all user operation information for the same webpage.
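The two-level aggregation can be sketched as below, grouping first by query term and then by webpage. The record layout (dictionaries with `user`, `query`, `url` and `dwell` keys) and the summary statistics chosen are assumptions made for illustration; the application does not specify a log format.

```python
from collections import defaultdict

def aggregate_logs(records):
    """Group click records by query term, then by webpage, and summarize."""
    by_query = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_query[rec["query"]][rec["url"]].append(rec)
    result = {}
    for query, pages in by_query.items():
        result[query] = {
            url: {
                "clicks": len(recs),
                "users": len({r["user"] for r in recs}),
                "mean_dwell": sum(r["dwell"] for r in recs) / len(recs),
            }
            for url, recs in pages.items()
        }
    return result
```

Keeping the query term as the outer key means the same URL is summarized separately for each query, which matches the idea that a page's quality is judged relative to the query that led to it.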
It should be noted that the user operation information here comprises user click information and user query information, and that the user query information may include, in addition to the information described above, one or more of the following.
The user query information may further comprise any one or any combination of: the number of times the query was issued; the proportion of queries with no click; the total number of user clicks in the query; the average duration of the query; the average duration of each click; the average time interval from the query to the first click; the number of pages clicked in the query; the average number of clicks per page; the number of clicks on the most-clicked page in the query; and the click entropy of the query.
Click entropy reflects how dispersed the clicks in a query are. The larger the click entropy, the more scattered the clicks: either many pages are clicked in the query, or the clicks are spread fairly evenly across the pages. Conversely, the smaller the click entropy, the more concentrated the clicks: either few pages are clicked in the query, or a small number of pages receive far more clicks than the others.
Click entropy is calculated with the following formula. Suppose that in the current query a total of n pages are clicked, and that the probability of the i-th page being clicked is p_i:
$$H = -\sum_{i=1}^{n} p_i \log p_i$$
During aggregation, the probability p_i that a page is clicked is approximated with the following formula, where click_i denotes the number of times the i-th page was clicked:
$$p_i = \frac{\mathrm{click}_i}{\sum_{j=1}^{n} \mathrm{click}_j}$$
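A minimal Python sketch of this calculation follows, treating click entropy as the Shannon entropy of the click distribution and estimating each p_i as the page's share of all clicks in the query. The base-2 logarithm is an assumption, since the text does not state a base, and the function name `click_entropy` is hypothetical.

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Entropy of the clicks in one query; clicked_urls has one entry per click."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total           # p_i approximated as click_i / sum of clicks
        entropy -= p * math.log2(p)  # base 2 is an assumption
    return entropy
```

When every click lands on the same page the entropy is zero, and when clicks are spread evenly over n pages it reaches its maximum of log2(n), in line with the concentrated and scattered cases described in the text.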
In addition, S12 may be implemented as follows:
A trained machine learner evaluates, according to the user operation information, the quality of the webpage corresponding to that user operation information.
It should be noted that the machine learner may be a support vector machine (SVM), or of course another kind of learner. The trained machine learner may be one that has been trained in advance. It may be trained as follows: select a certain number of pages at random from the webpages and have them evaluated manually to obtain manually evaluated page quality; aggregate all user operation information for the manually evaluated pages to obtain an aggregation result; then train the machine learner using the manually evaluated quality, the pages and the aggregation result as training samples.
The training may proceed as follows: 2/3 of the sample data (other ratios are also possible) is used for training, and the remaining 1/3 is used for evaluation. In practice, the learner's predictions can also be checked: after training is complete, the trained learner predicts the quality of the manually evaluated pages, and the predictions are compared with the manually evaluated quality to assess how well the learner predicts.
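The split-and-evaluate procedure can be sketched with the Python standard library as below. To keep the sketch self-contained, a simple nearest-centroid classifier stands in for the learner; it is not the support vector machine mentioned in the text, and all names here (`split_samples`, `CentroidLearner`, `accuracy`) are hypothetical. Each sample is assumed to be a pair of an aggregated feature vector and a manually assigned quality label.

```python
import random

def split_samples(samples, train_fraction=2 / 3, seed=0):
    """Shuffle the labeled samples and split them into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

class CentroidLearner:
    """Stand-in learner (nearest class centroid); NOT the SVM named in the text."""

    def fit(self, samples):
        sums, counts = {}, {}
        for features, label in samples:
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, value in enumerate(features):
                acc[k] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, features):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))
        return min(self.centroids, key=lambda lab: squared_distance(self.centroids[lab]))

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the manual label."""
    if not samples:
        return 0.0
    hits = sum(model.predict(features) == label for features, label in samples)
    return hits / len(samples)
```

A typical use would be `train, test = split_samples(samples)` followed by `accuracy(CentroidLearner().fit(train), test)`; swapping in a real SVM implementation would only change the learner class, not the split or the comparison against manual labels.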
It should be noted that S12 may also be implemented in other ways, for example by deriving webpage quality directly from the level of the user operation information. Other methods are of course possible, and the invention does not restrict the specific implementation.
The technical effect of the technical scheme is explained below through the working principle of the invention.
Webpage quality generally falls into three classes. First, high-quality pages: these are highly relevant to the query keyword, users are very likely to operate on them, and the corresponding user operation information is high. Second, medium-quality pages: these are moderately relevant to the query keyword, users are moderately likely to operate on them, and the corresponding user operation information is moderate. Third, low-quality pages: these have low relevance to the query keyword, users are unlikely to operate on them, and the corresponding user operation information is low; such pages are generally irrelevant to the query or are cheating pages. Webpage quality is therefore directly related to user operation information. Based on this, the technical scheme proposes a new method of evaluating page quality that improves the accuracy of the evaluation. Because the method does not consider a webpage's links when evaluating its quality, techniques that inflate page quality by raising the quality of linking pages do not affect the accuracy of the evaluation. In addition, the method uses a machine learner to evaluate webpage quality, which improves efficiency.
The invention also provides a system for acquiring the quality of a webpage. The system is shown in Fig. 2 and comprises:
an extraction unit 21, configured to extract user operation information from the user click logs of a search engine;
an operation evaluation unit 22, configured to evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
For the specific forms of the user operation information, see the description in the method embodiment; they are not repeated here.
Optionally, the extraction unit 21 may comprise:
a log aggregation module 211, configured to aggregate the click logs of all users of the search engine for the same query term;
an information fusion unit 212, configured to aggregate, within the click logs for the same query term, all user operation information for the same webpage.
Optionally, the operation evaluation unit 22 may further be configured to use a machine learner trained in advance to evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
The system provided by the invention evaluates webpage quality based on user operation information, so its evaluation is closer to user needs and its assessment of webpage quality is more accurate.
The invention also provides a server that comprises the above system for acquiring the quality of a webpage.
It should be noted that the units of the system in the above embodiment are divided only according to functional logic; the division is not limiting, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the protection scope of the invention.
In addition, a person of ordinary skill in the art will understand that all or part of the steps of the method in the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as a ROM (read-only memory), a magnetic disk or an optical disc.
In summary, the technical scheme provided by the invention has the advantage of improving the accuracy of webpage quality evaluation.
The above are only preferred embodiments of the invention and do not limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims (11)

1. A method for acquiring the quality of a webpage, characterized in that the method comprises:
extracting user operation information from the user click logs of a search engine;
evaluating, according to the user operation information, the quality of the webpage corresponding to that user operation information.
2. The method according to claim 1, characterized in that the user operation information comprises user click information;
wherein the user click information comprises at least one of a temporal feature or a click sequence feature;
the temporal feature comprises any combination of: the duration of the page visit, the relative duration of the page, and the percentage of the total retrieval time taken up by the duration;
the click sequence feature comprises any combination of: the order of the click within the current query; whether the click is the first click of the current query; whether it is the last click of the current query; whether it is the only click of the current query; and whether the pages ranked before and after the current page were clicked.
3. The method according to claim 1, characterized in that the user operation information comprises user query information;
the user query information comprises at least one of query duration information or reading characteristic information;
the query duration information comprises the duration of the whole query;
the reading characteristic information comprises at least one of: the number of webpages the user clicked and read in the query, or the time interval from the start of the query to the first click;
the user query information further comprises any combination of: the number of times the query was issued; the proportion of queries with no click; the total number of user clicks in the query; the average duration of the query; the average duration of each click; the average time interval from the query to the first click; the number of pages clicked in the query; the average number of clicks per page; the number of clicks on the most-clicked page in the query; and the click entropy of the query.
4. The method according to claim 1, characterized in that the step of extracting user operation information from the user click logs of a search engine comprises:
aggregating the click logs of all users of the search engine for the same query term, and, within the click logs for that query term, aggregating all user operation information for the same webpage.
5. The method according to claim 1, characterized in that the step of evaluating, according to the user operation information, the quality of the webpage corresponding to that user operation information comprises:
using a machine learner trained in advance to evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
6. A system for acquiring the quality of a webpage, characterized in that the system comprises:
an extraction unit, configured to extract user operation information from the user click logs of a search engine;
an operation evaluation unit, configured to evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
7. The system according to claim 6, characterized in that the user operation information comprises user click information;
wherein the user click information comprises at least one of a temporal feature or a click sequence feature;
the temporal feature comprises any combination of: the duration of the page visit, the relative duration of the page, and the percentage of the total retrieval time taken up by the duration;
the click sequence feature comprises any combination of: the order of the click within the current query; whether the click is the first click of the current query; whether it is the last click of the current query; whether it is the only click of the current query; and whether the pages ranked before and after the current page were clicked.
8. The system according to claim 6, characterized in that the user operation information comprises user query information;
the user query information comprises at least one of query duration information or reading characteristic information;
the query duration information comprises the duration of the whole query;
the reading characteristic information comprises at least one of: the number of webpages the user clicked and read in the query, or the time interval from the start of the query to the first click;
the user query information further comprises any combination of: the number of times the query was issued; the proportion of queries with no click; the total number of user clicks in the query; the average duration of the query; the average duration of each click; the average time interval from the query to the first click; the number of pages clicked in the query; the average number of clicks per page; the number of clicks on the most-clicked page in the query; and the click entropy of the query.
9. The system according to claim 6, characterized in that the extraction unit comprises:
a log aggregation module, configured to aggregate the click logs of all users of the search engine for the same query term;
an information fusion unit, configured to aggregate, within the click logs for the same query term, all user operation information for the same webpage.
10. The system according to claim 6, characterized in that the operation evaluation unit is configured to use a machine learner trained in advance to evaluate, according to the user operation information, the quality of the webpage corresponding to that user operation information.
11. A server, characterized in that the server comprises the system for acquiring the quality of a webpage according to any one of claims 6 to 10.
CN201010568551XA 2010-12-01 2010-12-01 Method and system for acquiring quality of webpage and server Pending CN102486774A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010568551XA CN102486774A (en) 2010-12-01 2010-12-01 Method and system for acquiring quality of webpage and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010568551XA CN102486774A (en) 2010-12-01 2010-12-01 Method and system for acquiring quality of webpage and server

Publications (1)

Publication Number Publication Date
CN102486774A true CN102486774A (en) 2012-06-06

Family

ID=46152267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010568551XA Pending CN102486774A (en) 2010-12-01 2010-12-01 Method and system for acquiring quality of webpage and server

Country Status (1)

Country Link
CN (1) CN102486774A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131018

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131018

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20120606