Summary of the invention
In view of this, the invention provides a keyword attribute quantization method and device based on user click data, which can obtain more accurate quantized data for keyword attributes.
To achieve the above purpose, the technical solution of the invention is implemented as follows:
A keyword attribute quantization method based on user click data, the method comprising:
Obtaining a set of candidate keywords on which attribute quantization is to be performed;
Determining a quantitative scoring algorithm for the keywords and for their corresponding search result links respectively, taking the scores of the search result links as an influence factor in the quantitative scoring result of the keywords, and taking the scores of the keywords as an influence factor in the quantitative scoring result of the search result links, thereby establishing an iterative quantization model for the keywords;
Performing quantitative scoring of the keywords and their corresponding search result links according to the click data between the keywords and the corresponding search result links and according to the quantization model, and then obtaining the quantized data of the attribute of the keywords through iterative computation.
Preferably, the attribute is commerciality (the degree to which a keyword carries commercial intent).
Preferably, obtaining the set of candidate keywords on which attribute quantization is to be performed comprises:
Crawling keywords from e-commerce site titles and product-search vertical channels as candidate keywords;
Selecting, from the candidate keywords, the N keywords with the highest frequency of occurrence as the candidate keywords for commerciality quantization;
Where N is a positive integer.
Preferably, the iterative quantization model comprises:
$T(x) = \sum_i \mathrm{click}(x \to y_i)$, where T(x) denotes the sum of all user click counts related to x, and x may be a keyword or a corresponding search result link;
When x is a keyword, $x \to y_i$ denotes the number of clicks on a given link that users clicked among the search result links returned when they searched for keyword x; in this case T(x) denotes the total number of clicks on all search result links clicked when users searched for keyword x;
When x is a search result link, $x \to y_i$ denotes the number of clicks on that same search result link reached by users through a given keyword search; in this case T(x) denotes the total number of clicks on that same search result link reached by users through all the different keyword searches;
A formula for the quantitative scoring of the keyword [equation not preserved in the source text];
A formula for the quantitative scoring of the search result link [equation not preserved in the source text];
Where score(q) denotes the quantitative score of a keyword and score(u) denotes the quantitative score of a search result link; $\mathrm{click}(u_i \to q)$ denotes the number of clicks associating a given search result link with the corresponding keyword, and $\mathrm{click}(q_i \to u)$ denotes the number of clicks associating a given keyword with the corresponding search result link; γ denotes the transfer weight coefficient, and $\mathrm{score}_0$ denotes the initial commerciality score.
Preferably, the click data between the keywords and the corresponding links comprises:
The number of clicks on a given link that users clicked among the search result links returned when they searched for a keyword; the total number of clicks on the same search result link reached by users through all the different keyword searches; and the total number of clicks on all search result links clicked when users searched for the keyword.
A keyword attribute quantization device based on user click data, the device comprising:
A selection module, configured to obtain a set of candidate keywords on which attribute quantization is to be performed;
A quantization model building module, configured to determine a quantitative scoring algorithm for the keywords and for their corresponding search result links respectively, taking the scores of the search result links as an influence factor in the quantitative scoring result of the keywords, and taking the scores of the keywords as an influence factor in the quantitative scoring result of the search result links, thereby establishing an iterative quantization model for the keywords;
A quantization computing module, configured to perform quantitative scoring of the keywords and their corresponding search result links according to the click data between the keywords and the corresponding search result links and according to the quantization model, and then to obtain the quantized data of the attribute of the keywords through iterative computation.
Preferably, the selection module comprises:
A crawling unit, configured to crawl keywords from e-commerce site titles and product-search vertical channels as candidate keywords;
A selection unit, configured to select, from the candidate keywords, the N keywords with the highest frequency of occurrence as the quantization keyword seeds for the commerciality attribute, where N is a positive integer.
Preferably, the quantization model established by the quantization model building module comprises:
$T(x) = \sum_i \mathrm{click}(x \to y_i)$, where T(x) denotes the sum of all user click counts related to x, and x may be a keyword or a corresponding search result link;
When x is a keyword, $x \to y_i$ denotes the number of clicks on a given link that users clicked among the search result links returned when they searched for keyword x; in this case T(x) denotes the total number of clicks on all search result links clicked when users searched for keyword x;
When x is a search result link, $x \to y_i$ denotes the number of clicks on that same search result link reached by users through a given keyword search; in this case T(x) denotes the total number of clicks on that same search result link reached by users through all the different keyword searches;
A formula for the quantitative scoring of the keyword [equation not preserved in the source text];
A formula for the quantitative scoring of the search result link [equation not preserved in the source text];
Where score(q) denotes the quantitative score of a keyword and score(u) denotes the quantitative score of a search result link; $\mathrm{click}(u_i \to q)$ denotes the number of clicks associating a given search result link with the corresponding keyword, and $\mathrm{click}(q_i \to u)$ denotes the number of clicks associating a given keyword with the corresponding search result link; γ denotes the transfer weight coefficient, and $\mathrm{score}_0$ denotes the initial commerciality score.
Preferably, the quantization computing module comprises:
A data acquisition unit, configured to obtain the number of clicks on a given link that users clicked among the search result links returned when they searched for a keyword, the total number of clicks on the same search result link reached by users through all the different keyword searches, and the total number of clicks on all search result links clicked when users searched for the keyword;
A computing unit, configured to use the quantization model established by the quantization model building module to perform quantitative scoring of the keywords and their corresponding links, and then to obtain the quantized data of the attribute of the keywords through iterative computation.
As can be seen from the above technical solution, the keyword attribute quantization method and device based on user click data of the invention incorporate an analysis of user search behavior and introduce user click data and the relationship between keywords and links into the quantization model. This significantly improves the discrimination of the quantized keyword attribute. It also improves recognition of query strings with complex phrasing, long query strings, and query strings not covered by the keywords, and it makes the quantized data of keyword attributes comparable, which is of great help to subsequent applications such as ranking.
Embodiment
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments.
The core idea of the invention is as follows. On the Internet, if a web page is pointed to by links from many other web pages, the content it describes is widely recognized and trusted, it carries higher authority, and it should be ranked higher. The invention models user search behavior on this idea. The keyword a user searches for carries the user's need, and that need is directly reflected in the search engine by the user's tendency to click the search result links that match the need behind the keyword.
Take commercial need as an example. If the keyword a user searches for carries a commercial need, then the search result links the user tends to click after searching for that keyword have a certain commerciality. Conversely, if the search result links users tend to click after searching for a keyword all have a certain commerciality, it can be inferred that the keyword itself also has a certain commerciality.
Based on the above idea, and taking commerciality as the example, the following describes in detail how the invention iteratively mines the set of keywords carrying commercial need and the set of links carrying commercial need, establishes a commerciality quantization model for keywords, and finally measures the commerciality of keywords under a unified framework using the commerciality model and user click data. Other keyword attributes can be quantized in a similar way by following the commerciality method, so their quantization is not described separately here.
Fig. 1 is a flowchart of the keyword commerciality quantization method of the invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: obtain the quantization keyword seeds.
The quantization keyword seeds are the set of candidate keywords on which quantization of a given attribute is to be performed. Taking the commerciality attribute as an example, the seeds may be obtained from keyword strings in the titles of e-commerce web pages or in product-search vertical channels, from strings with commercial characteristics on other commercial web pages, or by manual specification.
For quantization keyword seeds obtained in these different ways, such as the keywords from e-commerce page titles or product-search vertical channels mentioned above, the N keywords with the highest frequency of occurrence may further be extracted as the final quantization keyword seeds, in order to reduce the amount of subsequent quantization computation.
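The top-N seed selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `select_seed_keywords`, the sample strings, and the whitespace and case normalization are all assumptions introduced here.

```python
from collections import Counter


def select_seed_keywords(candidates, n):
    """Return the n most frequent candidate keywords (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Assumption: trim whitespace and lowercase so trivially different
    # spellings of the same keyword are counted together.
    counts = Counter(kw.strip().lower() for kw in candidates if kw.strip())
    # most_common sorts by descending count; ties keep first-seen order.
    return [kw for kw, _ in counts.most_common(n)]


# Hypothetical keywords crawled from page titles and vertical channels.
crawled = [
    "buy phone", "cheap flights", "buy phone",
    "laptop deals", "cheap flights", "buy phone",
]
seeds = select_seed_keywords(crawled, 2)
```

If N exceeds the number of distinct candidates, this sketch simply returns all of them in frequency order.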
Step 102: establish the keyword quantization model.
A quantitative scoring algorithm is determined for the keywords and for their corresponding search result links respectively. The scores of the search result links are taken as an influence factor in the quantitative scoring result of the keywords, and the scores of the keywords are taken as an influence factor in the quantitative scoring result of the search result links, thereby establishing an iterative quantization model for the keywords.
Taking commerciality as an example, a commerciality scoring algorithm can be used to build the commerciality keyword quantization model.
In the invention, following the above description of the core idea, the commerciality scoring algorithm can be expressed as follows:
$T(x) = \sum_i \mathrm{click}(x \to y_i)$ (Formula 1)
T(x) denotes the sum of all user click counts related to x, where x may be a keyword or a search result link corresponding to a keyword;
When x is a keyword, $x \to y_i$ denotes the number of clicks on a given link that users clicked among the search result links returned when they searched for keyword x; in this case T(x) denotes the total number of clicks on all search result links clicked when users searched for keyword x;
When x is a search result link, $x \to y_i$ denotes the number of clicks on that same search result link reached by users through a given keyword search; in this case T(x) denotes the total number of clicks on that same search result link reached by users through all the different keyword searches.
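The two readings of T(x) described above (a total per keyword and a total per search result link) can be computed from a click log in one pass. The sketch below assumes the log is a list of `(keyword, link, clicks)` tuples; that record layout, the function name `click_totals`, and the example URLs are hypothetical.

```python
from collections import defaultdict


def click_totals(click_log):
    """Aggregate T(q) per keyword and T(u) per link from (keyword, link, clicks) records."""
    t_keyword = defaultdict(int)  # T(q): clicks on all links reached from keyword q
    t_link = defaultdict(int)     # T(u): clicks on link u reached from all keywords
    for keyword, link, clicks in click_log:
        t_keyword[keyword] += clicks
        t_link[link] += clicks
    return dict(t_keyword), dict(t_link)


# Hypothetical click records: (searched keyword, clicked link, number of clicks).
click_log = [
    ("buy phone", "shop.example/phones", 8),
    ("buy phone", "wiki.example/phone", 2),
    ("phone history", "wiki.example/phone", 5),
]
t_q, t_u = click_totals(click_log)
```

Here T for the keyword "buy phone" sums the clicks on both links it led to, while T for the link "wiki.example/phone" sums the clicks arriving from both keywords.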
From Formula 1, the following quantization formulas can be obtained:
score(q) = [equation not preserved in the source text] (Formula 2)
score(u) = [equation not preserved in the source text] (Formula 3)
Formula 2 scores the keyword, and Formula 3 scores the search result link;
q denotes the query, that is, the keyword, and u denotes the URL, that is, the search result link;
score(q) denotes the quantitative score of a keyword, and score(u) denotes the quantitative score of a search result link; $\mathrm{click}(u_i \to q)$ denotes the number of clicks associating a given search result link with the corresponding keyword, and $\mathrm{click}(q_i \to u)$ denotes the number of clicks associating a given keyword with the corresponding search result link; γ denotes the transfer weight coefficient; $\mathrm{score}_0$ denotes the initial commerciality score, which is generally set to the same value for all items, for example 1 point. This score is relative: what matters is the relative magnitude, not the absolute value.
i distinguishes the different search result links or keywords; T(u) and T(q) are as defined in Formula 1.
Formulas 2 and 3 each consist of two parts: the first part is the initial commerciality score, and the second part is the score obtained by iterating over user click behavior data. The two parts are combined and balanced through the transfer weight coefficient γ to give the final commerciality score.
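The equations for Formulas 2 and 3 did not survive in this text. One form consistent with the description above (an initial-score term plus a click-propagated term, combined through γ) is the following PageRank-style pair. It is offered only as an assumed reading: the (1 − γ) weighting and the normalization of each click count by T are not confirmed by the source.

```latex
% Assumed form of Formula 2: keyword score from the scores of its clicked links
\mathrm{score}(q) = (1-\gamma)\,\mathrm{score}_0(q)
  + \gamma \sum_i \frac{\mathrm{click}(u_i \to q)}{T(u_i)}\,\mathrm{score}(u_i)

% Assumed form of Formula 3: link score from the scores of the keywords leading to it
\mathrm{score}(u) = (1-\gamma)\,\mathrm{score}_0(u)
  + \gamma \sum_i \frac{\mathrm{click}(q_i \to u)}{T(q_i)}\,\mathrm{score}(q_i)
```

Under this reading, a γ close to 0 keeps every score near its initial value, while a γ close to 1 lets the click data dominate.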
Iterative mining computation based on Formulas 2 and 3 yields the quantization model. The principle of the iterative computation is shown in Fig. 2: scores are passed back and forth between keywords and their corresponding search result links, and the scores change as the number of iterations increases until the iteration stabilizes, at which point each score reflects the relative magnitude of commerciality.
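The score passing between keywords and links can be sketched in Python as follows. Since the exact equations are not preserved in this text, the sketch assumes an update of the form score = (1 − γ)·score_0 + γ·Σ(normalized clicks × neighbor score), with each click count normalized by the source's total T. The function name `iterate_scores`, γ = 0.85, the stopping tolerance, and giving seed keywords an initial score of 1 and everything else 0 are all assumptions introduced here.

```python
from collections import defaultdict


def iterate_scores(click_log, seeds, gamma=0.85, initial=1.0,
                   max_iter=100, tol=1e-9):
    """Propagate scores between keywords and links until they stabilize.

    click_log: iterable of (keyword, link, clicks) tuples (assumed layout).
    seeds: set of seed keywords that receive the initial score.
    """
    click_log = list(click_log)
    t_q = defaultdict(float)  # T(q): total clicks issued from each keyword
    t_u = defaultdict(float)  # T(u): total clicks received by each link
    for q, u, c in click_log:
        t_q[q] += c
        t_u[u] += c

    # Assumption: seeds start at `initial`; other keywords and all links at 0.
    s0_q = {q: (initial if q in seeds else 0.0) for q in t_q}
    s0_u = {u: 0.0 for u in t_u}
    score_q, score_u = dict(s0_q), dict(s0_u)

    for _ in range(max_iter):
        # Link update (assumed Formula 3): keywords pass score to clicked links.
        new_u = {u: (1 - gamma) * s0_u[u] for u in t_u}
        for q, u, c in click_log:
            new_u[u] += gamma * (c / t_q[q]) * score_q[q]
        # Keyword update (assumed Formula 2): links pass score back to keywords.
        new_q = {q: (1 - gamma) * s0_q[q] for q in t_q}
        for q, u, c in click_log:
            new_q[q] += gamma * (c / t_u[u]) * new_u[u]
        # Stop once keyword scores no longer change noticeably.
        delta = max((abs(new_q[q] - score_q[q]) for q in t_q), default=0.0)
        score_q, score_u = new_q, new_u
        if delta < tol:
            break
    return score_q, score_u


# Hypothetical click records and a single seed keyword.
click_log = [
    ("buy phone", "shop.example/phones", 8),
    ("buy phone", "wiki.example/phone", 2),
    ("phone history", "wiki.example/phone", 5),
]
scores_q, scores_u = iterate_scores(click_log, {"buy phone"})
```

In this toy log the non-seed keyword "phone history" ends with a small positive score because it shares a clicked link with the seed, which illustrates how keywords outside the seed set acquire a score. Only the relative order of the scores is meaningful, in line with the remark that the initial score is a relative value.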
Of course, Formulas 1, 2, and 3 are only one example. What they embody is the process of scoring keywords and their corresponding search result links according to the click data between them, then passing scores iteratively, and finally obtaining the quantized data of the keywords. The emphasis is on the iterative score passing between keywords and their corresponding search result links. Formulas 2 and 3 score the keyword and the corresponding search result link respectively, but Formula 2 takes the scores of the search result links as an influence factor in the keyword's quantitative scoring result, and Formula 3 takes the scores of the keywords as an influence factor in the link's quantitative scoring result. Keywords and their corresponding search result links are thus considered together rather than in isolation, which yields a more accurate quantization model.
Step 103: perform iterative quantization computation on the keywords according to the click data of their corresponding search result links and the quantization model, to obtain the quantized data of the keywords.
That is, quantitative scoring of the keywords and their corresponding search result links is performed according to the click data between the keywords and the corresponding search result links and according to the quantization model, and the quantized data of the attribute of the keywords is then obtained through iterative computation.
The click data of the search result links corresponding to a keyword can be obtained from users' keyword-to-search-result-link click data. As described above, this includes the number of clicks on a given link that users clicked among the search result links returned when they searched for a keyword, the total number of clicks on the same search result link reached by users through all the different keyword searches, and the total number of clicks on all search result links clicked when users searched for the keyword. Substituting the user click data into Formula 3 of the quantization model gives the commerciality score of a search result link, and Formula 2 then gives the commerciality score of the keyword.
Once the commerciality score of a keyword has been obtained in step 103, the score can be used in a variety of subsequent applications, including commerciality ranking, commercial web page identification, commercial web page recommendation, advertising support, and identification of cheating websites.
For other keyword attributes, the same steps apply: select quantization keyword seeds that carry the attribute, and in the subsequent quantization model set the transfer weight coefficient γ and the initial score $\mathrm{score}_0$ according to actual requirements. This yields the required quantization model, from which the quantized score of that attribute of the keyword is computed; the details are not repeated here.
In addition, the invention also provides a keyword attribute quantization device based on user click data. As shown in Fig. 3, the device comprises:
A selection module 301, configured to select quantization keyword seeds that carry the attribute to be quantized;
A quantization model building module 302, configured to determine a quantitative scoring algorithm for the keywords and for their corresponding search result links respectively, taking the scores of the search result links as an influence factor in the quantitative scoring result of the keywords, and taking the scores of the keywords as an influence factor in the quantitative scoring result of the search result links, thereby establishing an iterative quantization model for the keywords;
A quantization computing module 303, configured to perform quantitative scoring of the keywords and their corresponding search result links according to the click data between the keywords and the corresponding search result links and according to the quantization model, and then to obtain the quantized data of the attribute of the keywords through iterative computation.
As shown in Fig. 4, the selection module 301 comprises:
A crawling unit 401, configured to crawl keywords from e-commerce site titles and product-search vertical channels as candidate keywords;
A selection unit 402, configured to select, from the candidate keywords, the N keywords with the highest frequency of occurrence as the quantization keyword seeds for the commerciality attribute, where N is a positive integer.
The quantization model established by the quantization model building module 302 comprises:
$T(x) = \sum_i \mathrm{click}(x \to y_i)$, where T(x) denotes the sum of all user click counts related to x, and x may be a keyword or a corresponding search result link;
When x is a keyword, $x \to y_i$ denotes the number of clicks on a given link that users clicked among the search result links returned when they searched for keyword x; in this case T(x) denotes the total number of clicks on all search result links clicked when users searched for keyword x;
When x is a search result link, $x \to y_i$ denotes the number of clicks on that same search result link reached by users through a given keyword search; in this case T(x) denotes the total number of clicks on that same search result link reached by users through all the different keyword searches;
A formula for the quantitative scoring of the keyword [equation not preserved in the source text];
A formula for the quantitative scoring of the search result link [equation not preserved in the source text];
q denotes the query, that is, the keyword, and u denotes the URL, that is, the search result link;
score(q) denotes the quantitative score of a keyword, and score(u) denotes the quantitative score of a search result link; $\mathrm{click}(u_i \to q)$ denotes the number of clicks associating a given search result link with the corresponding keyword, and $\mathrm{click}(q_i \to u)$ denotes the number of clicks associating a given keyword with the corresponding search result link;
γ denotes the transfer weight coefficient, and $\mathrm{score}_0$ denotes the initial commerciality score.
As shown in Fig. 5, the quantization computing module 303 comprises:
A data acquisition unit 501, configured to obtain the number of clicks on a given link that users clicked among the search result links returned when they searched for a keyword, the total number of clicks on the same search result link reached by users through all the different keyword searches, and the total number of clicks on all search result links clicked when users searched for the keyword;
A computing unit 502, configured to use the quantization model established by the quantization model building module 302 to perform quantitative scoring of the keywords and their corresponding links, and then to obtain the quantized data of the attribute of the keywords through iterative computation.
As can be seen from the above embodiments, taking commerciality scoring as the example application, the keyword attribute quantization method and device based on user click data of the invention score commercial keywords with high accuracy and high discrimination, and the scores are comparable. This can greatly improve search relevance, and it greatly improves effects such as removing cheating pages of spam websites and boosting the ranking weight of commercial pages.
The above is only a preferred embodiment of the invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.