CN106997550A

CN106997550A - A method for predicting advertisement click-through rate based on a stacked autoencoder

Info

Publication number: CN106997550A
Application number: CN201710159687.7A
Authority: CN
Inventors: 梅佳俊; 杨长春; 杨晋苏; 吴云; 吴浩
Original assignee: Changzhou University
Current assignee: Changzhou University
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2017-08-01

Abstract

The present invention provides a method for predicting advertisement click-through rate. To address the high dimensionality and sparsity of advertising data features, the method uses the K-means clustering algorithm and a tensor decomposition algorithm to reduce the dimensionality and sparsity of the features. Because shallow models cannot learn the highly nonlinear associations among features in depth, the method exploits the multi-level feature learning of deep learning and selects a stacked autoencoder algorithm to fully mine the deeper relationships among advertising data features.

Description

A method for predicting advertisement click-through rate based on a stacked autoencoder

Technical field：

The present invention relates to the technical field of Internet computational advertising, and more precisely to a method for predicting advertisement click-through rate based on a stacked autoencoder.

Background：

Search advertising has become one of the major revenue sources of the Internet industry and is one of the largest and fastest-growing advertising channels. Search advertising involves three parties: advertisers, advertising media, and users. On the one hand, advertisers place ads through advertising media by paying a cost per click (Cost Per Click, CPC), and the media's revenue is determined jointly by the cost per click and the predicted click-through rate (Click-Through Rate, CTR), i.e. CPC*CTR, so the accuracy of click-through rate prediction is closely tied to the revenue of both advertisers and advertising media. On the other hand, the probability that a user clicks an ad decreases with the rank of the ad slot, so predicting the click-through rate and placing ads with a high predicted rate near the top of the search results page can increase users' click-through rate on ads. Because the quality of click-through rate prediction for search ads directly affects the revenue of advertisers and advertising media, this research has long been one of the hot topics in industry.

Conventional methods approach the click-through rate prediction problem from the angles of hypothesis testing, classification, and recommender systems. However, these methods obtain features through hand-designed feature extraction and model the user, and they do not fully consider that advertising data is high-dimensional and sparse and that its features are highly nonlinearly associated, so the available information is not fully used.

Summary of the invention:

1. In view of the above shortcomings of the prior art, the present invention provides an advertisement click-through rate prediction method that improves the accuracy of click-through rate prediction.

The method includes the following steps：

Step 1：Build an ad-query matrix, and perform K-means clustering on ads and on queries respectively；

Step 2：Perform tensor decomposition on the three-dimensional user-query-ad tensor model；

Step 3：Extract the basic features that influence the advertisement click-through rate；

Step 4：Use the selected basic features as the input layer of a stacked autoencoder and train it to obtain high-order combined features；

Step 5：Input the high-order combined features into a logistic regression model and train it；

Step 6：Once model training is complete, input the data to be predicted into the trained model to make predictions.

2. In some embodiments, step 1 includes：

Step 1-1：Use the ad display counts provided in the experimental data as the weights between ad A_i and query Q_j to build the ad-query matrix；

Step 1-2：Cluster the ad-query matrix using the K-means algorithm；

Step 1-3：The numbers of users, queries, and ads in the original data are denoted N_u, N_q, and N_a respectively. After objects of the same type are clustered, objects belonging to the same cluster are represented by the same ID, and the numbers of user, query, and ad clusters are denoted K_u, K_q, and K_a respectively. In this way, the numbers of users, queries, and ads in the original data set are reduced from N_u, N_q, and N_a to K_u, K_q, and K_a respectively.

The advantage is that this mitigates the adverse effect that the high dimensionality and sparsity of advertising data have on the prediction results.

3. In some embodiments, step 4 includes：

Step 1：First, input the selected basic features into an autoencoder and train it. The resulting weight parameter w and bias parameter b are used as the weights and biases between the input layer and the first layer of the stacked autoencoder；

Step 2：Use the output layer obtained from training in step 1 as the input layer of an autoencoder and train it, obtaining the weights and biases between the first layer and the second layer；

Step 3：Continue in the same way to obtain the biases and weights between each pair of layers, completing the training of the stacked autoencoder.

The advantage is that high-dimensional combined features of the data are obtained and the deep nonlinear relationships in the data are mined.

Brief description of the drawings：

The accompanying drawing described here provides a further understanding of the present invention and forms a part of this application. In the drawing:

Fig. 1 is a flowchart of the advertisement click-through rate prediction method.

Embodiment：

The present invention reduces the high-dimensional advertising data by clustering and tensor decomposition, then extracts features from the processed data. The extracted features are used as the input of a stacked autoencoder, which is trained with a greedy layer-wise algorithm. The resulting high-order combined features are used to train a logistic regression model, and finally the experimental results are evaluated with AUC as the evaluation metric.

Dimensionality reduction is an effective means of addressing data sparsity. Because objects of the same type in advertising data are similar to one another, similar objects are first clustered to obtain the initial aggregated data. Then, the complex associations that exist between objects of different types are modelled with a tensor structure, and the tensor is approximated by a tensor decomposition method.

The present invention clusters queries, ads, and users using the K-means clustering algorithm, which partitions objects by distance. The goal is to gather similar objects into the same cluster, with the similarity among objects within a cluster as high as possible, thereby obtaining the initial aggregated data. The ad display counts provided in the experimental data are used as the weights between ad A_i and query Q_j to build the ad-query matrix W, where N_a is the number of ads, N_q is the number of queries, and W_ij is the weight between <A_i, Q_j>. The ad-query matrix is clustered using the K-means algorithm. The clustering algorithm is given below, taking ad clustering as an example.

Input：ad-query matrix W_M×N, number of clusters K

Output：K ad clusters

1. Scan the ad-query matrix W_M×N to obtain all M ads and N queries, denoted A = {a₁, a₂, …, a_M} and Q = {q₁, q₂, …, q_N} respectively；

2. Randomly select K of the M ads as the initial cluster centres, denoted T = {t₁, t₂, …, t_K}；

3. Initialize the K cluster sets {P₁, P₂, …, P_K} as empty sets；

4. Compute the distance between each ad a_i and each cluster centre t_j. The calculation formula is as follows：

(where G_ij denotes the set of queries on which ad a_i and the cluster-centre ad t_j are both displayed, the two weight terms are the weights (display counts) of ads a_i and t_j respectively, and Dis(a_i,t_j) denotes the distance between a_i and t_j)；

5. If Dis(a_i,t_j)=max{Dis(a_i,t₁),Dis(a_i,t₂),…,Dis(a_i,t_K)}, then ad a_i belongs to cluster P_j；

6. Compute the average weight of all ads in each cluster set and regenerate the cluster centres；

7. If the deviation of the cluster centres has reached the set threshold, clustering is complete；otherwise, return to step 4 and recompute.
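The clustering steps above can be sketched in Python with NumPy. The distance formula did not survive in this text, so the sketch assumes cosine similarity between display-count vectors as the measure, which is consistent with step 5 assigning each ad to the centre with the maximum value. The function name `cluster_ads` and the parameters `tol`, `max_iter`, and `seed` are illustrative and do not come from the patent.

```python
import numpy as np

def cluster_ads(W, K, tol=1e-6, max_iter=100, seed=0):
    """Cluster the rows (ads) of the ad-query matrix W into K clusters."""
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    M = W.shape[0]
    # Step 2: pick K ads at random as the initial cluster centres.
    centres = W[rng.choice(M, size=K, replace=False)].copy()
    labels = np.zeros(M, dtype=int)
    for _ in range(max_iter):
        # Step 4 (ASSUMED measure): cosine similarity between ads and centres.
        a = W / np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
        c = centres / np.maximum(
            np.linalg.norm(centres, axis=1, keepdims=True), 1e-12)
        sim = a @ c.T  # shape (M, K)
        # Step 5: each ad joins the cluster whose centre scores highest.
        labels = sim.argmax(axis=1)
        # Step 6: new centre = mean weight vector of the ads in the cluster.
        new_centres = centres.copy()
        for k in range(K):
            members = W[labels == k]
            if len(members) > 0:
                new_centres[k] = members.mean(axis=0)
        # Step 7: stop once the centres move less than the threshold.
        shift = np.abs(new_centres - centres).max()
        centres = new_centres
        if shift < tol:
            break
    return labels, centres
```

Calling `cluster_ads(W, K)` returns one cluster label per ad; the same routine could be applied to the transposed matrix to cluster queries.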

The numbers of users, queries, and ads in the original data are denoted N_u, N_q, and N_a respectively. After objects of the same type are clustered, objects belonging to the same cluster are represented by the same ID, and the numbers of user, query, and ad clusters are denoted K_u, K_q, and K_a respectively. In this way, the numbers of users, queries, and ads in the original data set are reduced from N_u, N_q, and N_a to K_u, K_q, and K_a respectively.
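As a minimal sketch of this relabelling, the following Python function replaces each raw ID in a click log with its cluster ID and sums the counts of records that collapse onto the same cluster triple. The record layout `(user, query, ad, impressions, clicks)` and the name `reduce_log` are assumptions made for illustration; the patent does not specify a log format.

```python
def reduce_log(records, user_cluster, query_cluster, ad_cluster):
    """Replace raw IDs with cluster IDs and sum counts per cluster triple.

    records: iterable of (user, query, ad, impressions, clicks) tuples
             (hypothetical layout).
    *_cluster: dicts mapping a raw ID to its cluster ID.
    """
    totals = {}
    for user, query, ad, impressions, clicks in records:
        # Objects in the same cluster share one ID after clustering.
        key = (user_cluster[user], query_cluster[query], ad_cluster[ad])
        imp, clk = totals.get(key, (0, 0))
        totals[key] = (imp + impressions, clk + clicks)
    return totals
```

The keys of the returned dictionary range over at most K_u × K_q × K_a combinations instead of N_u × N_q × N_a.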

In click log data there is a ternary relationship among user, query, and ad. Traditional dimensionality reduction methods (such as PCA) not only break the internal relationships among the three, but also easily lead to the curse of dimensionality when the data dimensionality is very large. Therefore, the present invention represents the three-dimensional user, query, and ad data with a three-dimensional tensor model and then reduces its dimensionality by tensor decomposition. Tensor-mode dimensionality reduction fully preserves the structural information and internal associations among users, queries, and ads, and because it has fewer parameters, it reduces high-dimensional data more effectively than vector-mode reduction. The Tucker decomposition, one of the tensor decomposition methods, is then used to reduce the dimensionality of the data.

The purpose of the Tucker decomposition is to find a tensor that approximates the original tensor H while retaining as much of the original tensor's information and structure as possible. The three dimensions of the initial tensor H are K_u, K_q, and K_a, and the three dimensions of the approximate tensor H′ after dimensionality reduction are denoted I_u, I_q, and I_a.
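The text does not state which algorithm computes the Tucker decomposition. The sketch below assumes a truncated higher-order SVD (HOSVD), one common way to obtain it, implemented with NumPy. The function names and the `ranks` argument, standing for (I_u, I_q, I_a), are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    moved = np.moveaxis(T, mode, 0)
    out = np.tensordot(M, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def tucker_hosvd(H, ranks):
    """Truncated HOSVD (ASSUMED algorithm): core tensor and factor matrices."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        U, _, _ = np.linalg.svd(unfold(H, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = H
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto the subspace
    return core, factors

def reconstruct(core, factors):
    """Map the reduced core back to the original K_u x K_q x K_a space."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out
```

For a tensor `H` of shape (K_u, K_q, K_a), `tucker_hosvd(H, (I_u, I_q, I_a))` returns a core of shape (I_u, I_q, I_a) that carries the reduced representation, and `reconstruct` gives the approximation of `H` in the original space.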

The features of advertising data are highly nonlinearly associated, and high-order polynomial functions can effectively capture such highly correlated relationships. The present invention uses the multi-layer network structure of a stacked autoencoder to learn the nonlinear associations among features layer by layer.

An autoencoder is a deep learning algorithm that reproduces its initial features as closely as possible and is commonly used to learn a better feature representation of the original data. It consists of a three-layer network structure: the bottom layer is the input layer, the middle layer is the hidden layer (the new data representation layer), and the last layer is the output layer.

The process by which the present invention uses a stacked autoencoder to learn the high-order combined features in advertising data is as follows：

(1) The extracted initial features are used as the model input, and a nonlinear transformation is applied to them to obtain the first hidden layer, i.e. the low-order combined features.

(2) The low-order combined features are taken as the new object of learning and passed through another nonlinear transformation to obtain relatively higher-order combined features. This process is repeated until the set number of hidden layers is reached.

To learn the network weight parameters better, the present invention uses an unsupervised learning algorithm based on greedy layer-wise training. The key to greedy layer-wise learning is to train the network weight parameters layer by layer, learning the connection weights between the nodes of only two adjacent layers at a time, and to obtain the parameters of the global stacked autoencoder model through this layer-by-layer learning. The greedy layer-wise procedure for learning the weight parameters of the stacked autoencoder is as follows：

(1) From the input layer to the first hidden layer, the parameters are trained with the backpropagation algorithm by minimizing the reconstruction error between input and output, yielding the first latent representation of the input data (i.e. the first hidden layer).

(2) The feature vector of the previous layer is used as the input for training the next layer, and the weight parameters are trained in the same way to obtain another latent representation of the data (i.e. the second hidden layer), and so on.
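The greedy layer-wise procedure can be illustrated with a small NumPy sketch. It assumes sigmoid activations, a mean squared reconstruction error, plain gradient descent, and input features scaled to [0, 1]; none of these choices, nor the names `train_autoencoder` and `train_stacked`, are specified in the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=200, seed=0):
    """Train one 3-layer autoencoder on X; return the encoder weights w, b."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # hidden layer (latent representation)
        R = sigmoid(H @ W2 + b2)   # output layer (reconstruction of X)
        # Backpropagate the mean squared reconstruction error.
        dR = (R - X) * R * (1.0 - R) / n
        dH = (dR @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def train_stacked(X, layer_sizes, **kwargs):
    """Greedy layer-wise training: each layer learns from the previous output."""
    params, layer_input = [], X
    for size in layer_sizes:
        W, b = train_autoencoder(layer_input, size, **kwargs)
        params.append((W, b))
        # The hidden representation becomes the next autoencoder's input.
        layer_input = sigmoid(layer_input @ W + b)
    return params, layer_input
```

The second return value of `train_stacked` is the output of the last hidden layer, which plays the role of the high-order combined features passed on to the classifier.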

Click-through rate estimation is essentially a probability-based binary classification problem, and the present invention uses logistic regression as the click prediction model.
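A minimal sketch of this final stage, assuming batch gradient descent on the log loss and a pairwise-comparison computation of AUC (the evaluation metric named earlier in the description). The function names and hyperparameters are illustrative and not taken from the patent.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted click probability
        grad = p - y                             # gradient w.r.t. the logit
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict_ctr(X, w, b):
    """Predicted click-through rate for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def auc(y_true, scores):
    """Probability that a clicked sample outscores a non-clicked one."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

In the pipeline described here, `X` would hold the high-order combined features produced by the stacked autoencoder and `y` the observed click labels (1 for a click, 0 otherwise).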

Claims

1. An advertisement click-through rate prediction method, characterized in that it comprises the following steps：

Step 3：Extract the basic features that influence the advertisement click-through rate；

2. The advertisement click-through rate prediction method according to claim 1, characterized in that step 1 includes：

Step 1-1：Use the ad display counts provided in the experimental data as the weights between ad A_i and query Q_j to build the ad-query matrix；

Step 1-3：The numbers of users, queries, and ads in the original data are denoted N_u, N_q, and N_a respectively. After objects of the same type are clustered, objects belonging to the same cluster are represented by the same ID, and the numbers of user, query, and ad clusters are denoted K_u, K_q, and K_a respectively. In this way, the numbers of users, queries, and ads in the original data set are reduced from N_u, N_q, and N_a to K_u, K_q, and K_a respectively.

3. The advertisement click-through rate prediction method according to claim 1, characterized in that step 4 includes：

Step 1：First, input the selected basic features into an autoencoder and train it. The resulting weight parameter w and bias parameter b are used as the weights and biases between the input layer and the first layer of the stacked autoencoder；

Step 2：Use the output layer obtained from training in step 1 as the input layer of an autoencoder and train it, obtaining the weights and biases between the first layer and the second layer；

Step 3：Continue in the same way to obtain the biases and weights between each pair of layers, completing the training of the stacked autoencoder.