CN106997550A - Method for advertisement click-through rate prediction based on a stacked autoencoder - Google Patents

Info

Publication number
CN106997550A
Authority
CN
China
Prior art keywords
advertisement
query
trained
autoencoder
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710159687.7A
Other languages
Chinese (zh)
Inventor
梅佳俊
杨长春
杨晋苏
吴云
吴浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University
Priority to CN201710159687.7A
Publication of CN106997550A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides a method for predicting advertisement click-through rate (CTR). Because the features of advertising data are high-dimensional and sparse, the method uses the K-means clustering algorithm and a tensor decomposition algorithm to reduce the dimensionality and sparsity of the features. Because shallow models cannot capture the highly nonlinear associations between features, the method exploits the multi-level feature learning of deep learning and uses a stacked autoencoder to mine the deeper relationships among advertising data features.

Description

Method for advertisement click-through rate prediction based on a stacked autoencoder
Technical field:
The present invention relates to the technical field of Internet computational advertising, and more precisely to an advertisement click-through rate prediction method based on a stacked autoencoder.
Background:
Search advertising has become one of the main revenue sources of the Internet industry, and is also one of the largest and fastest-growing advertising channels. Three parties take part in search advertising: advertisers, advertising media and users. On the one hand, advertisers place advertisements through advertising media and pay a cost per click (CPC); the media's revenue is determined jointly by the cost per click and the predicted click-through rate (CTR), that is, CPC*CTR, so the accuracy of CTR prediction is closely tied to the revenue of both advertisers and advertising media. On the other hand, the probability that a user clicks an advertisement decreases with the rank of its position, so predicting the CTR and placing the advertisements with a high predicted CTR near the top of the search results page can increase users' click-through rate. Because the quality of search advertising CTR prediction directly affects the revenue of advertisers and advertising media, this research has long been a hot topic in industry.
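The CPC*CTR relationship can be illustrated with a small sketch. It assumes, purely for illustration, that candidate advertisements are ordered by expected revenue per impression; the `Candidate` type, the field names and the sample numbers are hypothetical and do not come from the patent.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    ad_id: str
    cpc: float            # cost per click paid by the advertiser
    predicted_ctr: float  # click-through rate predicted by the model


def expected_revenue(c: Candidate) -> float:
    """Expected revenue per impression: CPC * CTR."""
    return c.cpc * c.predicted_ctr


def rank_ads(candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates so the highest expected revenue takes the top slot."""
    return sorted(candidates, key=expected_revenue, reverse=True)


# Hypothetical candidates for one query.
ads = [
    Candidate("a1", cpc=2.0, predicted_ctr=0.010),
    Candidate("a2", cpc=0.5, predicted_ctr=0.080),
    Candidate("a3", cpc=1.0, predicted_ctr=0.030),
]
ranked = rank_ads(ads)
```

An error in the predicted CTR changes this ordering directly, which is why prediction accuracy affects revenue.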
For CTR prediction, conventional methods approach the problem from the angles of hypothesis testing, classification, recommender systems and so on. However, these methods obtain features through hand-designed feature extraction and then model the user. They do not fully consider that advertising data are high-dimensional and sparse and that the features are highly nonlinearly related, so the available information is underused.
Summary of the invention:
1. In view of the shortcomings of the prior art described above, the present invention provides an advertisement click-through rate prediction method to improve the accuracy of click-through rate prediction, comprising the following steps:
Step 1: Build an advertisement-query matrix, and perform K-means clustering on advertisements and queries respectively;
Step 2: Perform tensor decomposition on the user-query-advertisement three-dimensional tensor model;
Step 3: Extract the basic features that influence the advertisement click-through rate;
Step 4: Use the selected basic features as the input layer of a stacked autoencoder and train it to obtain high-order combined features;
Step 5: Feed the high-order combined features into a logistic regression model and train it;
Step 6: Once model training is complete, feed the data to be predicted into the trained model to obtain predictions.
2. In some embodiments, step 1 includes:
Step 1-1: Use the number of advertisement impressions provided in the experimental data as the weight between advertisement $A_i$ and query $Q_j$ to build the advertisement-query matrix $W_{N_a \times N_q}$;
Step 1-2: Cluster the advertisement-query matrix using the K-means algorithm;
Step 1-3: Let $N_u$, $N_q$ and $N_a$ denote the numbers of users, queries and advertisements in the original data. After objects of the same type are clustered, objects belonging to the same cluster are represented by the same ID, and the numbers of user, query and advertisement clusters are denoted $K_u$, $K_q$ and $K_a$. In this way, the numbers of users, queries and advertisements in the original data set are reduced from $N_u$, $N_q$ and $N_a$ to $K_u$, $K_q$ and $K_a$ respectively.
The advantage is that this mitigates the adverse effect that the high dimensionality and sparsity of advertising data have on the prediction results.
3. In some embodiments, step 4 includes:
Step 1: First feed the selected basic features into an autoencoder and train it; the weight parameter w and bias parameter b obtained by training serve as the weights and biases between the input layer and the first layer of the stacked autoencoder;
Step 2: Take the output layer obtained by training in step 1 as the input layer of the next autoencoder and train it, obtaining the weights and biases between the first layer and the second layer;
Step 3: Proceed in the same way to obtain the biases and weights between every pair of adjacent layers, completing the training of the stacked autoencoder.
The advantage is that high-dimensional combined features of the data can be obtained, revealing the deep nonlinear relationships in the data.
Brief description of the drawings:
The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. In the drawings:
Fig. 1 shows the flow chart of the advertisement click-through rate prediction method.
Detailed description:
The present invention applies clustering-based dimensionality reduction and tensor decomposition to the high-dimensional advertising data and then extracts features from the processed data. The extracted features serve as the input of a stacked autoencoder, which is trained with a greedy layer-wise algorithm. The resulting high-order combined features are used to train a logistic regression model. Finally, the experimental results are evaluated with AUC (area under the ROC curve) as the metric.
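As a hedged illustration of the AUC evaluation metric mentioned above, the sketch below computes AUC with the standard rank-sum (Mann-Whitney) formulation. The function name and the sample labels and scores are illustrative assumptions; the patent does not describe how AUC is computed.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: sequence of 0/1 click indicators; scores: predicted CTRs.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical click labels and predicted CTRs for four impressions.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The value can be read as the probability that a randomly chosen clicked impression is scored higher than a randomly chosen non-clicked one.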
Dimensionality reduction is an effective means of addressing data sparsity. Because objects of the same type in advertising data are similar to one another, similar objects are clustered first to obtain initial aggregated data. Then, because complex associations exist between objects of different types, these are modeled with a tensor structure, and the tensor is approximated using a tensor decomposition method.
The present invention uses the distance-based partitioning K-means clustering algorithm to cluster queries, advertisements and users. The aim is for similar objects to be gathered into the same cluster, with objects in the same cluster as similar as possible, so that clustering yields the initial aggregated data. The number of advertisement impressions provided in the experimental data is used as the weight between advertisement $A_i$ and query $Q_j$ to build the advertisement-query matrix $W_{N_a \times N_q}$, where $N_a$ is the number of advertisements, $N_q$ is the number of queries, and $W_{ij}$ is the weight of the pair $\langle A_i, Q_j \rangle$. The advertisement-query matrix is clustered with the K-means algorithm. The clustering algorithm is given below, taking advertisement clustering as an example.
Input: advertisement-query matrix $W_{M \times N}$, number of clusters $K$
Output: $K$ advertisement clusters
1. Scan the advertisement-query matrix $W_{M \times N}$ to obtain all $M$ advertisements and $N$ queries, denoted $A = \{a_1, a_2, \dots, a_M\}$ and $Q = \{q_1, q_2, \dots, q_N\}$ respectively;
2. Randomly select $K$ of the $M$ advertisements as the initial cluster centres, denoted $T = \{t_1, t_2, \dots, t_K\}$;
3. Initialize the $K$ cluster sets $\{P_1, P_2, \dots, P_K\}$ as empty sets;
4. Compute the distance $\mathrm{Dis}(a_i, t_j)$ between each advertisement $a_i$ and each cluster centre $t_j$ [formula not reproduced in the source text], where $G_{ij}$ is the set of queries for which both advertisement $a_i$ and the cluster-centre advertisement $t_j$ were displayed, and the two weight terms are the weights (impression counts) of $a_i$ and $t_j$ respectively;
5. If $\mathrm{Dis}(a_i, t_j) = \max\{\mathrm{Dis}(a_i, t_1), \mathrm{Dis}(a_i, t_2), \dots, \mathrm{Dis}(a_i, t_K)\}$, then advertisement $a_i$ belongs to cluster $P_j$;
6. Compute the average weight of all advertisements in each cluster set and regenerate the cluster centres;
7. If the deviation of the cluster centres has reached the set threshold, clustering is complete; otherwise, return to step 4 and recompute.
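The seven steps above can be sketched in Python with NumPy. This is a sketch under stated assumptions, not the patent's implementation: the formula for Dis did not survive in this text, so cosine similarity between impression-weight vectors is substituted as a stand-in measure (it is largest for the closest centre, which matches the max in step 5). The function names, the tolerance and the toy matrix are hypothetical.

```python
import numpy as np


def cosine_similarity(rows, centers):
    """Pairwise cosine similarity between rows (M x N) and centers (K x N)."""
    eps = 1e-12  # guards against division by zero for all-zero rows
    rows_n = rows / (np.linalg.norm(rows, axis=1, keepdims=True) + eps)
    cent_n = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + eps)
    return rows_n @ cent_n.T


def cluster_ads(W, k, tol=1e-6, max_iter=100, seed=0):
    """K-means-style clustering of the ad rows of an ad-query matrix W."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K ads at random as the initial cluster centres.
    centers = W[rng.choice(W.shape[0], size=k, replace=False)].astype(float)
    labels = np.zeros(W.shape[0], dtype=int)
    for _ in range(max_iter):
        # Steps 4-5: assign each ad to the most similar centre
        # (assumed measure: cosine similarity, not the patent's formula).
        labels = np.argmax(cosine_similarity(W, centers), axis=1)
        # Step 6: new centre = mean weight vector of the ads in the cluster.
        new_centers = centers.copy()
        for j in range(k):
            members = W[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Step 7: stop once the centres have stopped moving.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return labels, centers


# Toy matrix: ads 0-1 are shown for queries 0-1, ads 2-3 for queries 2-3.
W = np.array([
    [9.0, 7.0, 0.0, 0.0],
    [8.0, 6.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 9.0],
    [0.0, 0.0, 4.0, 8.0],
])
labels, centers = cluster_ads(W, k=2)
```

Clustering queries would apply the same routine to the transpose of the matrix.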
Let $N_u$, $N_q$ and $N_a$ denote the numbers of users, queries and advertisements in the original data. After objects of the same type are clustered, objects belonging to the same cluster are represented by the same ID, and the numbers of user, query and advertisement clusters are denoted $K_u$, $K_q$ and $K_a$. In this way, the numbers of users, queries and advertisements in the original data set are reduced from $N_u$, $N_q$ and $N_a$ to $K_u$, $K_q$ and $K_a$ respectively.
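A minimal sketch of this ID replacement follows, assuming the data are held as a user-query-advertisement count tensor and that entries sharing a cluster ID are summed. Both the tensor contents and the summation rule are assumptions made for illustration; the text only states that objects in one cluster share an ID.

```python
import numpy as np


def aggregate_by_cluster(H, user_cl, query_cl, ad_cl, k_u, k_q, k_a):
    """Collapse an (N_u, N_q, N_a) count tensor to (K_u, K_q, K_a).

    user_cl[i] is the cluster ID of user i, and likewise for queries and ads.
    Counts of all objects that share a cluster ID are summed together.
    """
    out = np.zeros((k_u, k_q, k_a))
    u_idx, q_idx, a_idx = np.nonzero(H)  # visit only the non-zero entries
    np.add.at(out, (user_cl[u_idx], query_cl[q_idx], ad_cl[a_idx]),
              H[u_idx, q_idx, a_idx])
    return out


# Hypothetical data: 4 users, 3 queries, 2 ads with a few non-zero counts.
H = np.zeros((4, 3, 2))
H[0, 0, 0] = 2.0
H[1, 0, 1] = 1.0
H[2, 2, 1] = 3.0
H[3, 1, 1] = 4.0
user_cl = np.array([0, 0, 1, 1])   # N_u = 4 -> K_u = 2
query_cl = np.array([0, 1, 1])     # N_q = 3 -> K_q = 2
ad_cl = np.array([0, 0])           # N_a = 2 -> K_a = 1
reduced = aggregate_by_cluster(H, user_cl, query_cl, ad_cl, 2, 2, 1)
```

The reduced tensor is both smaller and denser than the original, which is the effect the clustering step aims for.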
In click-log data there is a ternary relation among user, query and advertisement. Traditional dimensionality reduction methods (such as PCA) not only break the internal relations among the three, but also easily lead to the curse of dimensionality when the data dimensionality is very large. Therefore, the present invention represents the three-way user, query and advertisement data with a three-dimensional tensor model and then reduces its dimensionality with a tensor decomposition method. Tensor-mode dimensionality reduction fully preserves the structural information and internal associations among users, queries and advertisements; because it has fewer parameters, it reduces high-dimensional data more effectively than vector-mode reduction. Specifically, the Tucker decomposition is used to reduce the dimensionality of the data.
The purpose of the Tucker decomposition is to find a tensor that approximates the original tensor $H$ while retaining as much of the original tensor's information and structure as possible. The three dimensions of the initial tensor $H$ have sizes $K_u$, $K_q$ and $K_a$; the three dimensions of the approximate tensor $H'$ after dimensionality reduction have sizes $I_u$, $I_q$ and $I_a$.
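One common way to compute a Tucker decomposition is the truncated higher-order SVD (HOSVD), sketched below with NumPy. The patent does not say which algorithm it uses, so HOSVD is an assumption here, as are the function names and the random stand-in tensor. In this sketch the core tensor has shape $(I_u, I_q, I_a)$ and plays the role of the reduced tensor.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    moved = np.moveaxis(T, mode, 0)
    result = np.tensordot(M, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def tucker_hosvd(H, ranks):
    """Truncated HOSVD: returns the core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(H, mode), full_matrices=False)
        factors.append(U[:, :r])  # leading r left singular vectors
    core = H
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Approximate tensor: core multiplied by every factor matrix in turn."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T


rng = np.random.default_rng(0)
H = rng.random((6, 5, 4))  # stands in for the (K_u, K_q, K_a) tensor
core, factors = tucker_hosvd(H, ranks=(3, 3, 2))
H_approx = reconstruct(core, factors)
```

Choosing smaller ranks gives stronger reduction at the cost of a larger reconstruction error.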
The features of advertising data are highly nonlinearly related, and high-order polynomial functions can effectively describe such highly correlated relations. The present invention uses the multi-layer network structure of the stacked autoencoder to learn the nonlinear associations between features layer by layer.
An autoencoder is a deep learning algorithm that reproduces its input features as closely as possible, and is commonly used to learn a better feature representation of the raw data. It consists of a three-layer network: an input layer at the bottom, a hidden layer (the new data representation layer) in the middle, and an output layer.
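As a hedged illustration of this three-layer structure, a single autoencoder can be written as follows. The symbols $W_1$, $b_1$, $W_2$, $b_2$, the activation $\sigma$ and the squared-error loss are notation assumed here, not taken from the patent text.

```latex
h = \sigma(W_1 x + b_1), \qquad
\hat{x} = \sigma(W_2 h + b_2), \qquad
J(W_1, b_1, W_2, b_2) = \frac{1}{2n} \sum_{k=1}^{n} \left\lVert \hat{x}^{(k)} - x^{(k)} \right\rVert^2
```

Here $x$ is the input layer, $h$ the hidden layer and $\hat{x}$ the output layer; training minimises the reconstruction error $J$ over $n$ samples, after which $h$ serves as the new representation of the data.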
The process by which the present invention uses the stacked autoencoder to learn high-order combined features in advertising data is as follows:
(1) The extracted initial features are used as the model input; a nonlinear transformation of the initial features yields the first hidden layer, that is, the low-order combined features.
(2) The low-order combined features become the new object of learning and pass through another nonlinear transformation to obtain relatively high-order combined features; this process is repeated until the configured number of hidden layers is reached.
To learn the network weight parameters better, the present invention uses an unsupervised learning algorithm based on greedy layer-wise training. The key to greedy layer-wise learning is to train the network weight parameters one layer at a time, learning only the connection weights between two adjacent layers of nodes each time, so that the parameters of the overall stacked autoencoder model are obtained layer by layer. The process by which the greedy layer-wise method learns the stacked autoencoder weights is as follows:
(1) From the input layer to the first hidden layer, train the parameters with the backpropagation algorithm by minimizing the reconstruction error between input and output, obtaining the first latent representation of the input data (the first hidden layer).
(2) Use the feature vector of the previous layer as the input for training the next layer, and train the weight parameters in the same way to obtain another latent representation of the data (the second hidden layer), and so on.
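The two steps above can be sketched as greedy layer-wise pretraining in NumPy. Everything concrete in the sketch is an assumption for illustration: the sigmoid activations, the squared reconstruction error, batch gradient descent, the hyperparameters, the function names and the random input features are not specified in the patent.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one 3-layer autoencoder on X (n_samples x n_features).

    Minimises the mean squared reconstruction error with batch gradient
    descent. Returns the encoder weights, encoder bias and last recorded loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    loss = None
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)       # encode: input -> hidden
        X_hat = sigmoid(H @ W2 + b2)   # decode: hidden -> output
        err = X_hat - X
        loss = float(np.mean(np.sum(err ** 2, axis=1)) / 2)
        # Backpropagation through the decoder, then the encoder.
        d_out = err * X_hat * (1.0 - X_hat) / n
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, loss


def train_stacked_autoencoder(X, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each layer is trained on the previous
    layer's hidden representation. Returns one (W, b) pair per layer."""
    params, inputs = [], X
    for n_hidden in layer_sizes:
        W, b, _ = train_autoencoder(inputs, n_hidden, **kwargs)
        params.append((W, b))
        inputs = sigmoid(inputs @ W + b)  # becomes the next layer's input
    return params


def encode(X, params):
    """Map basic features to the top-level (high-order) representation."""
    for W, b in params:
        X = sigmoid(X @ W + b)
    return X


rng = np.random.default_rng(1)
X = rng.random((50, 8))  # 50 samples of 8 hypothetical basic features in [0, 1]
params = train_stacked_autoencoder(X, layer_sizes=[6, 4])
features = encode(X, params)  # high-order combined features, shape (50, 4)
```

Only the encoder half of each autoencoder is kept; the decoder exists solely to define the reconstruction error during training.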
CTR prediction is essentially a probability-based binary classification problem, and the present invention uses logistic regression as the click prediction model.
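A minimal logistic regression sketch in NumPy follows, assuming plain batch gradient descent on the mean log loss. The function names, hyperparameters and the four toy samples are hypothetical; in the described method the input rows would be the high-order combined features produced by the stacked autoencoder.

```python
import numpy as np


def train_logistic_regression(X, y, epochs=500, lr=0.5):
    """Fit weights w and bias b by gradient descent on the mean log loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted click probability
        grad = p - y  # gradient of the log loss with respect to the logit
        w -= lr * (X.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def predict_ctr(X, w, b):
    """Predicted probability that each impression is clicked."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Hypothetical feature rows with click labels (1 = clicked).
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = train_logistic_regression(X, y)
ctr = predict_ctr(X, w, b)
```

The sigmoid output is a probability in (0, 1), so it can be used directly as the predicted click-through rate.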

Claims (3)

1. An advertisement click-through rate prediction method, characterized by comprising the following steps:
Step 1: Build an advertisement-query matrix, and perform K-means clustering on advertisements and queries respectively;
Step 2: Perform tensor decomposition on the user-query-advertisement three-dimensional tensor model;
Step 3: Extract the basic features that influence the advertisement click-through rate;
Step 4: Use the selected basic features as the input layer of a stacked autoencoder and train it to obtain high-order combined features;
Step 5: Feed the high-order combined features into a logistic regression model and train it;
Step 6: Once model training is complete, feed the data to be predicted into the trained model to obtain predictions.
2. The advertisement click-through rate prediction method according to claim 1, characterized in that step 1 comprises:
Step 1-1: Use the number of advertisement impressions provided in the experimental data as the weight between advertisement $A_i$ and query $Q_j$ to build the advertisement-query matrix $W_{N_a \times N_q}$;
Step 1-2: Cluster the advertisement-query matrix using the K-means algorithm;
Step 1-3: Let $N_u$, $N_q$ and $N_a$ denote the numbers of users, queries and advertisements in the original data. After objects of the same type are clustered, objects belonging to the same cluster are represented by the same ID, and the numbers of user, query and advertisement clusters are denoted $K_u$, $K_q$ and $K_a$. In this way, the numbers of users, queries and advertisements in the original data set are reduced from $N_u$, $N_q$ and $N_a$ to $K_u$, $K_q$ and $K_a$ respectively.
3. The advertisement click-through rate prediction method according to claim 1, characterized in that step 4 comprises:
Step 1: First feed the selected basic features into an autoencoder and train it; the weight parameter w and bias parameter b obtained by training serve as the weights and biases between the input layer and the first layer of the stacked autoencoder;
Step 2: Take the output layer obtained by training in step 1 as the input layer of the next autoencoder and train it, obtaining the weights and biases between the first layer and the second layer;
Step 3: Proceed in the same way to obtain the biases and weights between every pair of adjacent layers, completing the training of the stacked autoencoder.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710159687.7A CN106997550A (en) 2017-03-17 2017-03-17 Method for advertisement click-through rate prediction based on a stacked autoencoder

Publications (1)

Publication Number Publication Date
CN106997550A (en) 2017-08-01

Family

ID=59431465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710159687.7A Pending CN106997550A (en) 2017-03-17 2017-03-17 Method for advertisement click-through rate prediction based on a stacked autoencoder

Country Status (1)

Country Link
CN (1) CN106997550A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170801