CN106997550A - Method for predicting advertisement click-through rate based on a stacked autoencoder - Google Patents
- Publication number
- CN106997550A (application CN201710159687.7A)
- Authority
- CN
- China
- Prior art keywords
- advertisement
- query
- trained
- autoencoder
- layer
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0242—Determining effectiveness of advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
Abstract
The present invention provides a method for predicting advertisement click-through rate (CTR). To address the high dimensionality and sparsity of advertising data features, the method uses the K-means clustering algorithm and a tensor decomposition algorithm to reduce the dimensionality and sparsity of the features. Because shallow models cannot capture the highly nonlinear associations among features, the method exploits the ability of deep learning to learn features at multiple levels and uses a stacked autoencoder to mine the deeper relationships among advertising data features.
Description
Technical field:
The present invention relates to the technical field of Internet computational advertising, and more precisely to a method for predicting advertisement click-through rate based on a stacked autoencoder.
Background:
Search advertising has become one of the main sources of revenue for the Internet industry and is also one of the largest and fastest-growing advertising channels. Three parties take part in search advertising: advertisers, advertising media and users. On the one hand, advertisers pay the advertising media a cost per click (CPC) to place advertisements, and the revenue of the advertising media is determined jointly by the cost per click and the predicted click-through rate (CTR), i.e. CPC*CTR, so the accuracy of CTR prediction is closely tied to the revenue of both advertisers and advertising media. On the other hand, the probability that a user clicks an advertisement decreases with the rank of the advertisement slot, so predicting CTR and placing advertisements with a high predicted CTR near the top of the search results page can increase the user's click-through rate. Because the quality of search-advertising CTR prediction directly affects the revenue of advertisers and advertising media, this research has long been one of the hot topics in industry.
For the CTR prediction problem, conventional methods approach it from angles such as hypothesis testing, classification and recommender systems. These methods obtain features through hand-designed feature extraction and model the user, but they do not fully account for the high dimensionality and sparsity of advertising data or for the highly nonlinear associations among features, so the available information is under-used.
Summary of the invention:
1. In view of the above shortcomings of the prior art, the present invention provides an advertisement click-through rate prediction method that improves the accuracy of click-through rate prediction. The method includes the following steps:
Step 1: Build an advertisement-query matrix and apply K-means clustering to advertisements and to queries separately;
Step 2: Apply tensor decomposition to the user-query-advertisement three-dimensional tensor model;
Step 3: Extract the basic features that influence advertisement click-through rate;
Step 4: Use the selected basic features as the input layer of a stacked autoencoder and train it to obtain high-order combined features;
Step 5: Feed the high-order combined features into a logistic regression model and train it;
Step 6: Once model training is complete, feed the data to be predicted into the trained model to obtain predictions.
2. In some embodiments, step 1 includes:
Step 1-1: Use the number of advertisement displays provided in the experimental data as the weight between advertisement A_i and query Q_j to build the advertisement-query matrix;
Step 1-2: Cluster the advertisement-query matrix with the K-means algorithm;
Step 1-3: Let N_u, N_q and N_a denote the numbers of users, queries and advertisements in the initial data. After objects of the same type are clustered, objects in the same cluster are represented by the same ID, and K_u, K_q and K_a denote the numbers of user, query and advertisement clusters. The numbers of users, queries and advertisements in the initial data set are thus reduced from N_u, N_q and N_a to K_u, K_q and K_a respectively.
The advantage is that this mitigates the adverse effect of the high dimensionality and sparsity of advertising data on the prediction results.
3. In some embodiments, step 4 includes:
Step 1: First feed the selected basic features into an autoencoder and train it; the weight parameter w and bias parameter b obtained by training serve as the weights and biases between the input layer and the first layer of the stacked autoencoder;
Step 2: Use the output obtained from the training in step 1 as the input layer of an autoencoder and train it to obtain the weights and biases between the first layer and the second layer;
Step 3: Continue in the same way to obtain the biases and weights between every pair of adjacent layers, which completes the training of the stacked autoencoder.
The advantage is that high-dimensional combined features of the data can be obtained, revealing the deep nonlinear relationships in the data.
Brief description of the drawings:
The accompanying drawing described here provides a further understanding of the invention and forms part of this application. In the drawing:
Fig. 1 is a flow chart of the advertisement click-through rate prediction method.
Embodiment:
The invention applies clustering-based dimensionality reduction and tensor decomposition to the high-dimensional advertising data, then extracts features from the processed data. The extracted features serve as the input of a stacked autoencoder, which is trained with a greedy layer-wise algorithm; the resulting high-order combined features are used to train a logistic regression model; finally, the experimental results are evaluated with AUC (area under the ROC curve) as the metric.
Dimensionality reduction is an effective way to address data sparsity. Because objects of the same type in advertising data are similar to one another, similar objects are first clustered to obtain initial aggregated data. Then the complex associations among objects of different types are modelled with a tensor structure, and an approximate tensor is obtained by tensor decomposition.
The invention clusters queries, advertisements and users with the distance-based partitional K-means clustering algorithm. The aim is to gather similar objects into the same cluster, with the similarity among objects in a cluster as high as possible, and thereby obtain initial aggregated data. The number of advertisement displays provided in the experimental data is used as the weight between advertisement A_i and query Q_j to build the advertisement-query matrix W_{N_a×N_q}, where N_a is the number of advertisements, N_q is the number of queries, and W_ij is the weight between <A_i, Q_j>. The advertisement-query matrix is clustered with the K-means algorithm. The clustering algorithm is given below, using advertisement clustering as the example.
Input: advertisement-query matrix W_{M×N}, number of clusters K
Output: K advertisement clusters
1. Scan the advertisement-query matrix W_{M×N} to obtain all M advertisements and N queries, denoted A = {a_1, a_2, …, a_M} and Q = {q_1, q_2, …, q_N};
2. Randomly select K of the M advertisements as the initial cluster centers, denoted T = {t_1, t_2, …, t_K};
3. Initialize the K clusters {P_1, P_2, …, P_K} as empty sets;
4. Compute the distance Dis(a_i, t_j) between each advertisement a_i and each cluster center t_j. The formula itself did not survive in this text; in it, G_ij denotes the set of queries on which advertisement a_i and the cluster-center advertisement t_j are both displayed, the two weight terms are the weights (display counts) of a_i and t_j, and Dis(a_i, t_j) is the distance between a_i and t_j;
5. If Dis(a_i, t_j) = max{Dis(a_i, t_1), Dis(a_i, t_2), …, Dis(a_i, t_K)}, then advertisement a_i belongs to cluster P_j;
6. Compute the average weight of all advertisements in each cluster and regenerate the cluster centers;
7. If the deviation of the cluster centers has reached the preset threshold, clustering is complete; otherwise return to step 4 and recompute.
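The clustering loop above can be sketched roughly as follows. The patent's own distance formula is not reproduced in this text, so the sketch substitutes squared Euclidean distance between rows of the advertisement-query matrix and assigns each advertisement to its nearest center, which is the conventional K-means rule rather than necessarily the rule the inventors used. The function names `sq_dist` and `kmeans_rows` and the toy matrix are illustrative assumptions.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length weight vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans_rows(W, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster the rows of W (one row per advertisement, one column per query).

    Assumption: Euclidean distance stands in for the patent's own distance
    formula, which is not available in the text.
    Returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = [list(row) for row in rng.sample(W, k)]  # step 2: random initial centers
    labels = [0] * len(W)
    for _ in range(max_iter):
        # steps 4-5: assign every advertisement to its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(row, centers[j])) for row in W]
        # step 6: recompute each center as the mean weight vector of its cluster
        new_centers = []
        for j in range(k):
            members = [row for row, lab in zip(W, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        # step 7: stop once the centers have (almost) stopped moving
        shift = max(sq_dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return labels, centers

# Toy advertisement-query matrix; entries are made-up display counts.
W = [
    [10, 9, 0, 0],
    [8, 11, 0, 1],
    [0, 0, 7, 9],
    [1, 0, 8, 8],
]
labels, centers = kmeans_rows(W, k=2)
```

In this toy matrix the first two advertisements are shown on the same queries, as are the last two, so each pair ends up sharing a cluster label.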
Let N_u, N_q and N_a denote the numbers of users, queries and advertisements in the initial data. After objects of the same type are clustered, objects in the same cluster are represented by the same ID, and K_u, K_q and K_a denote the numbers of user, query and advertisement clusters. The numbers of users, queries and advertisements in the initial data set are thus reduced from N_u, N_q and N_a to K_u, K_q and K_a respectively.
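Replacing every object by its cluster ID amounts to a table lookup on each log record. The sketch below illustrates the idea with made-up data; the record layout, the cluster assignments and the choice to sum display counts per cluster triple are all assumptions for illustration, not details given in the patent.

```python
from collections import defaultdict

# Hypothetical cluster assignments produced by the clustering step.
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
query_cluster = {"q1": 0, "q2": 1, "q3": 1}
ad_cluster = {"a1": 0, "a2": 0, "a3": 1, "a4": 1}

# Hypothetical log records: (user, query, advertisement, display count).
log = [
    ("u1", "q1", "a1", 5),
    ("u2", "q1", "a2", 3),  # same cluster triple as the record above
    ("u3", "q2", "a3", 4),
    ("u3", "q3", "a4", 1),  # same cluster triple as the record above
]

def aggregate(records):
    """Replace raw IDs by cluster IDs and sum the display counts per cluster triple."""
    cells = defaultdict(int)
    for user, query, ad, shown in records:
        key = (user_cluster[user], query_cluster[query], ad_cluster[ad])
        cells[key] += shown
    return dict(cells)

aggregated = aggregate(log)
# Here N_u, N_q, N_a = 3, 3, 4 raw objects shrink to K_u, K_q, K_a = 2, 2, 2 clusters.
```

The keys of `aggregated` are (user cluster, query cluster, advertisement cluster) triples, which is the index set of a K_u × K_q × K_a tensor.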
In click-log data, users, queries and advertisements are linked by a ternary relationship. Traditional dimensionality-reduction methods (such as PCA) not only break the internal relationships among the three, but also easily lead to the curse of dimensionality when the data dimensionality is very large. The invention therefore represents the three-way user, query and advertisement data with a three-dimensional tensor model and reduces its dimensionality by tensor decomposition. Tensor-mode dimensionality reduction fully preserves the structural information and internal associations among users, queries and advertisements; because it has fewer parameters, it reduces high-dimensional data more effectively than vector-mode reduction. The Tucker decomposition, one of the tensor decomposition methods, is then used to reduce the dimensionality of the data.
The purpose of the Tucker decomposition is to find a tensor that approximates the original tensor H while retaining as much of the original tensor's information and structure as possible. The three dimensions of the initial tensor H have sizes K_u, K_q and K_a; the three dimensions of the approximate tensor H' after dimensionality reduction have sizes I_u, I_q and I_a.
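The text does not say which algorithm computes the Tucker decomposition. One common choice is the truncated higher-order SVD (HOSVD), sketched here with NumPy under that assumption. The helper names (`unfold`, `mode_product`, `tucker_hosvd`, `reconstruct`), the random stand-in tensor and the ranks are illustrative only.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    moved = np.moveaxis(T, mode, 0)
    out = np.tensordot(M, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def tucker_hosvd(H, ranks):
    """Truncated HOSVD: returns the core tensor and one factor matrix per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(H, mode), full_matrices=False)
        factors.append(U[:, :r])  # leading r left singular vectors
    core = H
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)  # project onto each factor basis
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original K_u x K_q x K_a space."""
    approx = core
    for mode, U in enumerate(factors):
        approx = mode_product(approx, U, mode)
    return approx

rng = np.random.default_rng(0)
H = rng.random((6, 5, 4))  # stand-in for a K_u x K_q x K_a tensor
core, factors = tucker_hosvd(H, ranks=(3, 3, 2))  # I_u, I_q, I_a
H_approx = reconstruct(core, factors)
rel_error = np.linalg.norm(H - H_approx) / np.linalg.norm(H)
```

The core tensor has shape I_u × I_q × I_a, and `rel_error` indicates how much of the original tensor is lost at the chosen ranks. Keeping all ranks at full size reproduces H exactly.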
The features of advertising data are associated in a highly nonlinear way, and high-order polynomial functions can describe such strongly correlated relationships effectively. The invention uses the multi-layer network structure of a stacked autoencoder to learn the nonlinear associations among features layer by layer.
An autoencoder is a deep learning algorithm that reproduces the initial features as closely as possible and is commonly used to learn a better feature representation of the raw data. It consists of a three-layer network: an input layer at the bottom, a hidden layer (the new data representation layer) in the middle, and an output layer.
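For reference, the three-layer structure is usually written as an encoder, a decoder and a reconstruction objective. The form below is the standard textbook formulation, not a formula taken from the patent; the activation functions f and g, the decoder parameters W' and b', and the squared-error loss are assumptions.

```latex
h = f(W x + b), \qquad
\hat{x} = g(W' h + b'), \qquad
\min_{W,\, b,\, W',\, b'} \; \frac{1}{n} \sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert^2
```

Here x is the input layer, h the hidden layer (the new representation) and \hat{x} the output layer that tries to reproduce x.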
The invention uses a stacked autoencoder to learn the high-order combined features in advertising data as follows:
(1) The extracted initial features are the input of the model; a nonlinear transformation of the initial features yields the 1st hidden layer, i.e. the low-order combined features.
(2) The low-order combined features become the new object of learning and pass through another nonlinear transformation to yield relatively high-order combined features. This process repeats until the preset number of hidden layers is reached.
To learn the network weight parameters better, the invention uses an unsupervised learning algorithm based on greedy layer-wise training. The key to greedy layer-wise learning is to train the network weight parameters one layer at a time, learning only the connection weights between the nodes of two adjacent layers at each step, so that the parameters of the whole stacked autoencoder are obtained layer by layer. The greedy layer-wise procedure for learning the weight parameters of the stacked autoencoder is as follows:
(1) From the input layer to the 1st hidden layer, train the parameters with the backpropagation algorithm by minimizing the reconstruction error between input and output, which yields the 1st latent representation of the input data (i.e. the 1st hidden layer).
(2) Use the feature vector of the previous layer as the input for training the next layer and train the weight parameters in the same way, which yields another latent representation of the data (i.e. the 2nd hidden layer), and so on.
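A minimal NumPy sketch of this greedy layer-wise procedure follows. The patent gives no hyperparameters, so the sigmoid activations, squared reconstruction error, learning rate, epoch count, layer sizes and toy data are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one autoencoder X -> hidden -> X_hat by batch gradient descent.

    Returns the encoder parameters (W, b) and the per-epoch reconstruction errors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)        # encode
        X_hat = sigmoid(H @ W2 + b2)    # decode
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # backpropagate the squared reconstruction error
        d_out = err * X_hat * (1 - X_hat)
        d_hid = (d_out @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ d_out / n
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / n
        b1 -= lr * d_hid.mean(axis=0)
    return (W1, b1), losses

def train_stacked(X, layer_sizes):
    """Greedy layer-wise training: each layer learns from the codes of the previous one."""
    params, history, inp = [], [], X
    for size in layer_sizes:
        (W, b), losses = train_autoencoder(inp, size)
        params.append((W, b))
        history.append(losses)
        inp = sigmoid(inp @ W + b)      # hidden codes feed the next autoencoder
    return params, history

def encode(X, params):
    """Pass data through all trained encoder layers."""
    for W, b in params:
        X = sigmoid(X @ W + b)
    return X

rng = np.random.default_rng(1)
X = rng.random((50, 3)) @ rng.random((3, 8)) / 3.0  # toy basic features in [0, 1]
params, history = train_stacked(X, layer_sizes=(5, 3))
features = encode(X, params)                        # stand-in for high-order combined features
```

Only the encoder half of each autoencoder is kept; the decoder exists solely to define the reconstruction error during training.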
Click-through rate estimation is essentially a probability-based binary classification problem, and the invention uses logistic regression as the click prediction model.
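The final stage, logistic regression on the learned features with AUC as the evaluation metric, might look like the sketch below. The data are simulated, and the training settings, the function names and the pairwise AUC computation are assumptions for illustration; the patent does not describe how the model is fitted or how AUC is computed.

```python
import numpy as np

def predict_proba(X, w, b):
    """Predicted click probability under the logistic model."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def train_logreg(X, y, epochs=500, lr=0.5):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = predict_proba(X, w, b) - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def auc(y_true, scores):
    """AUC as the share of (clicked, not clicked) pairs ranked correctly; ties count 1/2."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # stand-in for high-order combined features
true_w = np.array([2.0, -1.0, 0.5])      # made-up weights used only to simulate clicks
y = (rng.random(200) < 1 / (1 + np.exp(-(X @ true_w)))).astype(float)
w, b = train_logreg(X, y)
score = auc(y, predict_proba(X, w, b))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking of clicked over non-clicked impressions.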
Claims (3)
1. A method for predicting advertisement click-through rate, characterized in that it comprises the following steps:
Step 1: Build an advertisement-query matrix and apply K-means clustering to advertisements and to queries separately;
Step 2: Apply tensor decomposition to the user-query-advertisement three-dimensional tensor model;
Step 3: Extract the basic features that influence advertisement click-through rate;
Step 4: Use the selected basic features as the input layer of a stacked autoencoder and train it to obtain high-order combined features;
Step 5: Feed the high-order combined features into a logistic regression model and train it;
Step 6: Once model training is complete, feed the data to be predicted into the trained model to obtain predictions.
2. The method for predicting advertisement click-through rate according to claim 1, characterized in that step 1 includes:
Step 1-1: Use the number of advertisement displays provided in the experimental data as the weight between advertisement A_i and query Q_j to build the advertisement-query matrix;
Step 1-2: Cluster the advertisement-query matrix with the K-means algorithm;
Step 1-3: Let N_u, N_q and N_a denote the numbers of users, queries and advertisements in the initial data. After objects of the same type are clustered, objects in the same cluster are represented by the same ID, and K_u, K_q and K_a denote the numbers of user, query and advertisement clusters. The numbers of users, queries and advertisements in the initial data set are thus reduced from N_u, N_q and N_a to K_u, K_q and K_a respectively.
3. The method for predicting advertisement click-through rate according to claim 1, characterized in that step 4 includes:
Step 1: First feed the selected basic features into an autoencoder and train it; the weight parameter w and bias parameter b obtained by training serve as the weights and biases between the input layer and the first layer of the stacked autoencoder;
Step 2: Use the output obtained from the training in step 1 as the input layer of an autoencoder and train it to obtain the weights and biases between the first layer and the second layer;
Step 3: Continue in the same way to obtain the biases and weights between every pair of adjacent layers, which completes the training of the stacked autoencoder.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710159687.7A CN106997550A (en) | 2017-03-17 | 2017-03-17 | Method for predicting advertisement click-through rate based on a stacked autoencoder |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106997550A true CN106997550A (en) | 2017-08-01 |
Family
ID=59431465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710159687.7A Pending CN106997550A (en) | 2017-03-17 | 2017-03-17 | Method for predicting advertisement click-through rate based on a stacked autoencoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106997550A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100179855A1 (en) * | 2009-01-09 | 2010-07-15 | Ye Chen | Large-Scale Behavioral Targeting for Advertising over a Network |
CN105787767A (en) * | 2016-03-03 | 2016-07-20 | 上海珍岛信息技术有限公司 | Method and system for obtaining advertisement click-through rate pre-estimation model |
CN106156530A (en) * | 2016-08-03 | 2016-11-23 | 北京好运到信息科技有限公司 | Health check-up data analysing method based on stack own coding device and device |
CN106485353A (en) * | 2016-09-30 | 2017-03-08 | 中国科学院遥感与数字地球研究所 | Air pollutant concentration forecasting procedure and system |
-
2017
- 2017-03-17 CN CN201710159687.7A patent/CN106997550A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111311328A (en) * | 2020-02-20 | 2020-06-19 | 支付宝(杭州)信息技术有限公司 | Method and device for determining advertisement click rate of product under advertisement channel |
CN111311328B (en) * | 2020-02-20 | 2022-07-01 | 支付宝(杭州)信息技术有限公司 | Method and device for determining advertisement click rate of product under advertisement channel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20170801 |