CN115048855A - Click rate prediction model, training method and application device thereof - Google Patents

Info

Publication number
CN115048855A
CN115048855A
Authority
CN
China
Prior art keywords
layer
click
embedded
prediction model
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210485479.7A
Other languages
Chinese (zh)
Inventor
黄发良
尹云飞
戴智鹏
张意晨
任皓
农伟骏
孙敬钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Normal University
Original Assignee
Nanning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Normal University filed Critical Nanning Normal University
Priority to CN202210485479.7A priority Critical patent/CN115048855A/en
Publication of CN115048855A publication Critical patent/CN115048855A/en
Withdrawn legal-status Critical Current

Classifications

    • G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F16/9535 — Search customisation based on user profiles and personalisation
    • G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F2119/02 — Reliability analysis or reliability optimisation; failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Algebra (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a click rate prediction model, together with a training method and an application device for it. The click rate prediction model comprises: an input layer, which reads in user instance information as high-dimensional sparse one-hot or multi-hot coded data; an embedding layer, which converts the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of users and items; a product enhancement layer, which compresses, excites and scales the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation; a deep factorization module and a deep residual network module, which further capture high-order implicit interaction information of the embedded features from the optimized embedded representation; and an output layer, which integrates the outputs of the deep factorization module and the deep residual network module and calculates the probability that the target item is clicked. A click rate prediction model trained by the invention can effectively improve the click rate prediction performance of a recommendation system, and can be widely applied in e-commerce scenarios such as advertisement click rate prediction.

Description

Click rate prediction model, training method and application device thereof
Technical Field
The invention relates to the technical field of click rate prediction, in particular to a click rate prediction model, a training method and an application device thereof.
Background
With the rapid development of information technology, online services such as e-commerce, news feeds and social platforms have multiplied. This brings convenience to people's lives but also causes a serious information-overload problem, making it difficult to find the information that meets one's needs in an ocean of data. The recommendation system is one of the important technical means of alleviating this problem, and click-through rate (CTR) prediction, a core functional module of the recommendation system, has received much attention from academia and industry. Different studies have proposed a variety of click-through rate prediction methods and models from different perspectives, and these are widely applied in news recommendation platforms, e-commerce platforms and the computational advertising industry.
In addition, existing click rate prediction models rarely consider optimizing the model's initial embedded feature vectors, so the fine-grained information of the cross vectors cannot be used effectively, which harms the prediction performance of the models.
Disclosure of Invention
It is an object of the present invention to address at least the above-mentioned deficiencies and to provide at least the advantages which will be described hereinafter.
The click rate prediction model of the invention is designed with a product enhancement network that optimizes the model's initial embedded feature vectors to obtain product-enhanced embedding vectors, and at the same time combines a deep factorization machine and a deep residual neural network in parallel, so that the accuracy of the CTR prediction model and the computation time cost of training the model can be better balanced.
The invention provides a click rate prediction model, which comprises:
an input layer, for reading in user instance information as high-dimensional sparse one-hot or multi-hot coded data;
an embedding layer, for converting the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of users and items;
a product enhancement layer, for compressing, exciting and scaling the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation;
a deep factorization module and a deep residual network module, for further capturing high-order implicit interaction information of the embedded features from the optimized embedded representation;
and an output layer, for integrating the outputs of the deep factorization module and the deep residual network module and calculating the probability that the target item is clicked.
Preferably, the user instance information includes user attributes, item attributes, or user historical behavior attributes, such as the user's age, gender, installation time, registration status, city, province, login activity, historical purchase status, historical purchase amount, and the like.
The invention provides a corresponding training method for the click rate prediction model, which comprises the following steps:
step 1, initializing the parameter set of the click rate prediction model;
step 2, sampling a batch of samples from all instance samples;
step 3, sparsely coding the discrete attributes and the discretized continuous attributes in the user instance information in one-hot or multi-hot form through the input layer of the click rate prediction model, to obtain a high-dimensional sparse representation of the user instance information;
step 4, converting the high-dimensional sparse representation of the user instance information into low-dimensional dense vectors through the embedding layer of the click rate prediction model, to obtain embedded representations of the user and the items;
step 5, compressing, exciting and scaling the low-dimensional embedded representation of the instance information at the feature-field level, the feature-dimension level and the global feature-bit level through the product enhancement layer of the click rate prediction model, to optimize the embedded representation;
step 6, applying the deep factorization module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation;
step 7, applying the deep residual network module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation;
step 8, integrating the outputs of the deep factorization module and the deep residual network module, and calculating the predicted click score of the target item;
step 9, calculating the cross-entropy click-prediction loss from the actual item clicks and the predicted item clicks;
step 10, optimizing the cross-entropy loss with an Adam optimizer based on error back-propagation, adjusting the weight parameters of the prediction model, and searching the learning rate in the range {0.0001, ..., 0.01};
and step 11, training the prediction model by looping through steps 2 to 10 for a specified number of iterations.
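As an illustration of the mini-batch loop in steps 2-10, the following sketch trains a toy logistic-regression stand-in with plain gradient descent in place of the full model and the Adam optimizer; all data, sizes and names here are illustrative, not the patent's implementation.

```python
import numpy as np

# Toy stand-in for the training loop of steps 2-10: logistic regression plays
# the role of the full network and plain gradient descent stands in for Adam.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                   # 256 instances, 8 features
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(float)              # linearly separable toy labels

w = np.zeros(8)                                 # step 1: initialise parameters
lr, batch_size, epochs = 0.1, 32, 50
for _ in range(epochs):                         # step 11: iterate to the budget
    order = rng.permutation(len(X))
    for s in range(0, len(X), batch_size):      # step 2: sample a mini-batch
        b = order[s:s + batch_size]
        p = 1.0 / (1.0 + np.exp(-X[b] @ w))     # steps 3-8: forward pass
        grad = X[b].T @ (p - y[b]) / len(b)     # step 9: loss gradient
        w -= lr * grad                          # step 10: parameter update

accuracy = np.mean(((1.0 / (1.0 + np.exp(-X @ w))) > 0.5) == y)
print(accuracy)
```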
Preferably, in step 4, the embedded representation e = {e_1, e_2, ..., e_m} of the user and the items is obtained as follows:

e_i = M_i x_i, i = 1, 2, ..., m
e_j = M_j x_j, j = 1, 2, ..., m

where x_i and x_j are respectively the one-hot coded vector of discrete attribute i and the multi-hot coded vector of discrete attribute j of data instance x; e_i and e_j are the embedding vectors corresponding to feature fields F_i and F_j for the current input instance x; M_i is a k × n_i real matrix; n_i is the number of features contained in feature field F_i; and m is the size of the feature-field set F = {F_1, F_2, ..., F_m}.
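The embedding lookup e_i = M_i x_i above can be sketched as follows; the field sizes and matrices are illustrative stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m = 3 feature fields with n_i features each, and
# embedding dimension k = 4 (the experiments later fix k = 16).
field_sizes = [5, 7, 3]
k = 4
# One k x n_i embedding matrix M_i per feature field (random stand-ins).
M = [rng.normal(size=(k, n_i)) for n_i in field_sizes]

def embed(sparse_codes):
    """e_i = M_i x_i for every feature field."""
    return [M_i @ x_i for M_i, x_i in zip(M, sparse_codes)]

# One data instance: one-hot codes for fields 0 and 2, multi-hot for field 1.
x = [np.eye(5)[2],
     np.array([1, 0, 1, 0, 0, 0, 1], dtype=float),
     np.eye(3)[0]]
e = embed(x)
print([v.shape for v in e])  # three dense k-dimensional embeddings
```

A multi-hot field simply yields the sum of the embedding columns of its active features.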
Preferably, the embedded representation is optimized in step 5 by: firstly, the embedded layer outputs an E-type variant matrix E through a type variant operator reshape 0 Then using a compression operator
Figure RE-GDA0003759682410000031
Compressing original embedded e to k-dimensional statistical vector
Figure RE-GDA0003759682410000032
And
Figure RE-GDA0003759682410000033
using excitation operators
Figure RE-GDA0003759682410000034
Learning based on statistical vector p 1 Is embedded in the weights of the different potential attributes of the vector, using an excitation operator
Figure RE-GDA0003759682410000035
Learning based on statistics p 2 Using the excitation operator
Figure RE-GDA0003759682410000036
Learning is based on vector e learning the integral embedding features, using scaling operations with q 1 、q 2 And q is 3 Weighting the initial embedded vector for weight to obtain E 1 =E 0 q 1 、E 2 =(E 0 ) T q 2 And e bit =eq 3 (ii) a Finally, the output average pooling of the compression operator, the excitation operator and the scaling operation is obtained
Figure RE-GDA0003759682410000037
Wherein, i is more than or equal to 1 and less than or equal to k, and j is more than or equal to 1 and less than or equal to m in the compression operator; in the excitation operator q 1 、q 2 And q is 3 Respectively vectors of length k, m and km,
Figure RE-GDA0003759682410000038
and
Figure RE-GDA0003759682410000039
the matrix is respectively of the scale of k multiplied by rk, rk multiplied by k, m multiplied by rm, rm multiplied by m, km multiplied by rkm and km multiplied by km, r is a parameter for controlling the dimension increasing proportion; in scaling operation E 1 And E 2 Are all m × k matrices, e bit Is a vector of length mk.
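A minimal numerical sketch of the compress-excite-scale pipeline described above, under the assumption (the source renders the operator formulas only as images) that the excitation is a two-layer Relu/sigmoid transformation with the stated matrix sizes and that the scaling is element-wise reweighting; all weights are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, r = 4, 3, 2                  # feature fields, embedding dim, raising ratio

E0 = rng.normal(size=(m, k))       # reshaped embedding matrix
e = E0.reshape(-1)                 # flattened embedding, length m*k

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def excite(p, W_up, W_down):
    """Assumed two-layer excitation: raise dimension by r, Relu, project, sigmoid."""
    return sigmoid(W_down @ np.maximum(W_up @ p, 0.0))

# Compression: sum-pool the embedding matrix along each axis.
p1 = E0.sum(axis=0)                # length k -> dimension-level statistics
p2 = E0.sum(axis=1)                # length m -> field-level statistics

# Hypothetical excitation weights with the sizes quoted in the text.
W1u, W1d = rng.normal(size=(r * k, k)), rng.normal(size=(k, r * k))
W2u, W2d = rng.normal(size=(r * m, m)), rng.normal(size=(m, r * m))
W3u, W3d = rng.normal(size=(r * m * k, m * k)), rng.normal(size=(m * k, r * m * k))

q1, q2, q3 = excite(p1, W1u, W1d), excite(p2, W2u, W2d), excite(e, W3u, W3d)

# Scaling: reweight the embeddings at dimension, field, and bit level.
E1 = E0 * q1                       # broadcast over columns (dimension level)
E2 = E0 * q2[:, None]              # broadcast over rows (field level)
e_bit = e * q3                     # global feature-bit level

# Average-pool the three views into the product-enhanced embedding.
e_pe = (E1.reshape(-1) + E2.reshape(-1) + e_bit) / 3.0
print(e_pe.shape)                  # a vector of length m*k
```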
Preferably, the high-order implicit interaction information in the embedded feature representation is captured in step 6 as follows: first, a second-order feature interaction layer aggregates the output vectors of the product enhancement layer into a vector e_BI of length k; then a feedforward network containing L_1 feedforward layers applies a nonlinear transformation to e_BI, where the l-th feedforward layer (l = 0, 1, ..., L_1) computes

a_l = Relu(BN(W_l a_{l-1} + b_l)), with a_0 = e_BI,

where Relu is the excitation function, BN is the batch normalization operator, W_l and b_l are respectively the weight and bias of the l-th feedforward layer, and W_dfm is the output-layer weight of the deep factorization module.
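The second-order (bi-interaction) aggregation above can be sketched as follows, using the standard identity that the sum of pairwise element-wise products equals half of (square of sum minus sum of squares); the feedforward layer on top uses random stand-in weights and omits batch normalization for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 4, 3
V = rng.normal(size=(m, k))              # product-enhanced field embeddings

# Bi-interaction pooling: sum of element-wise products over all field pairs,
# computed in O(mk) via 0.5 * ((sum v)^2 - sum v^2).
e_bi = 0.5 * (V.sum(axis=0) ** 2 - (V ** 2).sum(axis=0))

# Brute-force check over explicit pairs i < j.
brute = sum(V[i] * V[j] for i in range(m) for j in range(i + 1, m))

# One Relu feedforward layer on top (batch normalization omitted; the
# weights are random stand-ins for learned parameters).
W, b = rng.normal(size=(k, k)), rng.normal(size=k)
a1 = np.maximum(W @ e_bi + b, 0.0)
print(np.allclose(e_bi, brute))          # the identity holds
```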
Preferably, step 7 captures the high-order implicit interaction information in the embedded feature representation with a deep residual network module containing L_2 residual layers, where the l-th residual layer (l = 0, 1, ..., L_2) transforms its input with weight matrices W_l^(1) and W_l^(2), a bias b_l and a skip connection [the exact layer formulas are rendered only as images in the source], and W_drn is the output-layer weight of the deep residual network module.
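A minimal sketch of a stack of residual layers as described above, assuming a conventional two-layer Relu block with an identity skip connection (the source renders the exact formulas only as images, so this form is an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6                                    # hidden width of the residual layers
x = rng.normal(size=d)

def residual_layer(h, W1, b, W2):
    """Two affine maps with Relu activations plus an identity skip connection."""
    inner = np.maximum(W1 @ h + b, 0.0)
    return np.maximum(W2 @ inner + h, 0.0)   # add the layer input back

# Random stand-in weights, shared across the stacked layers for brevity.
W1, b = rng.normal(size=(d, d)), rng.normal(size=d)
W2 = rng.normal(size=(d, d))

out = x
for _ in range(3):                       # L_2 = 3 stacked residual layers
    out = residual_layer(out, W1, b, W2)
print(out.shape)
```

The skip connection lets gradients flow directly through the stack, which is what the patent credits for improved training efficiency.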
Preferably, the predicted click score of the target item is calculated in step 8 as

ŷ = sigmoid(W_dfm a_dfm + W_drn a_drn),

where a_dfm and a_drn are the final outputs of the deep factorization module and the deep residual network module, ŷ is the prediction score of the target item being clicked, calculated by the click rate prediction model, and sigmoid is the excitation function.
Preferably, the cross-entropy loss in step 9 is:

Loss = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] + λ‖Θ‖²,

where Θ is the parameter set of the recommendation model, N is the total number of training instances, λ is the weight of the regularization term, and y_i is the training-data label of whether the i-th item was clicked.
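The cross-entropy loss with an L2 regularization term can be computed as in this sketch; the function name and toy values are illustrative.

```python
import numpy as np

def ctr_loss(y_true, y_pred, params, lam):
    """Binary cross entropy plus an L2 regularizer over all model parameters."""
    eps = 1e-12                          # guard against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    l2 = lam * sum(np.sum(p ** 2) for p in params)
    return ce + l2

y = np.array([1.0, 0.0, 1.0, 0.0])       # actual clicks
p = np.array([0.9, 0.1, 0.8, 0.2])       # predicted click probabilities
theta = [np.array([0.5, -0.5])]          # toy parameter set
print(round(ctr_loss(y, p, theta, lam=0.0001), 4))
```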
Another object of the invention is to provide a click rate prediction method that uses the trained model and can effectively improve the click rate prediction performance of a recommendation system. The click rate prediction method comprises:
acquiring the user instance information to be predicted;
inputting the user instance information into the click rate prediction model obtained by training, and calculating the user's click probability for the target items;
recommending to the user the target items whose click probability is greater than or equal to a preset click rate threshold.
Still another object of the present invention is to provide a device for predicting the click rate using the trained model, comprising:
an acquisition unit, configured to acquire user instance information;
a prediction unit, configured to input the user instance information into the click rate prediction model trained by the above method, and obtain the predicted probability that the user clicks a target item;
and a pushing unit, configured to recommend to the user's terminal the target items whose click probability is greater than or equal to a preset click rate threshold.
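The pushing unit's thresholding logic can be sketched as follows; the model and catalogue here are toy stand-ins, not the trained network.

```python
def recommend(model, candidates, threshold=0.5):
    """Score each candidate item and keep those at or above the threshold."""
    return [item for item, feats in candidates if model(feats) >= threshold]

# Toy stand-in model: the "click probability" is just a stored number.
catalogue = [("item_a", [0.9]), ("item_b", [0.2]), ("item_c", [0.7])]
fake_model = lambda feats: feats[0]
print(recommend(fake_model, catalogue, threshold=0.5))  # ['item_a', 'item_c']
```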
The invention provides at least the following beneficial effects:
Compared with traditional click rate prediction models and training methods, the method of the invention designs a product enhancement network to optimize the initial embedding vectors: the initial feature embedding vectors receive scaled attention at the feature-field level, the feature-dimension level and the global feature-bit level, yielding product-enhanced embedding vectors. Meanwhile, the invention uses the deep factorization model to capture high-order interaction information among the optimized input data instance features on the one hand, and uses deep residual connections to improve the training efficiency of the prediction model on the other, thereby depicting the latent interests of customers more effectively, better balancing the accuracy of the CTR prediction model against the computation time cost of training it, and improving the click rate prediction performance of the recommendation system. It can be widely applied in recommendation systems and improves the quality of proactive information services.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a flow chart of a click-through rate prediction model of the present invention.
FIG. 2 is a schematic diagram of a product enhancement layer in the click-through rate prediction model of the present invention.
FIG. 3 is a bar graph of the runtime per training cycle for different models on an Avazu dataset.
FIG. 4 is a bar graph of the run time per training cycle for different models on the Criteo dataset.
FIG. 5 is a bar graph of the runtime per training cycle for different models on a Movie dataset.
FIG. 6 is a bar graph of run time per training period for different models on the Book dataset.
Detailed Description
The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.
It is to be noted that, unless otherwise specified, the methods described in the following embodiments are all conventional methods.
As shown in FIGS. 1-2, the invention discloses a click rate prediction model, which comprises:
an input layer, for reading in user instance information as high-dimensional sparse one-hot or multi-hot coded data; the user instance information includes user attributes, item attributes, or user historical behavior attributes, such as the user's age, gender, installation time, registration status, city, province, login activity, historical purchase status, historical purchase amount, and the like;
an embedding layer, for converting the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of users and items;
a product enhancement layer, for compressing, exciting and scaling the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation;
a deep factorization module and a deep residual network module, for further capturing high-order implicit interaction information of the embedded features from the optimized embedded representation;
and an output layer, for integrating the outputs of the deep factorization module and the deep residual network module and calculating the probability that the target item is clicked.
The click rate prediction model of the invention is trained as follows:
Step 1.1: initialize the parameter set of the click rate prediction model.
Step 1.2: sample a batch or mini-batch of samples from all instance samples.
Step 1.3: sparsely code the discrete attributes and the discretized continuous attributes in the user instance information in one-hot or multi-hot form through the input layer of the click rate prediction model, to obtain a high-dimensional sparse representation of the user instance information.
Step 1.4: convert the high-dimensional sparse representation of the user instance information into low-dimensional dense vectors through the embedding layer of the click rate prediction model, to obtain embedded representations of the user and the items.
in said step 1.4, the embedded representation e ═ e { e } of the user and the item can be obtained according to the following method 1 ,e 2 ,…,e m }:
e i =M i X i ,i=1,2,…,m
e j =M j X j ,j=1,2,…,m
Wherein X i And X j One-hot coded vector and multiple-hot coded vector, e, of discrete attribute i and discrete attribute j, respectively, of data instance X i And e j Respectively representing the current input instance X in the feature field F i And F j Corresponding embedding vector, M i Is k × n i Real number matrix of n i Is a feature field F i The number of included features, m is the set of feature fields F ═ { F ═ F 1 ,F 2 ,…,F m The size of the page;
step 1.5: compressing, exciting and scaling a low-dimensional embedding representation feature domain level, a feature dimension level and a global feature bit level of example information through a product enhancement layer of the click rate prediction model to realize optimization of embedding representation;
in said step 1.5, the embedded representation can be optimized according to the following method: firstly, the embedded layer outputs an E-type transformation matrix E through a type transformation operator reshape 0 Then using a compression operator
Figure RE-GDA0003759682410000061
Compressing original embedding e to k-dimensional statistical vector
Figure RE-GDA0003759682410000062
And
Figure RE-GDA0003759682410000063
using excitation operators
Figure RE-GDA0003759682410000064
Learning based on statistical vector p 1 The feature of (3) is embedded into the weights of different potential attributes of the vector using an excitation operator
Figure RE-GDA0003759682410000065
Learning based on statistics p 2 Using the excitation operator
Figure RE-GDA0003759682410000066
Learning is based on directionQuantity e learns the overall embedded features, using scaling operations with q 1 、q 2 And q is 3 Weighting the initial embedded vector for weight to obtain E 1 =E 0 q 1 、E 2 =(E 0 ) T q 2 And e bit =eq 3 (ii) a Finally, the output average pooling of the compression operator, the excitation operator and the scaling operation is obtained
Figure RE-GDA0003759682410000071
That is: compression operator
Figure RE-GDA0003759682410000072
And
Figure RE-GDA0003759682410000073
compressing original embedding e to a statistical vector of size k using a summation method
Figure RE-GDA0003759682410000074
And
Figure RE-GDA0003759682410000075
wherein, the first and the second end of the pipe are connected with each other,
Figure RE-GDA0003759682410000076
Figure RE-GDA0003759682410000077
excitation operator
Figure RE-GDA0003759682410000078
Based on statistical vector p 1 Learning feature embedding vector weights, excitation operators, of different potential attributes
Figure RE-GDA0003759682410000079
Based on statistic p 2 Learning the weight of the embedded vector for each feature field, exciting the operator
Figure RE-GDA00037596824100000710
Learning integral embedded features based on vector e, mainly involving
Figure RE-GDA00037596824100000711
Figure RE-GDA00037596824100000712
Figure RE-GDA00037596824100000713
Wherein q is 1 、q 2 And q is 3 Respectively vectors of length k, m and km,
Figure RE-GDA00037596824100000714
and
Figure RE-GDA00037596824100000715
the matrices are of the scale k × rk, rk × k, m × rm, rm × m, km × rkm and km × km, respectively, and r is a parameter controlling the scale of the added dimension.
Scaling operations 3 weights q of original features learned by excitation operations 1 、q 2 And q is 3 Weighting the initial embedded vector, primarily involving
E 1 =E 0 q 1
E 2 =(E 0 ) T q 2
e bit =eq 3
Wherein E 1 And E 2 All are m × k matrices, e bit Is a vector of length mk.
Vectorization of E 1 And E 2 And averagely pooling the outputs of the compression operator, the excitation operator and the scaling operation, mainly relating to
Figure RE-GDA00037596824100000716
e dim =reshape(E 1 )
e field =reshape(E 2 )
Step 1.6: apply the deep factorization module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation.
In step 1.6, the high-order implicit interaction information in the embedded feature representation can be captured as follows: first, a second-order feature interaction layer aggregates the output vectors of the product enhancement layer into a vector e_BI of length k; then a feedforward network containing L_1 feedforward layers applies a nonlinear transformation to e_BI, where the l-th feedforward layer (l = 0, 1, ..., L_1) computes

a_l = Relu(BN(W_l a_{l-1} + b_l)), with a_0 = e_BI,

where Relu is the excitation function, BN is the batch normalization operator, W_l and b_l are respectively the weight and bias of the l-th feedforward layer, and W_dfm is the output-layer weight of the deep factorization module.
Step 1.7: apply the deep residual network module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation.
In step 1.7, a deep residual network module containing L_2 residual layers captures the high-order implicit interaction information in the embedded feature representation. The l-th residual layer (l = 0, 1, ..., L_2) transforms its input with weight matrices W_l^(1) and W_l^(2), a bias b_l and a skip connection [the exact layer formulas are rendered only as images in the source; the circled-dot symbol in them denotes the element-wise product of matrices], and W_drn is the output-layer weight of the deep residual network module.
Step 1.8: integrate the outputs of the deep factorization module and the deep residual network module, and calculate the predicted click score of the target item:

ŷ = sigmoid(W_dfm a_dfm + W_drn a_drn),

where a_dfm and a_drn are the final outputs of the deep factorization module and the deep residual network module, ŷ is the prediction score of the item being clicked, calculated by the click rate prediction model, and sigmoid is the excitation function.
Step 1.9: using the actual item clicks and the predicted item clicks, calculate the cross-entropy loss:

Loss = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] + λ‖Θ‖²,

where Θ is the parameter set of the recommendation model, N is the total number of training instances, λ is the weight of the regularization term, and y_i is the training-data label of whether the i-th item was clicked.
Step 1.10: optimize the cross-entropy loss with an Adam optimizer based on error back-propagation, adjust the weight parameters of the click rate prediction model, and search the learning rate in the range {0.0001, ..., 0.01}; this learning-rate search speeds up training and helps the model converge faster.
Step 1.11: train the prediction model by looping through steps 1.2 to 1.10 for a specified number of iterations, where the specified number is determined by the error stability of the cross-entropy loss obtained during the loop: training ends when the fluctuation of the cross-entropy loss over the last 10 consecutive iterations is less than 10⁻⁸. The trained click rate prediction model is denoted PeDRNFM.
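The stopping criterion of step 1.11 (loss fluctuation below 10⁻⁸ over the last 10 iterations) can be sketched as:

```python
def converged(losses, window=10, tol=1e-8):
    """Stop when loss fluctuation over the last `window` iterations is below tol."""
    if len(losses) < window:
        return False                      # not enough history yet
    recent = losses[-window:]
    return max(recent) - min(recent) < tol

history = [0.5, 0.3, 0.2] + [0.1] * 10    # flat tail -> training may stop
print(converged(history))                 # True
print(converged([0.5, 0.4, 0.3]))         # False: fewer than 10 iterations
```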
< Effect test >
In order to evaluate the performance and efficiency of the click rate prediction model training, 4 standard data sets for click rate prediction were used for experiments. Table 1 lists the statistics of these 4 data sets for training, validation and testing, respectively.
TABLE 1. Feature statistics of the data sets [table rendered as an image in the source]
The models compared in the tests are the following:
LR: which is the most classical baseline model in click-through rate estimation.
WD: it is a model based on deep networks, combining deep learning networks and LR in parallel from the perspective of "generalization" and "memory".
FNN: the method is a model based on a deep network, and comprises the steps of utilizing FM pre-training feature embedding and then DNN to perform subsequent deep feature interaction.
IPNN: the model is based on a deep network, belongs to a PNN model and is in the same family with OPNN. After embedding the layer, performing explicit feature interaction in an inner product mode.
DCN: the method is a model based on a deep network, provides explicit interaction of cross network learning characteristics, uses a neural network module to learn implicit interaction of the characteristics, and finally synthesizes the output of the two modules.
DeepFM: it combines a deep learning network and FM in parallel, simultaneously learning first-order features, explicit second-order feature interactions and implicit feature interactions.
XDeepFM: on the basis of DeepFM, a CIN network is further designed to replace an explicit interaction module, and estimation capability of a feature interaction promotion model is carried out at a vector level.
AFN: the model is based on a deep network, a logarithmic neural network layer for adaptively adjusting interaction orders is designed, and the orders of characteristic interaction can be selected through the network, so that the performance of the model is improved.
AutoInt: using a multi-layer multi-head self-attention neural network, it can model feature combinations of different orders over the input features.
FiBiNET: the importance of the features is dynamically learned through an Squeeze-Excitation network mechanism, and meanwhile, the interaction between the features is learned through a bilinear function.
FNFM: the method has strong second-order feature interactive learning capability.
DRM: the method is a model based on a deep network, and the importance of each embedding dimension to the model is calculated through an attention structure, so that the embedding quality of features is improved.
maskNet: the method is a model based on a deep network, the capability of DNN excavation of complex interaction features can be improved by using MaskBlock, maskNet families have a parallel model and a serial model due to different maskBlock stacking modes, the serial model is MaskNet, and the parallel model is MaskNetx.
PeDRNFM: the click rate prediction model is trained by the method.
Evaluation index
The performance of all models is evaluated with two indices commonly used in click-through rate prediction tasks: AUC and Logloss. AUC is the area under the ROC curve and measures the probability that the model assigns a higher prediction score to a positive sample than to a negative one; a larger AUC indicates better performance. Logloss is the cross-entropy loss; a smaller value indicates better performance.
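Both indices are straightforward to compute; the following sketch (our illustration, not code from the patent) implements AUC as the rank-sum estimate of the probability that a positive sample outranks a negative one, and Logloss as averaged cross entropy:

```python
import math

def auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic:
    the probability that a random positive sample receives a higher score
    than a random negative sample (ties get average ranks)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # group indices whose scores are tied with the group's first element
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos = [r for r, y in zip(ranks, y_true) if y == 1]
    n_pos, n_neg = len(pos), len(y_true) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def logloss(y_true, y_score, eps=1e-15):
    """Cross-entropy loss averaged over samples, with probability clipping."""
    total = 0.0
    for y, p in zip(y_true, y_score):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

For example, with labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.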
Performance evaluation
To quantitatively analyze the performance of the PeDRNFM model, we selected currently representative click-through rate prediction models and compared their click-through rate prediction performance with that of PeDRNFM on 4 experimental data sets.
Details of the experiment
All methods are implemented in the PyTorch framework, and the feature embedding dimension k of all CTR models is fixed at 16. For a fair comparison, the hidden-layer parameters of models containing a DNN are set to (16, 16), and the dropout rate is searched in {0.1, ..., 0.9}. The batch size is set to 4096 for the Avazu and Criteo datasets and searched in {128, 256, 512} for the Movie and Book datasets. The learning rate of the Adam optimizer is searched in {0.01, 0.001, 0.0001}. The regularization-term parameter λ of the objective function is searched in {0.001, 0.0001, 0.00001}. The random seed is fixed at 42. The model-specific hyper-parameters are described next:
For the AFM model, the attention factor size is set to 16; for the DCN and DCNv2 models, the number of cross layers is set to 3; for the xDeepFM model, the CIN network parameters are set to (16, 16); for the AFN model, the number of logarithmic neurons in the LNN network is set to 1500; for the AutoInt model, the attention-layer dimension is set to 64, with 2 attention heads, 3 attention layers, and residual connections enabled; for the DIFM model, the attention-layer dimension is set to 16, with 2 attention heads, 3 attention layers, and residual connections enabled; for the FiBiNET and DeepFiBiNET models, the reduction ratio is set to 3; for the two MaskNet variants, SerMaskNet and ParaMaskNet, the number of MaskBlocks is set to 3. In particular, for the FFM and FNFM models the feature embedding dimension is set to 4 due to machine-performance limitations.
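For reference, the shared settings and search ranges above can be expressed as a small grid to iterate over; the sketch below is our illustration (the names `SHARED` and `SEARCH_SPACE` are not from the patent):

```python
from itertools import product

# Shared settings fixed across all models (from the experiment description).
SHARED = {"embed_dim": 16, "dnn_hidden": (16, 16), "seed": 42}

# Per-model search ranges for the tunable hyper-parameters.
SEARCH_SPACE = {
    "dropout": [round(0.1 * i, 1) for i in range(1, 10)],  # 0.1 .. 0.9
    "lr": [0.01, 0.001, 0.0001],                           # Adam learning rate
    "l2_lambda": [0.001, 0.0001, 0.00001],                 # regularization weight
}

def grid(space):
    """Yield every combination of the search-space values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(SEARCH_SPACE))  # 9 x 3 x 3 = 81 candidate configurations
```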
Effectiveness analysis
Table 2 shows the effectiveness results for the representative click-through rate prediction models. From the table: 1) all deep CTR prediction models outperform LR, again showing LR's insufficient modeling capacity for the CTR prediction task; 2) the IPNN and FiBiNET models perform well on all data sets; both work by explicitly computing each cross-feature vector, applying sum pooling to it, and finally obtaining a vector whose dimensionality matches the number of cross features, a working mode that realizes feature interaction effectively; 3) PeDRNFM achieves the best performance among all baseline models, indicating that PeDRNFM effectively optimizes the initial feature embeddings.
TABLE 2 comparison of effectiveness of different training models
(Table 2 appears as an image in the original publication and is not reproduced here.)
Time efficiency analysis
Figures 3, 4, 5 and 6 show the run time per training epoch for the different models on the four data sets. From these figures: 1) because Avazu and Criteo have many feature fields and many samples, model run times on these two data sets are higher than on the Movie and Book data sets; 2) on all data sets LR has the highest inference efficiency, while FNFM, FiBiNET and xDeepFM have the lowest. FiBiNET and FNFM have high time complexity because they explicitly compute interactions between features from different feature fields; each layer of the CIN network takes a Hadamard product of all vectors of the previous layer with all vectors of the input layer, so xDeepFM also has particularly high computational complexity; 3) relative to its run-time ranking on the Movie and Book data sets, IPNN ranks later on Avazu and Criteo because the latter data sets have more feature fields; AFN performs many logarithmic operations in its LNN part, and AutoInt's feature-cross layers are computationally complex, so both have longer actual computation times; 4) the time PeDRNFM needs per training epoch is fairly consistent across the four data sets. Combined with Table 2, PeDRNFM also achieves better prediction performance at the same time; in conclusion, the PeDRNFM trained by the present invention better balances prediction accuracy and the time cost of model training.
The above tests and analyses show that the trained model is advantageous when applied to a recommendation system for click-through rate prediction, effectively improving the system's click-through rate prediction performance and the quality of its proactive information services.
When the model is applied in a recommendation system for click-through rate prediction, user instance information to be predicted is obtained via a computer-readable medium, a server, or the like; the user instance information is input into the trained click rate prediction model to calculate the probability of the user clicking the target item; target items whose click probability is greater than or equal to a preset click rate threshold are then recommended to the user. This realizes recommendation of goods, information, videos and the like relevant to the user, and the method can be widely applied in e-commerce scenarios to predict the click-through rate of advertisements.
The click rate prediction model is applied to a recommendation system via a click rate prediction device comprising: an acquisition unit configured to acquire user instance information; a prediction unit configured to input the user instance information into the trained click rate prediction model to obtain the predicted probability of the user clicking the target item; and a pushing unit configured to recommend target items whose click probability is greater than or equal to a preset click rate threshold to the user's terminal.
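A minimal sketch of the three-unit device (acquisition, prediction, pushing) might look as follows; the class and method names are ours, and any trained model exposing an instance-to-probability callable could be plugged in:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ClickRatePredictor:
    """Illustrative device: a prediction unit scores acquired user instances
    with a trained CTR model, and a pushing unit keeps only the items whose
    click probability reaches the preset threshold."""
    model: Callable[[Dict], float]   # trained model: instance -> click probability
    threshold: float = 0.5           # preset click-rate threshold

    def recommend(self, instances: List[Dict]) -> List[Dict]:
        scored = [(self.model(x), x) for x in instances]       # prediction unit
        return [x for p, x in scored if p >= self.threshold]   # pushing unit

# Usage with a stand-in model that just reads a precomputed score:
predictor = ClickRatePredictor(model=lambda x: x["score"], threshold=0.6)
picked = predictor.recommend([{"item": "a", "score": 0.7},
                              {"item": "b", "score": 0.3}])
```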
While embodiments of the invention have been disclosed above, they are not limited to the uses set forth in the specification and examples; the invention can be applied in all fields to which it is suited, and additional modifications will readily occur to those skilled in the art.

Claims (10)

1. A click rate prediction model, characterized by comprising:
an input layer for reading in user instance information as one-hot or multi-hot encoded high-dimensional sparse data;
an embedding layer for converting the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of the user and the item;
a product enhancement layer for compressing, exciting and scaling the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation;
a depth factorization module and a deep residual network module for further capturing high-order implicit interaction information of the embedded features from the optimized embedded representation;
and an output layer for integrating the outputs of the depth factorization module and the deep residual network module and calculating the probability of the target item being clicked.
2. The click-rate prediction model of claim 1 wherein the user instance information comprises user attributes, item attributes, or user historical behavior attributes.
3. A method for training a click-through rate prediction model, according to the click-through rate prediction model of claim 1 or 2, the method comprising:
step 1, initializing a parameter set of the click rate prediction model;
step 2, sampling a batch of samples from all instance samples;
step 3, performing one-hot or multi-hot sparse coding on the discrete attributes and the discretized continuous attributes in the user instance information through the input layer of the click rate prediction model to obtain a high-dimensional sparse representation of the user instance information;
step 4, converting the high-dimensional sparse representation of the user instance information into low-dimensional dense vectors through the embedding layer of the click rate prediction model to obtain embedded representations of the user and the item;
step 5, compressing, exciting and scaling the low-dimensional embedded representation of the instance information at the feature-field level, the feature-dimension level and the global feature-bit level through the product enhancement layer of the click rate prediction model to optimize the embedded representation;
step 6, applying the depth factorization module to the embedded representation of the data instance optimized by the product enhancement layer to further capture high-order implicit interaction information in the embedded feature representation;
step 7, applying the deep residual network module to the embedded representation of the data instance optimized by the product enhancement layer to further capture high-order implicit interaction information in the embedded feature representation;
step 8, integrating the outputs of the depth factorization module and the deep residual network module, and calculating the prediction score of the target item being clicked;
step 9, calculating the cross-entropy-based click prediction loss from the actual and predicted item-click outcomes;
step 10, optimizing the cross-entropy loss with an Adam optimizer based on error back-propagation, adjusting the weight parameters of the prediction model, and searching the learning rate between 0.0001 and 0.01;
step 11, training the prediction model by cyclically executing steps 2 to 10 until a specified number of iterations is reached.
4. The training method according to claim 3, wherein in said step 4 the embedded representation e = {e_1, e_2, …, e_m} of the user and the item is obtained by:

e_i = M_i X_i, i = 1, 2, …, m
e_j = M_j X_j, j = 1, 2, …, m

wherein X_i and X_j are respectively the one-hot coded vector of discrete attribute i and the multi-hot coded vector of attribute j of data instance X; e_i and e_j respectively denote the embedding vectors of the current input instance X corresponding to the feature fields F_i and F_j; M_i is a k × n_i real-number matrix, where n_i is the number of features contained in the feature field F_i; and m is the size of the feature-field set F = {F_1, F_2, …, F_m}.
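Assuming the shapes given above (a k × n_i embedding matrix per feature field, selected by a one-hot vector or summed over a multi-hot vector), the embedding lookup can be sketched as follows; all identifiers are illustrative:

```python
import random

class EmbeddingLayer:
    """Sketch of the claim-4 embedding layer: each feature field i owns a
    k x n_i matrix; a one-hot input selects one column, and a multi-hot
    input sums the columns of its active features."""
    def __init__(self, field_sizes, k, seed=42):
        rng = random.Random(seed)
        # M[i][f] is the k-dim embedding of feature f in field i.
        self.M = [[[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n)]
                  for n in field_sizes]
        self.k = k

    def embed(self, active):
        """active[i] = list of active feature indices in field i
        (length 1 for one-hot, > 1 for multi-hot)."""
        out = []
        for i, feats in enumerate(active):
            e = [0.0] * self.k
            for f in feats:
                for d in range(self.k):
                    e[d] += self.M[i][f][d]
            out.append(e)  # e_i: the field's embedding vector
        return out

# Usage: two fields (3 and 5 features), one-hot and multi-hot inputs.
layer = EmbeddingLayer([3, 5], k=4)
e = layer.embed([[1], [0, 2]])
```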
5. The training method according to claim 4, wherein in said step 5 the embedded representation is optimized as follows: first, the output e of the embedding layer is reshaped into a matrix E_0 by a type-transformation operator (reshape); then two compression operators (given as formula images in the original publication) compress the original embedding e into statistical vectors p_1 and p_2; an excitation operator learns, from the statistical vector p_1, the weights q_1 of the different potential attributes of the embedded vector; a second excitation operator learns q_2 from the statistic p_2; and a third excitation operator learns q_3 from the whole embedding vector e. Scaling operations then weight the initial embedded vectors with q_1, q_2 and q_3 to obtain E_1 = E_0 q_1, E_2 = (E_0)^T q_2 and e_bit = e q_3; finally, the outputs E_1, E_2 and e_bit of the compression, excitation and scaling operations are average-pooled to obtain the optimized embedded representation.

Here 1 ≤ i ≤ k and 1 ≤ j ≤ m in the compression operators; in the excitation operators, q_1, q_2 and q_3 are vectors of length k, m and km respectively, and the associated weight matrices have sizes k × rk, rk × k, m × rm, rm × m, km × rkm and rkm × km, where r is a parameter controlling the dimension-expansion ratio; in the scaling operations, E_1 and E_2 are both m × k matrices and e_bit is a vector of length mk.
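One squeeze-excitation-scale branch of the product enhancement layer can be sketched as below. Since the patent's exact operators appear only as formula images, this is a generic squeeze-and-excitation pattern (mean-pool squeeze, two-layer bottleneck excitation, row-wise scaling), not the disclosed formulas:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def squeeze_excite(E, W_down, W_up):
    """One illustrative branch (field level): squeeze each row of E to a
    statistic by mean pooling, pass the statistic vector through a two-layer
    bottleneck (the r-scaled excitation), and use the result as per-row
    scaling weights on E. The dimension- and bit-level branches would apply
    the same pattern to the transposed matrix and the flattened vector."""
    p = [sum(row) / len(row) for row in E]                   # squeeze: one stat per row
    hidden = relu([sum(p[i] * W_down[i][j] for i in range(len(p)))
                   for j in range(len(W_down[0]))])          # excite: down-project
    q = [sum(hidden[j] * W_up[j][i] for j in range(len(hidden)))
         for i in range(len(p))]                             # excite: up-project
    return [[q[i] * x for x in E[i]] for i in range(len(E))]  # scale rows of E

# Usage on a tiny 2x2 embedding matrix with all-ones bottleneck weights:
E = [[1.0, 2.0], [3.0, 4.0]]
out = squeeze_excite(E, W_down=[[1.0], [1.0]], W_up=[[1.0, 1.0]])
```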
6. The training method according to claim 5, wherein said step 6 captures the high-order implicit interaction information embedded in the feature representation as follows: first, a second-order feature interaction layer aggregates the output vectors of the product enhancement layer into a vector e_BI of length k; then a feed-forward network containing L_1 feed-forward layers applies a non-linear transformation to e_BI, the l-th feed-forward layer (l = 1, 2, …, L_1) mainly involving

α_0 = e_BI
α_l = Relu(BN(W_l α_{l-1} + b_l))

wherein Relu is an excitation function, BN is a batch-normalization operator, W_l and b_l are respectively the weight and the bias of the l-th feed-forward layer, and W_dfm is the output-layer weight of the depth factorization module;

said step 7 comprises using a deep residual network module containing L_2 residual layers to capture the high-order implicit interaction information embedded in the feature representation, the l-th residual layer (l = 1, 2, …, L_2) mainly involving

q_0 = e
q_l = q_{l-1} + Relu(W_{l,2} Relu(W_{l,1} q_{l-1} + b_l))

wherein W_{l,1} and W_{l,2} are the weights of the l-th residual layer, b_l is the bias of the l-th residual layer, and W_drn is the output-layer weight of the deep residual network module.
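The two branches of claims 6 and 7 can be sketched in plain Python; batch normalization is omitted, and the residual form is a standard two-weight block since the patent's formulas appear only as images:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # W is a list of rows; returns W @ v
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def vecadd(a, b):
    return [x + y for x, y in zip(a, b)]

def dfm_branch(e_bi, layers, w_out):
    """Depth factorization branch sketch: alpha_0 = e_BI, then L_1 layers of
    alpha_l = Relu(W_l alpha_{l-1} + b_l), finished by the output weight."""
    a = e_bi
    for W, b in layers:
        a = relu(vecadd(matvec(W, a), b))
    return sum(w * x for w, x in zip(w_out, a))

def drn_branch(e, layers, w_out):
    """Deep residual branch sketch: q_0 = e, each layer adds a two-weight
    nonlinear transform of q_{l-1} back onto it (a standard residual block)."""
    q = e
    for W1, W2, b in layers:
        q = vecadd(q, relu(matvec(W2, relu(vecadd(matvec(W1, q), b)))))
    return sum(w * x for w, x in zip(w_out, q))
```

An output layer would combine the two branch scores through a sigmoid, as in the next claim.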
7. The training method according to claim 6, wherein the prediction score of the target item being clicked in step 8 is calculated as

ŷ = sigmoid(W_dfm α_{L_1} + W_drn q_{L_2})

wherein ŷ is the prediction score of the clicked target item calculated by the click rate prediction model, and sigmoid is an excitation function.
8. The training method according to claim 7, wherein the cross-entropy loss in step 9 is:

Loss = -(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] + λ ||Θ||²

wherein Θ is the parameter set of the recommendation model, N is the total number of training instances, λ is the weight of the regularization term, and y_i is the training-data label indicating whether the i-th item was clicked.
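A direct implementation of the claim-8 loss might read as follows; the squared-L2 form of the regularizer is our assumption, as the patent's formula appears only as an image:

```python
import math

def ctr_loss(y_true, y_pred, params, lam, eps=1e-15):
    """Mean cross-entropy over N instances plus an L2 regularization term
    (lam * ||Theta||^2) over the model parameters; predictions are clipped
    away from 0 and 1 to keep the logarithms finite."""
    n = len(y_true)
    ce = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        ce -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return ce / n + lam * sum(w * w for w in params)
```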
9. A method for predicting click through rates, comprising:
acquiring user instance information to be predicted;
inputting the user instance information into a click rate prediction model trained according to the method of any one of claims 3-8, and calculating the probability of the user clicking the target item;
recommending the target items whose click probability is greater than or equal to a preset click rate threshold to the user.
10. An apparatus for predicting click through rate, comprising:
an acquisition unit configured to acquire user instance information;
a prediction unit configured to input the user instance information into a click rate prediction model trained according to the method of any one of claims 3-8, so as to obtain a predicted click probability of the user clicking on a target item;
and the pushing unit is configured to recommend the target item corresponding to the click probability greater than or equal to a preset click rate threshold value to the terminal of the user.
CN202210485479.7A 2022-05-06 2022-05-06 Click rate prediction model, training method and application device thereof Withdrawn CN115048855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210485479.7A CN115048855A (en) 2022-05-06 2022-05-06 Click rate prediction model, training method and application device thereof


Publications (1)

Publication Number Publication Date
CN115048855A true CN115048855A (en) 2022-09-13

Family

ID=83157301


Country Status (1)

Country Link
CN (1) CN115048855A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115271272A (en) * 2022-09-29 2022-11-01 华东交通大学 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN115563510A (en) * 2022-12-01 2023-01-03 北京搜狐新动力信息技术有限公司 Training method of click rate estimation model and related device
CN117252665A (en) * 2023-11-14 2023-12-19 苏州元脑智能科技有限公司 Service recommendation method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220913