CN115048855A - Click rate prediction model, training method and application device thereof - Google Patents

Info

Publication number
CN115048855A
CN115048855A
Authority
CN
China
Prior art keywords
layer
click
embedded
prediction model
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210485479.7A
Other languages
Chinese (zh)
Inventor
黄发良
尹云飞
戴智鹏
张意晨
任皓
农伟骏
孙敬钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Normal University
Original Assignee
Nanning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Normal University filed Critical Nanning Normal University
Priority to CN202210485479.7A priority Critical patent/CN115048855A/en
Publication of CN115048855A publication Critical patent/CN115048855A/en
Withdrawn legal-status Critical Current

Classifications

    • G06F30/27 — Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F16/9535 — Search customisation based on user profiles and personalisation
    • G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F2119/02 — Reliability analysis or reliability optimisation; failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Algebra (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a click rate prediction model, together with a training method and an application device for it. The click rate prediction model comprises: an input layer, which reads in user instance information as high-dimensional sparse one-hot or multi-hot coded data; an embedding layer, which converts the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of users and items; a product enhancement layer, which compresses, excites and scales the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation; a deep factorization module and a deep residual network module, which further capture high-order implicit interaction information of the embedded features from the optimized embedded representation; and an output layer, which integrates the outputs of the deep factorization module and the deep residual network module and calculates the probability that the target item is clicked. A click rate prediction model trained by the invention can effectively improve the click rate prediction performance of a recommendation system, and can be widely applied in e-commerce scenarios such as advertisement click rate prediction.

Description

Click rate prediction model, training method and application device thereof
Technical Field
The invention relates to the technical field of click rate prediction, in particular to a click rate prediction model, a training method and an application device thereof.
Background
With the rapid development of information technology, online services such as e-commerce, news feeds and social platforms have multiplied. This brings convenience to people's lives but also causes a serious information-overload problem, making it difficult to find the information that meets one's needs in an ocean of data. The recommendation system is one of the important technical means of alleviating this problem, and click-through rate (CTR) prediction, a core functional module of the recommendation system, has received much attention from academia and industry. Different studies have proposed a variety of click-through rate prediction methods and models from different perspectives, and these are widely applied in news recommendation platforms, e-commerce platforms and the computational advertising industry.
In addition, existing click rate prediction models rarely consider optimizing the model's initial embedded feature vectors, so the fine-grained information of the cross vectors cannot be used effectively, which harms the prediction performance of the models.
Disclosure of Invention
It is an object of the present invention to address at least the above-mentioned deficiencies and to provide at least the advantages which will be described hereinafter.
The click rate prediction model of the invention is designed with a product enhancement network that optimizes the model's initial embedded feature vectors to obtain product-enhanced embedding vectors, and at the same time combines a deep factorization machine and a deep residual neural network in parallel, so that the accuracy of the CTR prediction model and the computation time cost of training the model can be better balanced.
The invention provides a click rate prediction model, which comprises:
an input layer, for reading in user instance information as high-dimensional sparse one-hot or multi-hot coded data;
an embedding layer, for converting the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of users and items;
a product enhancement layer, for compressing, exciting and scaling the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation;
a deep factorization module and a deep residual network module, for further capturing high-order implicit interaction information of the embedded features from the optimized embedded representation;
and an output layer, for integrating the outputs of the deep factorization module and the deep residual network module and calculating the probability that the target item is clicked.
Preferably, the user instance information includes user attributes, item attributes, or user historical behavior attributes, such as the user's age, gender, installation time, registration status, city, province, login activity, historical purchase status, historical purchase amount, and the like.
The invention provides a corresponding training method for the click rate prediction model, which comprises the following steps:
step 1, initializing the parameter set of the click rate prediction model;
step 2, sampling a batch of samples from all instance samples;
step 3, sparsely coding the discrete attributes and the discretized continuous attributes in the user instance information in one-hot or multi-hot form through the input layer of the click rate prediction model, to obtain a high-dimensional sparse representation of the user instance information;
step 4, converting the high-dimensional sparse representation of the user instance information into low-dimensional dense vectors through the embedding layer of the click rate prediction model, to obtain embedded representations of the user and the items;
step 5, compressing, exciting and scaling the low-dimensional embedded representation of the instance information at the feature-field level, the feature-dimension level and the global feature-bit level through the product enhancement layer of the click rate prediction model, to optimize the embedded representation;
step 6, applying the deep factorization module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation;
step 7, applying the deep residual network module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation;
step 8, integrating the outputs of the deep factorization module and the deep residual network module, and calculating the predicted click score of the target item;
step 9, calculating the cross-entropy click-prediction loss from the actual item clicks and the predicted item clicks;
step 10, optimizing the cross-entropy loss with an Adam optimizer based on error back-propagation, adjusting the weight parameters of the prediction model, and searching the learning rate in the range {0.0001, ..., 0.01};
and step 11, training the prediction model by looping through steps 2 to 10 for a specified number of iterations.
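As an illustration of the mini-batch loop in steps 2-10, the following sketch trains a toy logistic-regression stand-in with plain gradient descent in place of the full model and the Adam optimizer; all data, sizes and names here are illustrative, not the patent's implementation.

```python
import numpy as np

# Toy stand-in for the training loop of steps 2-10: logistic regression plays
# the role of the full network and plain gradient descent stands in for Adam.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                   # 256 instances, 8 features
w_true = rng.normal(size=8)
y = (X @ w_true > 0).astype(float)              # linearly separable toy labels

w = np.zeros(8)                                 # step 1: initialise parameters
lr, batch_size, epochs = 0.1, 32, 50
for _ in range(epochs):                         # step 11: iterate to the budget
    order = rng.permutation(len(X))
    for s in range(0, len(X), batch_size):      # step 2: sample a mini-batch
        b = order[s:s + batch_size]
        p = 1.0 / (1.0 + np.exp(-X[b] @ w))     # steps 3-8: forward pass
        grad = X[b].T @ (p - y[b]) / len(b)     # step 9: loss gradient
        w -= lr * grad                          # step 10: parameter update

accuracy = np.mean(((1.0 / (1.0 + np.exp(-X @ w))) > 0.5) == y)
print(accuracy)
```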
Preferably, in step 4, the embedded representation e = {e_1, e_2, ..., e_m} of the user and the items is obtained as follows:

e_i = M_i x_i, i = 1, 2, ..., m
e_j = M_j x_j, j = 1, 2, ..., m

where x_i and x_j are respectively the one-hot coded vector of discrete attribute i and the multi-hot coded vector of discrete attribute j of data instance x; e_i and e_j are the embedding vectors corresponding to feature fields F_i and F_j for the current input instance x; M_i is a k × n_i real matrix; n_i is the number of features contained in feature field F_i; and m is the size of the feature-field set F = {F_1, F_2, ..., F_m}.
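The embedding lookup e_i = M_i x_i above can be sketched as follows; the field sizes and matrices are illustrative stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m = 3 feature fields with n_i features each, and
# embedding dimension k = 4 (the experiments later fix k = 16).
field_sizes = [5, 7, 3]
k = 4
# One k x n_i embedding matrix M_i per feature field (random stand-ins).
M = [rng.normal(size=(k, n_i)) for n_i in field_sizes]

def embed(sparse_codes):
    """e_i = M_i x_i for every feature field."""
    return [M_i @ x_i for M_i, x_i in zip(M, sparse_codes)]

# One data instance: one-hot codes for fields 0 and 2, multi-hot for field 1.
x = [np.eye(5)[2],
     np.array([1, 0, 1, 0, 0, 0, 1], dtype=float),
     np.eye(3)[0]]
e = embed(x)
print([v.shape for v in e])  # three dense k-dimensional embeddings
```

A multi-hot field simply yields the sum of the embedding columns of its active features.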
Preferably, the embedded representation is optimized in step 5 by: firstly, the embedded layer outputs an E-type variant matrix E through a type variant operator reshape 0 Then using a compression operator
Figure RE-GDA0003759682410000031
Compressing original embedded e to k-dimensional statistical vector
Figure RE-GDA0003759682410000032
And
Figure RE-GDA0003759682410000033
using excitation operators
Figure RE-GDA0003759682410000034
Learning based on statistical vector p 1 Is embedded in the weights of the different potential attributes of the vector, using an excitation operator
Figure RE-GDA0003759682410000035
Learning based on statistics p 2 Using the excitation operator
Figure RE-GDA0003759682410000036
Learning is based on vector e learning the integral embedding features, using scaling operations with q 1 、q 2 And q is 3 Weighting the initial embedded vector for weight to obtain E 1 =E 0 q 1 、E 2 =(E 0 ) T q 2 And e bit =eq 3 (ii) a Finally, the output average pooling of the compression operator, the excitation operator and the scaling operation is obtained
Figure RE-GDA0003759682410000037
Wherein, i is more than or equal to 1 and less than or equal to k, and j is more than or equal to 1 and less than or equal to m in the compression operator; in the excitation operator q 1 、q 2 And q is 3 Respectively vectors of length k, m and km,
Figure RE-GDA0003759682410000038
and
Figure RE-GDA0003759682410000039
the matrix is respectively of the scale of k multiplied by rk, rk multiplied by k, m multiplied by rm, rm multiplied by m, km multiplied by rkm and km multiplied by km, r is a parameter for controlling the dimension increasing proportion; in scaling operation E 1 And E 2 Are all m × k matrices, e bit Is a vector of length mk.
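A minimal numerical sketch of the compress-excite-scale pipeline described above, under the assumption (the source renders the operator formulas only as images) that the excitation is a two-layer Relu/sigmoid transformation with the stated matrix sizes and that the scaling is element-wise reweighting; all weights are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, r = 4, 3, 2                  # feature fields, embedding dim, raising ratio

E0 = rng.normal(size=(m, k))       # reshaped embedding matrix
e = E0.reshape(-1)                 # flattened embedding, length m*k

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def excite(p, W_up, W_down):
    """Assumed two-layer excitation: raise dimension by r, Relu, project, sigmoid."""
    return sigmoid(W_down @ np.maximum(W_up @ p, 0.0))

# Compression: sum-pool the embedding matrix along each axis.
p1 = E0.sum(axis=0)                # length k -> dimension-level statistics
p2 = E0.sum(axis=1)                # length m -> field-level statistics

# Hypothetical excitation weights with the sizes quoted in the text.
W1u, W1d = rng.normal(size=(r * k, k)), rng.normal(size=(k, r * k))
W2u, W2d = rng.normal(size=(r * m, m)), rng.normal(size=(m, r * m))
W3u, W3d = rng.normal(size=(r * m * k, m * k)), rng.normal(size=(m * k, r * m * k))

q1, q2, q3 = excite(p1, W1u, W1d), excite(p2, W2u, W2d), excite(e, W3u, W3d)

# Scaling: reweight the embeddings at dimension, field, and bit level.
E1 = E0 * q1                       # broadcast over columns (dimension level)
E2 = E0 * q2[:, None]              # broadcast over rows (field level)
e_bit = e * q3                     # global feature-bit level

# Average-pool the three views into the product-enhanced embedding.
e_pe = (E1.reshape(-1) + E2.reshape(-1) + e_bit) / 3.0
print(e_pe.shape)                  # a vector of length m*k
```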
Preferably, the high-order implicit interaction information in the embedded feature representation is captured in step 6 as follows: first, a second-order feature interaction layer aggregates the output vectors of the product enhancement layer into a vector e_BI of length k; then a feedforward network containing L_1 feedforward layers applies a nonlinear transformation to e_BI, where the l-th feedforward layer (l = 0, 1, ..., L_1) computes

a_l = Relu(BN(W_l a_{l-1} + b_l)), with a_0 = e_BI,

where Relu is the excitation function, BN is the batch normalization operator, W_l and b_l are respectively the weight and bias of the l-th feedforward layer, and W_dfm is the output-layer weight of the deep factorization module.
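The second-order (bi-interaction) aggregation above can be sketched as follows, using the standard identity that the sum of pairwise element-wise products equals half of (square of sum minus sum of squares); the feedforward layer on top uses random stand-in weights and omits batch normalization for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 4, 3
V = rng.normal(size=(m, k))              # product-enhanced field embeddings

# Bi-interaction pooling: sum of element-wise products over all field pairs,
# computed in O(mk) via 0.5 * ((sum v)^2 - sum v^2).
e_bi = 0.5 * (V.sum(axis=0) ** 2 - (V ** 2).sum(axis=0))

# Brute-force check over explicit pairs i < j.
brute = sum(V[i] * V[j] for i in range(m) for j in range(i + 1, m))

# One Relu feedforward layer on top (batch normalization omitted; the
# weights are random stand-ins for learned parameters).
W, b = rng.normal(size=(k, k)), rng.normal(size=k)
a1 = np.maximum(W @ e_bi + b, 0.0)
print(np.allclose(e_bi, brute))          # the identity holds
```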
Preferably, step 7 captures the high-order implicit interaction information in the embedded feature representation with a deep residual network module containing L_2 residual layers, where the l-th residual layer (l = 0, 1, ..., L_2) transforms its input with weight matrices W_l^(1) and W_l^(2), a bias b_l and a skip connection [the exact layer formulas are rendered only as images in the source], and W_drn is the output-layer weight of the deep residual network module.
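A minimal sketch of a stack of residual layers as described above, assuming a conventional two-layer Relu block with an identity skip connection (the source renders the exact formulas only as images, so this form is an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6                                    # hidden width of the residual layers
x = rng.normal(size=d)

def residual_layer(h, W1, b, W2):
    """Two affine maps with Relu activations plus an identity skip connection."""
    inner = np.maximum(W1 @ h + b, 0.0)
    return np.maximum(W2 @ inner + h, 0.0)   # add the layer input back

# Random stand-in weights, shared across the stacked layers for brevity.
W1, b = rng.normal(size=(d, d)), rng.normal(size=d)
W2 = rng.normal(size=(d, d))

out = x
for _ in range(3):                       # L_2 = 3 stacked residual layers
    out = residual_layer(out, W1, b, W2)
print(out.shape)
```

The skip connection lets gradients flow directly through the stack, which is what the patent credits for improved training efficiency.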
Preferably, the predicted click score of the target item is calculated in step 8 as

ŷ = sigmoid(W_dfm a_dfm + W_drn a_drn),

where a_dfm and a_drn are the final outputs of the deep factorization module and the deep residual network module, ŷ is the prediction score of the target item being clicked, calculated by the click rate prediction model, and sigmoid is the excitation function.
Preferably, the cross-entropy loss in step 9 is:

Loss = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] + λ‖Θ‖²,

where Θ is the parameter set of the recommendation model, N is the total number of training instances, λ is the weight of the regularization term, and y_i is the training-data label of whether the i-th item was clicked.
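The cross-entropy loss with an L2 regularization term can be computed as in this sketch; the function name and toy values are illustrative.

```python
import numpy as np

def ctr_loss(y_true, y_pred, params, lam):
    """Binary cross entropy plus an L2 regularizer over all model parameters."""
    eps = 1e-12                          # guard against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    ce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    l2 = lam * sum(np.sum(p ** 2) for p in params)
    return ce + l2

y = np.array([1.0, 0.0, 1.0, 0.0])       # actual clicks
p = np.array([0.9, 0.1, 0.8, 0.2])       # predicted click probabilities
theta = [np.array([0.5, -0.5])]          # toy parameter set
print(round(ctr_loss(y, p, theta, lam=0.0001), 4))
```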
Another object of the invention is to provide a click rate prediction method that uses the trained model and can effectively improve the click rate prediction performance of a recommendation system. The click rate prediction method comprises:
acquiring the user instance information to be predicted;
inputting the user instance information into the click rate prediction model obtained by training, and calculating the user's click probability for the target items;
recommending to the user the target items whose click probability is greater than or equal to a preset click rate threshold.
Still another object of the present invention is to provide a device for predicting the click rate using the trained model, comprising:
an acquisition unit, configured to acquire user instance information;
a prediction unit, configured to input the user instance information into the click rate prediction model trained by the above method, and obtain the predicted probability that the user clicks a target item;
and a pushing unit, configured to recommend to the user's terminal the target items whose click probability is greater than or equal to a preset click rate threshold.
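The pushing unit's thresholding logic can be sketched as follows; the model and catalogue here are toy stand-ins, not the trained network.

```python
def recommend(model, candidates, threshold=0.5):
    """Score each candidate item and keep those at or above the threshold."""
    return [item for item, feats in candidates if model(feats) >= threshold]

# Toy stand-in model: the "click probability" is just a stored number.
catalogue = [("item_a", [0.9]), ("item_b", [0.2]), ("item_c", [0.7])]
fake_model = lambda feats: feats[0]
print(recommend(fake_model, catalogue, threshold=0.5))  # ['item_a', 'item_c']
```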
The invention provides at least the following beneficial effects:
Compared with traditional click rate prediction models and training methods, the method of the invention designs a product enhancement network to optimize the initial embedding vectors: the initial feature embedding vectors receive scaled attention at the feature-field level, the feature-dimension level and the global feature-bit level, yielding product-enhanced embedding vectors. Meanwhile, the invention uses the deep factorization model to capture high-order interaction information among the optimized input data instance features on the one hand, and uses deep residual connections to improve the training efficiency of the prediction model on the other, thereby depicting the latent interests of customers more effectively, better balancing the accuracy of the CTR prediction model against the computation time cost of training it, and improving the click rate prediction performance of the recommendation system. It can be widely applied in recommendation systems and improves the quality of proactive information services.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a flow chart of a click-through rate prediction model of the present invention.
FIG. 2 is a schematic diagram of a product enhancement layer in the click-through rate prediction model of the present invention.
FIG. 3 is a bar graph of the runtime per training cycle for different models on an Avazu dataset.
FIG. 4 is a bar graph of the run time per training cycle for different models on the Criteo dataset.
FIG. 5 is a bar graph of the runtime per training cycle for different models on a Movie dataset.
FIG. 6 is a bar graph of run time per training period for different models on the Book dataset.
Detailed Description
The present invention is further described in detail below with reference to examples so that those skilled in the art can practice the invention with reference to the description.
It is to be noted that, unless otherwise specified, the methods described in the following embodiments are all conventional methods.
As shown in FIGS. 1-2, the invention discloses a click rate prediction model, which comprises:
an input layer, for reading in user instance information as high-dimensional sparse one-hot or multi-hot coded data; the user instance information includes user attributes, item attributes, or user historical behavior attributes, such as the user's age, gender, installation time, registration status, city, province, login activity, historical purchase status, historical purchase amount, and the like;
an embedding layer, for converting the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of users and items;
a product enhancement layer, for compressing, exciting and scaling the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation;
a deep factorization module and a deep residual network module, for further capturing high-order implicit interaction information of the embedded features from the optimized embedded representation;
and an output layer, for integrating the outputs of the deep factorization module and the deep residual network module and calculating the probability that the target item is clicked.
The click rate prediction model of the invention is trained as follows:
Step 1.1: initialize the parameter set of the click rate prediction model.
Step 1.2: sample a batch or mini-batch of samples from all instance samples.
Step 1.3: sparsely code the discrete attributes and the discretized continuous attributes in the user instance information in one-hot or multi-hot form through the input layer of the click rate prediction model, to obtain a high-dimensional sparse representation of the user instance information.
Step 1.4: convert the high-dimensional sparse representation of the user instance information into low-dimensional dense vectors through the embedding layer of the click rate prediction model, to obtain embedded representations of the user and the items.
in said step 1.4, the embedded representation e ═ e { e } of the user and the item can be obtained according to the following method 1 ,e 2 ,…,e m }:
e i =M i X i ,i=1,2,…,m
e j =M j X j ,j=1,2,…,m
Wherein X i And X j One-hot coded vector and multiple-hot coded vector, e, of discrete attribute i and discrete attribute j, respectively, of data instance X i And e j Respectively representing the current input instance X in the feature field F i And F j Corresponding embedding vector, M i Is k × n i Real number matrix of n i Is a feature field F i The number of included features, m is the set of feature fields F ═ { F ═ F 1 ,F 2 ,…,F m The size of the page;
step 1.5: compressing, exciting and scaling a low-dimensional embedding representation feature domain level, a feature dimension level and a global feature bit level of example information through a product enhancement layer of the click rate prediction model to realize optimization of embedding representation;
in said step 1.5, the embedded representation can be optimized according to the following method: firstly, the embedded layer outputs an E-type transformation matrix E through a type transformation operator reshape 0 Then using a compression operator
Figure RE-GDA0003759682410000061
Compressing original embedding e to k-dimensional statistical vector
Figure RE-GDA0003759682410000062
And
Figure RE-GDA0003759682410000063
using excitation operators
Figure RE-GDA0003759682410000064
Learning based on statistical vector p 1 The feature of (3) is embedded into the weights of different potential attributes of the vector using an excitation operator
Figure RE-GDA0003759682410000065
Learning based on statistics p 2 Using the excitation operator
Figure RE-GDA0003759682410000066
Learning is based on directionQuantity e learns the overall embedded features, using scaling operations with q 1 、q 2 And q is 3 Weighting the initial embedded vector for weight to obtain E 1 =E 0 q 1 、E 2 =(E 0 ) T q 2 And e bit =eq 3 (ii) a Finally, the output average pooling of the compression operator, the excitation operator and the scaling operation is obtained
Figure RE-GDA0003759682410000071
That is: compression operator
Figure RE-GDA0003759682410000072
And
Figure RE-GDA0003759682410000073
compressing original embedding e to a statistical vector of size k using a summation method
Figure RE-GDA0003759682410000074
And
Figure RE-GDA0003759682410000075
wherein, the first and the second end of the pipe are connected with each other,
Figure RE-GDA0003759682410000076
Figure RE-GDA0003759682410000077
excitation operator
Figure RE-GDA0003759682410000078
Based on statistical vector p 1 Learning feature embedding vector weights, excitation operators, of different potential attributes
Figure RE-GDA0003759682410000079
Based on statistic p 2 Learning the weight of the embedded vector for each feature field, exciting the operator
Figure RE-GDA00037596824100000710
Learning integral embedded features based on vector e, mainly involving
Figure RE-GDA00037596824100000711
Figure RE-GDA00037596824100000712
Figure RE-GDA00037596824100000713
Wherein q is 1 、q 2 And q is 3 Respectively vectors of length k, m and km,
Figure RE-GDA00037596824100000714
and
Figure RE-GDA00037596824100000715
the matrices are of the scale k × rk, rk × k, m × rm, rm × m, km × rkm and km × km, respectively, and r is a parameter controlling the scale of the added dimension.
Scaling operations 3 weights q of original features learned by excitation operations 1 、q 2 And q is 3 Weighting the initial embedded vector, primarily involving
E 1 =E 0 q 1
E 2 =(E 0 ) T q 2
e bit =eq 3
Wherein E 1 And E 2 All are m × k matrices, e bit Is a vector of length mk.
Vectorization of E 1 And E 2 And averagely pooling the outputs of the compression operator, the excitation operator and the scaling operation, mainly relating to
Figure RE-GDA00037596824100000716
e dim =reshape(E 1 )
e field =reshape(E 2 )
Step 1.6: apply the deep factorization module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation.
In step 1.6, the high-order implicit interaction information in the embedded feature representation can be captured as follows: first, a second-order feature interaction layer aggregates the output vectors of the product enhancement layer into a vector e_BI of length k; then a feedforward network containing L_1 feedforward layers applies a nonlinear transformation to e_BI, where the l-th feedforward layer (l = 0, 1, ..., L_1) computes

a_l = Relu(BN(W_l a_{l-1} + b_l)), with a_0 = e_BI,

where Relu is the excitation function, BN is the batch normalization operator, W_l and b_l are respectively the weight and bias of the l-th feedforward layer, and W_dfm is the output-layer weight of the deep factorization module.
Step 1.7: apply the deep residual network module to the data-instance embedded representation optimized by the product enhancement layer, to further capture the high-order implicit interaction information in the embedded feature representation.
In step 1.7, a deep residual network module containing L_2 residual layers captures the high-order implicit interaction information in the embedded feature representation. The l-th residual layer (l = 0, 1, ..., L_2) transforms its input with weight matrices W_l^(1) and W_l^(2), a bias b_l and a skip connection [the exact layer formulas are rendered only as images in the source; the circled-dot symbol in them denotes the element-wise product of matrices], and W_drn is the output-layer weight of the deep residual network module.
Step 1.8: integrate the outputs of the deep factorization module and the deep residual network module, and calculate the predicted click score of the target item:

ŷ = sigmoid(W_dfm a_dfm + W_drn a_drn),

where a_dfm and a_drn are the final outputs of the deep factorization module and the deep residual network module, ŷ is the prediction score of the item being clicked, calculated by the click rate prediction model, and sigmoid is the excitation function.
Step 1.9: using the actual item clicks and the predicted item clicks, calculate the cross-entropy loss:

Loss = −(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] + λ‖Θ‖²,

where Θ is the parameter set of the recommendation model, N is the total number of training instances, λ is the weight of the regularization term, and y_i is the training-data label of whether the i-th item was clicked.
Step 1.10: optimize the cross-entropy loss with an Adam optimizer based on error back-propagation, adjust the weight parameters of the click rate prediction model, and search the learning rate in the range {0.0001, ..., 0.01}; this learning-rate search speeds up training and helps the model converge faster.
Step 1.11: train the prediction model by looping through steps 1.2 to 1.10 for a specified number of iterations, where the specified number is determined by the error stability of the cross-entropy loss obtained during the loop: training ends when the fluctuation of the cross-entropy loss over the last 10 consecutive iterations is less than 10⁻⁸. The trained click rate prediction model is denoted PeDRNFM.
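The stopping criterion of step 1.11 (loss fluctuation below 10⁻⁸ over the last 10 iterations) can be sketched as:

```python
def converged(losses, window=10, tol=1e-8):
    """Stop when loss fluctuation over the last `window` iterations is below tol."""
    if len(losses) < window:
        return False                      # not enough history yet
    recent = losses[-window:]
    return max(recent) - min(recent) < tol

history = [0.5, 0.3, 0.2] + [0.1] * 10    # flat tail -> training may stop
print(converged(history))                 # True
print(converged([0.5, 0.4, 0.3]))         # False: fewer than 10 iterations
```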
< Effect test >
In order to evaluate the performance and efficiency of the click rate prediction model training, 4 standard data sets for click rate prediction were used for experiments. Table 1 lists the statistics of these 4 data sets for training, validation and testing, respectively.
TABLE 1. Feature statistics of the data sets [table rendered as an image in the source]
The models compared in the tests are the following:
LR: which is the most classical baseline model in click-through rate estimation.
WD: it is a model based on deep networks, combining deep learning networks and LR in parallel from the perspective of "generalization" and "memory".
FNN: the method is a model based on a deep network, and comprises the steps of utilizing FM pre-training feature embedding and then DNN to perform subsequent deep feature interaction.
IPNN: the model is based on a deep network, belongs to a PNN model and is in the same family with OPNN. After embedding the layer, performing explicit feature interaction in an inner product mode.
DCN: the method is a model based on a deep network, provides explicit interaction of cross network learning characteristics, uses a neural network module to learn implicit interaction of the characteristics, and finally synthesizes the output of the two modules.
DeepFM: it combines a deep learning network and FM in parallel, simultaneously learning first-order features, explicit second-order feature interactions and implicit feature interactions.
XDeepFM: on the basis of DeepFM, a CIN network is further designed to replace an explicit interaction module, and estimation capability of a feature interaction promotion model is carried out at a vector level.
AFN: the model is based on a deep network, a logarithmic neural network layer for adaptively adjusting interaction orders is designed, and the orders of characteristic interaction can be selected through the network, so that the performance of the model is improved.
AutoInt: using a multi-layer multi-head self-attention neural network, it can model feature combinations of different orders over the input features.
FiBiNET: the importance of the features is dynamically learned through an Squeeze-Excitation network mechanism, and meanwhile, the interaction between the features is learned through a bilinear function.
FNFM: the method has strong second-order feature interactive learning capability.
DRM: the method is a model based on a deep network, and the importance of each embedding dimension to the model is calculated through an attention structure, so that the embedding quality of features is improved.
maskNet: the method is a model based on a deep network, the capability of DNN excavation of complex interaction features can be improved by using MaskBlock, maskNet families have a parallel model and a serial model due to different maskBlock stacking modes, the serial model is MaskNet, and the parallel model is MaskNetx.
PeDRNFM: the click rate prediction model is trained by the method.
Evaluation index
The performance of all models is evaluated with two indices commonly used in click-through rate prediction tasks: AUC and Logloss. AUC is the area under the ROC curve and measures the probability that the model assigns a higher prediction score to a positive sample than to a negative one; a larger AUC indicates better performance. Logloss is the cross-entropy loss; a smaller value indicates better performance.
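Both indices are straightforward to compute; the following sketch (our illustration, not code from the patent) implements AUC as the rank-sum estimate of the probability that a positive sample outranks a negative one, and Logloss as averaged cross entropy:

```python
import math

def auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic:
    the probability that a random positive sample receives a higher score
    than a random negative sample (ties get average ranks)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        # group indices whose scores are tied with the group's first element
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos = [r for r, y in zip(ranks, y_true) if y == 1]
    n_pos, n_neg = len(pos), len(y_true) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

def logloss(y_true, y_score, eps=1e-15):
    """Cross-entropy loss averaged over samples, with probability clipping."""
    total = 0.0
    for y, p in zip(y_true, y_score):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

For example, with labels [0, 0, 1, 1] and scores [0.1, 0.4, 0.35, 0.8], three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.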
Performance evaluation
To quantitatively analyze the performance of the PeDRNFM model, we selected currently representative click-through rate prediction models and compared their click-through rate prediction performance with that of PeDRNFM on 4 experimental data sets.
Details of the experiment
All methods are implemented in the PyTorch framework, and the feature embedding dimension k of all CTR models is fixed at 16. For a fair comparison, the hidden-layer parameters of models containing a DNN are set to (16, 16), and the dropout rate is searched in {0.1, ..., 0.9}. The batch size is set to 4096 for the Avazu and Criteo datasets and searched in {128, 256, 512} for the Movie and Book datasets. The learning rate of the Adam optimizer is searched in {0.01, 0.001, 0.0001}. The regularization-term parameter λ of the objective function is searched in {0.001, 0.0001, 0.00001}. The random seed is fixed at 42. The model-specific hyper-parameters are described next:
For the AFM model, the attention factor size is set to 16; for the DCN and DCNv2 models, the number of cross layers is set to 3; for the xDeepFM model, the CIN network parameters are set to (16, 16); for the AFN model, the number of logarithmic neurons in the LNN network is set to 1500; for the AutoInt model, the attention-layer dimension is set to 64, with 2 attention heads, 3 attention layers, and residual connections enabled; for the DIFM model, the attention-layer dimension is set to 16, with 2 attention heads, 3 attention layers, and residual connections enabled; for the FiBiNET and DeepFiBiNET models, the reduction ratio is set to 3; for the two MaskNet variants, SerMaskNet and ParaMaskNet, the number of MaskBlocks is set to 3. In particular, for the FFM and FNFM models the feature embedding dimension is set to 4 due to machine-performance limitations.
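For reference, the shared settings and search ranges above can be expressed as a small grid to iterate over; the sketch below is our illustration (the names `SHARED` and `SEARCH_SPACE` are not from the patent):

```python
from itertools import product

# Shared settings fixed across all models (from the experiment description).
SHARED = {"embed_dim": 16, "dnn_hidden": (16, 16), "seed": 42}

# Per-model search ranges for the tunable hyper-parameters.
SEARCH_SPACE = {
    "dropout": [round(0.1 * i, 1) for i in range(1, 10)],  # 0.1 .. 0.9
    "lr": [0.01, 0.001, 0.0001],                           # Adam learning rate
    "l2_lambda": [0.001, 0.0001, 0.00001],                 # regularization weight
}

def grid(space):
    """Yield every combination of the search-space values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(SEARCH_SPACE))  # 9 x 3 x 3 = 81 candidate configurations
```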
Effectiveness analysis
Table 2 shows the effectiveness results for the representative click-through rate prediction models. From the table: 1) all deep CTR prediction models outperform LR, again showing LR's insufficient modeling capacity for the CTR prediction task; 2) the IPNN and FiBiNET models perform well on all data sets; both work by explicitly computing each cross-feature vector, applying sum pooling to it, and finally obtaining a vector whose dimensionality matches the number of cross features, a working mode that realizes feature interaction effectively; 3) PeDRNFM achieves the best performance among all baseline models, indicating that PeDRNFM effectively optimizes the initial feature embeddings.
TABLE 2 comparison of effectiveness of different training models
(Table 2 appears as an image in the original publication and is not reproduced here.)
Time efficiency analysis
Figures 3, 4, 5 and 6 show the run time per training epoch for the different models on the four data sets. From these figures: 1) because Avazu and Criteo have many feature fields and many samples, model run times on these two data sets are higher than on the Movie and Book data sets; 2) on all data sets LR has the highest inference efficiency, while FNFM, FiBiNET and xDeepFM have the lowest. FiBiNET and FNFM have high time complexity because they explicitly compute interactions between features from different feature fields; each layer of the CIN network takes a Hadamard product of all vectors of the previous layer with all vectors of the input layer, so xDeepFM also has particularly high computational complexity; 3) relative to its run-time ranking on the Movie and Book data sets, IPNN ranks later on Avazu and Criteo because the latter data sets have more feature fields; AFN performs many logarithmic operations in its LNN part, and AutoInt's feature-cross layers are computationally complex, so both have longer actual computation times; 4) the time PeDRNFM needs per training epoch is fairly consistent across the four data sets. Combined with Table 2, PeDRNFM also achieves better prediction performance at the same time; in conclusion, the PeDRNFM trained by the present invention better balances prediction accuracy and the time cost of model training.
The above tests and analyses show that the trained model is advantageous when applied to a recommendation system for click-through rate prediction, effectively improving the system's click-through rate prediction performance and the quality of its proactive information services.
When the model is applied in a recommendation system for click-through rate prediction, user instance information to be predicted is obtained via a computer-readable medium, a server, or the like; the user instance information is input into the trained click rate prediction model to calculate the probability of the user clicking the target item; target items whose click probability is greater than or equal to a preset click rate threshold are then recommended to the user. This realizes recommendation of goods, information, videos and the like relevant to the user, and the method can be widely applied in e-commerce scenarios to predict the click-through rate of advertisements.
The click rate prediction model is applied to a recommendation system via a click rate prediction device comprising: an acquisition unit configured to acquire user instance information; a prediction unit configured to input the user instance information into the trained click rate prediction model to obtain the predicted probability of the user clicking the target item; and a pushing unit configured to recommend target items whose click probability is greater than or equal to a preset click rate threshold to the user's terminal.
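A minimal sketch of the three-unit device (acquisition, prediction, pushing) might look as follows; the class and method names are ours, and any trained model exposing an instance-to-probability callable could be plugged in:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ClickRatePredictor:
    """Illustrative device: a prediction unit scores acquired user instances
    with a trained CTR model, and a pushing unit keeps only the items whose
    click probability reaches the preset threshold."""
    model: Callable[[Dict], float]   # trained model: instance -> click probability
    threshold: float = 0.5           # preset click-rate threshold

    def recommend(self, instances: List[Dict]) -> List[Dict]:
        scored = [(self.model(x), x) for x in instances]       # prediction unit
        return [x for p, x in scored if p >= self.threshold]   # pushing unit

# Usage with a stand-in model that just reads a precomputed score:
predictor = ClickRatePredictor(model=lambda x: x["score"], threshold=0.6)
picked = predictor.recommend([{"item": "a", "score": 0.7},
                              {"item": "b", "score": 0.3}])
```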
While embodiments of the invention have been disclosed above, they are not limited to the uses set forth in the specification and examples; the invention can be applied in all fields to which it is suited, and additional modifications will readily occur to those skilled in the art.

Claims (10)

1. A click rate prediction model, characterized by comprising:
an input layer for reading in user instance information as one-hot or multi-hot encoded high-dimensional sparse data;
an embedding layer for converting the high-dimensional sparse data into low-dimensional dense vectors to obtain embedded representations of the user and the item;
a product enhancement layer for compressing, exciting and scaling the embedded representation at the feature-field level, the feature-dimension level and the global feature-bit level to optimize the embedded representation;
a depth factorization module and a deep residual network module for further capturing high-order implicit interaction information of the embedded features from the optimized embedded representation;
and an output layer for integrating the outputs of the depth factorization module and the deep residual network module and calculating the probability of the target item being clicked.
2. The click-rate prediction model of claim 1 wherein the user instance information comprises user attributes, item attributes, or user historical behavior attributes.
3. A method for training a click-through rate prediction model, according to the click-through rate prediction model of claim 1 or 2, the method comprising:
step 1, initializing a parameter set of the click rate prediction model;
step 2, sampling a batch of samples from all instance samples;
step 3, performing one-hot or multi-hot sparse coding on the discrete attributes and the discretized continuous attributes in the user instance information through the input layer of the click rate prediction model to obtain a high-dimensional sparse representation of the user instance information;
step 4, converting the high-dimensional sparse representation of the user instance information into low-dimensional dense vectors through the embedding layer of the click rate prediction model to obtain embedded representations of the user and the item;
step 5, compressing, exciting and scaling the low-dimensional embedded representation of the instance information at the feature-field level, the feature-dimension level and the global feature-bit level through the product enhancement layer of the click rate prediction model to optimize the embedded representation;
step 6, applying the depth factorization module to the embedded representation of the data instance optimized by the product enhancement layer to further capture high-order implicit interaction information in the embedded feature representation;
step 7, applying the deep residual network module to the embedded representation of the data instance optimized by the product enhancement layer to further capture high-order implicit interaction information in the embedded feature representation;
step 8, integrating the outputs of the depth factorization module and the deep residual network module, and calculating the prediction score of the target item being clicked;
step 9, calculating the cross-entropy-based click prediction loss from the actual and predicted item-click outcomes;
step 10, optimizing the cross-entropy loss with an Adam optimizer based on error back-propagation, adjusting the weight parameters of the prediction model, and searching the learning rate between 0.0001 and 0.01;
step 11, training the prediction model by cyclically executing steps 2 to 10 until a specified number of iterations is reached.
4. The training method according to claim 3, wherein in said step 4 the embedded representation e = {e_1, e_2, …, e_m} of the user and the item is obtained by:

e_i = M_i X_i, i = 1, 2, …, m
e_j = M_j X_j, j = 1, 2, …, m

wherein X_i and X_j are respectively the one-hot coded vector of discrete attribute i and the multi-hot coded vector of attribute j of data instance X; e_i and e_j respectively denote the embedding vectors of the current input instance X corresponding to the feature fields F_i and F_j; M_i is a k × n_i real-number matrix, where n_i is the number of features contained in the feature field F_i; and m is the size of the feature-field set F = {F_1, F_2, …, F_m}.
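Assuming the shapes given above (a k × n_i embedding matrix per feature field, selected by a one-hot vector or summed over a multi-hot vector), the embedding lookup can be sketched as follows; all identifiers are illustrative:

```python
import random

class EmbeddingLayer:
    """Sketch of the claim-4 embedding layer: each feature field i owns a
    k x n_i matrix; a one-hot input selects one column, and a multi-hot
    input sums the columns of its active features."""
    def __init__(self, field_sizes, k, seed=42):
        rng = random.Random(seed)
        # M[i][f] is the k-dim embedding of feature f in field i.
        self.M = [[[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(n)]
                  for n in field_sizes]
        self.k = k

    def embed(self, active):
        """active[i] = list of active feature indices in field i
        (length 1 for one-hot, > 1 for multi-hot)."""
        out = []
        for i, feats in enumerate(active):
            e = [0.0] * self.k
            for f in feats:
                for d in range(self.k):
                    e[d] += self.M[i][f][d]
            out.append(e)  # e_i: the field's embedding vector
        return out

# Usage: two fields (3 and 5 features), one-hot and multi-hot inputs.
layer = EmbeddingLayer([3, 5], k=4)
e = layer.embed([[1], [0, 2]])
```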
5. The training method according to claim 4, wherein in said step 5 the embedded representation is optimized as follows: first, the output e of the embedding layer is reshaped into a matrix E_0 by a type-transformation operator (reshape); then two compression operators (given as formula images in the original publication) compress the original embedding e into statistical vectors p_1 and p_2; an excitation operator learns, from the statistical vector p_1, the weights q_1 of the different potential attributes of the embedded vector; a second excitation operator learns q_2 from the statistic p_2; and a third excitation operator learns q_3 from the whole embedding vector e. Scaling operations then weight the initial embedded vectors with q_1, q_2 and q_3 to obtain E_1 = E_0 q_1, E_2 = (E_0)^T q_2 and e_bit = e q_3; finally, the outputs E_1, E_2 and e_bit of the compression, excitation and scaling operations are average-pooled to obtain the optimized embedded representation.

Here 1 ≤ i ≤ k and 1 ≤ j ≤ m in the compression operators; in the excitation operators, q_1, q_2 and q_3 are vectors of length k, m and km respectively, and the associated weight matrices have sizes k × rk, rk × k, m × rm, rm × m, km × rkm and rkm × km, where r is a parameter controlling the dimension-expansion ratio; in the scaling operations, E_1 and E_2 are both m × k matrices and e_bit is a vector of length mk.
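One squeeze-excitation-scale branch of the product enhancement layer can be sketched as below. Since the patent's exact operators appear only as formula images, this is a generic squeeze-and-excitation pattern (mean-pool squeeze, two-layer bottleneck excitation, row-wise scaling), not the disclosed formulas:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def squeeze_excite(E, W_down, W_up):
    """One illustrative branch (field level): squeeze each row of E to a
    statistic by mean pooling, pass the statistic vector through a two-layer
    bottleneck (the r-scaled excitation), and use the result as per-row
    scaling weights on E. The dimension- and bit-level branches would apply
    the same pattern to the transposed matrix and the flattened vector."""
    p = [sum(row) / len(row) for row in E]                   # squeeze: one stat per row
    hidden = relu([sum(p[i] * W_down[i][j] for i in range(len(p)))
                   for j in range(len(W_down[0]))])          # excite: down-project
    q = [sum(hidden[j] * W_up[j][i] for j in range(len(hidden)))
         for i in range(len(p))]                             # excite: up-project
    return [[q[i] * x for x in E[i]] for i in range(len(E))]  # scale rows of E

# Usage on a tiny 2x2 embedding matrix with all-ones bottleneck weights:
E = [[1.0, 2.0], [3.0, 4.0]]
out = squeeze_excite(E, W_down=[[1.0], [1.0]], W_up=[[1.0, 1.0]])
```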
6. The training method according to claim 5, wherein said step 6 captures the high-order implicit interaction information embedded in the feature representation as follows: first, a second-order feature interaction layer aggregates the output vectors of the product enhancement layer into a vector e_BI of length k; then a feed-forward network containing L_1 feed-forward layers applies a non-linear transformation to e_BI, the l-th feed-forward layer (l = 1, 2, …, L_1) mainly involving

α_0 = e_BI
α_l = Relu(BN(W_l α_{l-1} + b_l))

wherein Relu is an excitation function, BN is a batch-normalization operator, W_l and b_l are respectively the weight and the bias of the l-th feed-forward layer, and W_dfm is the output-layer weight of the depth factorization module;

said step 7 comprises using a deep residual network module containing L_2 residual layers to capture the high-order implicit interaction information embedded in the feature representation, the l-th residual layer (l = 1, 2, …, L_2) mainly involving

q_0 = e
q_l = q_{l-1} + Relu(W_{l,2} Relu(W_{l,1} q_{l-1} + b_l))

wherein W_{l,1} and W_{l,2} are the weights of the l-th residual layer, b_l is the bias of the l-th residual layer, and W_drn is the output-layer weight of the deep residual network module.
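The two branches of claims 6 and 7 can be sketched in plain Python; batch normalization is omitted, and the residual form is a standard two-weight block since the patent's formulas appear only as images:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # W is a list of rows; returns W @ v
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def vecadd(a, b):
    return [x + y for x, y in zip(a, b)]

def dfm_branch(e_bi, layers, w_out):
    """Depth factorization branch sketch: alpha_0 = e_BI, then L_1 layers of
    alpha_l = Relu(W_l alpha_{l-1} + b_l), finished by the output weight."""
    a = e_bi
    for W, b in layers:
        a = relu(vecadd(matvec(W, a), b))
    return sum(w * x for w, x in zip(w_out, a))

def drn_branch(e, layers, w_out):
    """Deep residual branch sketch: q_0 = e, each layer adds a two-weight
    nonlinear transform of q_{l-1} back onto it (a standard residual block)."""
    q = e
    for W1, W2, b in layers:
        q = vecadd(q, relu(matvec(W2, relu(vecadd(matvec(W1, q), b)))))
    return sum(w * x for w, x in zip(w_out, q))
```

An output layer would combine the two branch scores through a sigmoid, as in the next claim.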
7. The training method according to claim 6, wherein the prediction score of the target item being clicked in step 8 is calculated as

ŷ = sigmoid(W_dfm α_{L_1} + W_drn q_{L_2})

wherein ŷ is the prediction score of the clicked target item calculated by the click rate prediction model, and sigmoid is an excitation function.
8. The training method according to claim 7, wherein the cross-entropy loss in step 9 is:

Loss = -(1/N) Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] + λ ||Θ||²

wherein Θ is the parameter set of the recommendation model, N is the total number of training instances, λ is the weight of the regularization term, and y_i is the training-data label indicating whether the i-th item was clicked.
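A direct implementation of the claim-8 loss might read as follows; the squared-L2 form of the regularizer is our assumption, as the patent's formula appears only as an image:

```python
import math

def ctr_loss(y_true, y_pred, params, lam, eps=1e-15):
    """Mean cross-entropy over N instances plus an L2 regularization term
    (lam * ||Theta||^2) over the model parameters; predictions are clipped
    away from 0 and 1 to keep the logarithms finite."""
    n = len(y_true)
    ce = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        ce -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return ce / n + lam * sum(w * w for w in params)
```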
9. A method for predicting click through rates, comprising:
acquiring user instance information to be predicted;
inputting the user instance information into a click rate prediction model trained according to the method of any one of claims 3-8, and calculating the probability of the user clicking the target item;
recommending the target items whose click probability is greater than or equal to a preset click rate threshold to the user.
10. An apparatus for predicting click through rate, comprising:
an acquisition unit configured to acquire user instance information;
a prediction unit configured to input the user instance information into a click rate prediction model trained according to the method of any one of claims 3-8, so as to obtain a predicted click probability of the user clicking on a target item;
and the pushing unit is configured to recommend the target item corresponding to the click probability greater than or equal to a preset click rate threshold value to the terminal of the user.
CN202210485479.7A 2022-05-06 2022-05-06 Click rate prediction model, training method and application device thereof Withdrawn CN115048855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210485479.7A CN115048855A (en) 2022-05-06 2022-05-06 Click rate prediction model, training method and application device thereof


Publications (1)

Publication Number Publication Date
CN115048855A true CN115048855A (en) 2022-09-13

Family

ID=83157301


Country Status (1)

Country Link
CN (1) CN115048855A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115271272A (en) * 2022-09-29 2022-11-01 华东交通大学 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN115563510A (en) * 2022-12-01 2023-01-03 北京搜狐新动力信息技术有限公司 Training method of click rate estimation model and related device
CN117252665A (en) * 2023-11-14 2023-12-19 苏州元脑智能科技有限公司 Service recommendation method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220913