CN110738314B - Click rate prediction method and device based on deep migration network

Click rate prediction method and device based on deep migration network

Info

Publication number
CN110738314B
CN110738314B
Authority
CN
China
Prior art keywords
network
feature
deep
perceptron
click rate
Prior art date
Legal status
Active
Application number
CN201910991888.2A
Other languages
Chinese (zh)
Other versions
CN110738314A (en)
Inventor
郑子彬 (Zheng Zibin)
许海城 (Xu Haicheng)
Current Assignee
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN201910991888.2A
Publication of CN110738314A
Application granted
Publication of CN110738314B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0463 Neocognitrons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0251 Targeted advertisements
    • G06Q 30/0263 Targeted advertisements based upon Internet or website rating


Abstract

The invention discloses a click rate prediction method and device based on a deep migration network, the device being used to implement the method. The method comprises: discretizing continuous fields; preprocessing discrete features, converting them into feature id codes and obtaining a mapping dictionary; converting the feature id codes into characterization vectors with a GloVe model to serve as initialization parameters of the embedding layer of the deep migration network; inputting the feature id codes into the deep migration network for training; discretizing the test sample and mapping its discrete features into feature id codes with the mapping dictionary M; and inputting the feature id codes into the deep migration network to predict the click rate. The invention optimizes the click rate prediction method, improving prediction accuracy while keeping prediction latency low.

Description

Click rate prediction method and device based on deep migration network
Technical Field
The invention relates to the field of big data processing, in particular to a click rate prediction method and device based on a deep migration network.
Background
With the popularization of the Internet, everyday life is closely tied to it: people shop on Taobao and JD.com, order food on Meituan and Ele.me, and watch films on Tencent Video and iQIYI. The massive click behavior of users on the Internet accumulates a large amount of precious data for platforms such as Taobao, JD.com and Meituan; investing resources in this data produces tangible value, for example in the estimation scenarios of computational advertising or recommender systems.
The main goal of computational advertising is to integrate the information of three parties, the advertiser, the ad-slot provider and the user, so as to deliver advertisements more accurately, thereby improving the advertiser's campaign effect, the ad slot's revenue and the user's experience, achieving a multi-win situation. The most important link in this chain is accurate ad delivery, and the technology behind it is click-through-rate estimation, the field that predicts the probability of a user clicking an advertisement. Data in computational advertising is highly sparse and enormous in magnitude, so at first the simplest linear models, such as logistic regression, were widely used. However, click-rate estimation must consider several objects at once, such as users and advertisements, and combinations among features matter far more than each feature considered on its own. The Factorization Machine (FM) model developed later represents each feature with a characterization vector and expresses the combination information between features through inner products of these vectors. Recently, deep learning models have achieved great success in many fields, and neural-network-based deep models have gradually been applied to click-rate estimation to make up for the FM model's lack of feature combinations above second order. Although deep learning models improve greatly on FM in effect, their application in computational advertising, a scenario with massive data that must produce prediction results in real time, is limited by their computational cost.
In general, click-rate estimation methods have been continuously optimized for remarkable gains in effect, but the low-latency requirement that matters in actual use has gradually been neglected.
Disclosure of Invention
The main purpose of the present invention is to provide a click rate prediction method based on a deep migration network that aims to overcome the above problems.
In order to achieve the above purpose, the click rate prediction method based on the deep migration network provided by the invention comprises the following steps:
s10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
s20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
s40, inputting the feature id index code into a depth migration network to obtain cross entropy loss of predicted click rate, and updating all parameters of the depth migration network by adopting a back propagation algorithm, wherein the depth migration network comprises an embedded layer Embedding for converting the feature id index code into corresponding characterization vectors, a factorized FM network for carrying out inner product on the corresponding characterization vectors to obtain FM predicted click rate, a shared perceptron network for carrying out nonlinear transformation on the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network for inputting the abstract characterization vectors to obtain lightweight predicted click rate and a deep perceptron network for inputting the abstract characterization vectors to obtain deep predicted click rate;
S50, discretizing the test sample to obtain discrete features of the test sample, and carrying out matching mapping on the discrete features of the test sample based on a mapping dictionary M to obtain feature id index codes of the test sample;
s60, inputting the characteristic id index code of the test sample into the deep migration network for prediction to obtain click rate prediction of the test sample.
Preferably, the depth migration network includes an embedded layer Embedding, a factorized FM network, a shared perceptron network, a lightweight perceptron network, and a deep perceptron network, and the S40 includes:
s401, inputting the characteristic id index code into an embedded layer Embedding of a depth migration network to obtain a corresponding characterization vector;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
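To make S402 concrete, the following is a minimal PyTorch sketch of the FM subnetwork. The class name, the field-wise input layout and the O(nk) pairwise-interaction identity $\sum_{i<j}\langle v_i,v_j\rangle=\tfrac{1}{2}\big(\|\sum_i v_i\|^2-\sum_i\|v_i\|^2\big)$ are standard factorization-machine practice assumed for illustration, not text from the patent.

```python
import torch
import torch.nn as nn

class FMNetwork(nn.Module):
    """Factorization-machine subnet: linear regression term plus pairwise
    inner products of characterization vectors (a sketch of S402)."""

    def __init__(self, n_features: int, k: int):
        super().__init__()
        self.embedding = nn.Embedding(n_features, k)  # characterization vectors v
        self.linear = nn.Embedding(n_features, 1)     # linear-term weights W_fm
        self.bias = nn.Parameter(torch.zeros(1))      # linear-term bias b_fm

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # feature_ids: (batch, n_fields) integer id codes, one id per field
        v = self.embedding(feature_ids)               # (batch, n_fields, k)
        linear_term = self.linear(feature_ids).sum(dim=(1, 2)) + self.bias
        # sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2)
        sum_sq = v.sum(dim=1).pow(2).sum(dim=1)
        sq_sum = v.pow(2).sum(dim=(1, 2))
        pair_term = 0.5 * (sum_sq - sq_sum)
        return torch.sigmoid(linear_term + pair_term)  # p_fm(x)
```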
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
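The shared, deep and lightweight perceptron networks described above can be sketched as follows; the patent fixes only the structure (a sigmoid shared layer, ReLU hidden layers, and fewer layers in the lightweight branch than in the deep branch), so all widths and depths here are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mlp(sizes, act=nn.ReLU):
    """Stack Linear+activation pairs; the final Linear emits a pre-sigmoid logit z."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))
    return nn.Sequential(*layers)

class PerceptronBranches(nn.Module):
    def __init__(self, in_dim: int, shared_dim: int = 256):
        super().__init__()
        # shared perceptron: h_s = sigmoid(W_s v + b_s)
        self.shared = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.Sigmoid())
        # deep branch: more ReLU layers (assumed widths)
        self.deep = mlp([shared_dim, 400, 400, 400, 1])
        # lightweight branch: fewer layers than the deep branch
        self.light = mlp([shared_dim, 128, 1])

    def forward(self, v: torch.Tensor):
        h_s = self.shared(v)                   # abstract characterization vector
        z_deep = self.deep(h_s).squeeze(-1)    # pre-sigmoid output z_deep(x)
        z_light = self.light(h_s).squeeze(-1)  # pre-sigmoid output z_light(x)
        return z_light, z_deep                 # p = sigmoid(z) for each branch
```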
s403, calculating click rate loss L (x; W, b) by integrating predicted click rates of an FM network, a lightweight perceptron network and a deep perceptron network, wherein the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
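A sketch of the joint loss of S403 and the parameter update of S404, written directly from the formula above. It assumes the FMNetwork and PerceptronBranches sketches given earlier, uses binary cross entropy for H(y, p), and the value of λ is a placeholder for the empirically tuned weight.

```python
import torch
import torch.nn.functional as F

def click_rate_loss(y, p_fm, z_light, z_deep, lam=0.1):
    """L(x;W,b) = H(y,p_fm) + H(y,p_light) + H(y,p_deep)
                 + lam * ||z_light - z_deep||^2
    y: float 0/1 click labels; z_*: pre-sigmoid subnet outputs."""
    return (F.binary_cross_entropy(p_fm, y)
            + F.binary_cross_entropy_with_logits(z_light, y)
            + F.binary_cross_entropy_with_logits(z_deep, y)
            # matching term that lets the deep branch teach the light branch
            + lam * (z_light - z_deep).pow(2).mean())

# S404, sketched: one back-propagation step over all network parameters
# optimizer = torch.optim.Adam(all_parameters, lr=1e-3)
# loss = click_rate_loss(y, p_fm, z_light, z_deep)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```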
Preferably, the discretization in S10 maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
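Since the exact bucketing formula survives only as an image, the sketch below assumes a logarithmic form, D = ⌊N·ln(1+V)⌋, which matches the stated ingredients (a floor operation and a range-dependent constant N) and the long-tail values the description mentions; the true formula in the patent may differ.

```python
import math

def discretize(v: float, n: float = 2.0) -> int:
    """Map a continuous long-tail value V to an integer bucket D.
    Assumed form D = floor(N * ln(1 + V)); only V, D, N and the floor
    operation are actually specified by the surrounding text."""
    return math.floor(n * math.log1p(max(v, 0.0)))

# e.g. discretize(3.5) -> 3, discretize(1000.0) -> 13 with N = 2
```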
Preferably, the step S30 specifically includes:
S301, taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements and creating the feature co-occurrence frequency matrix $C_{n\times n}$;
S302, performing matrix decomposition on the feature co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, which are updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error J, the parameters of $V_{n\times k}$ are updated by gradient descent;
S303, taking the rows of the updated matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
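A NumPy sketch of S301-S303 under the simplified recovery error stated above (the full GloVe model also log-transforms and weights the co-occurrence counts); the learning rate, dimension k and epoch count are assumptions.

```python
import numpy as np

def glove_vectors(samples, n, k=16, lr=0.01, epochs=50, seed=0):
    """samples: iterable of lists of feature id index codes per sample;
    n: total number of feature id index codes.
    Returns V (n x k); its rows are the characterization vectors (S303)."""
    rng = np.random.default_rng(seed)
    C = np.zeros((n, n))
    for ids in samples:                       # S301: co-occurrence counts
        for i in ids:
            for j in ids:
                if i != j:
                    C[i, j] += 1.0
    V = rng.normal(scale=0.1, size=(n, k))    # S302: C ~ V V^T + biases
    b = np.zeros(n)
    for _ in range(epochs):
        err = (V @ V.T + b[:, None] + b[None, :]) - C
        V -= lr * 2.0 * (err + err.T) @ V / n     # dJ/dV for J = sum(err^2)
        b -= lr * 2.0 * (err.sum(axis=1) + err.sum(axis=0)) / n
    return V
```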
The invention also discloses a click rate prediction device based on the deep migration network, which is used for implementing the above method and comprises:
the discrete module is used for carrying out discretization processing on continuous fields of the training samples so as to obtain discrete characteristics of the training samples;
The mapping module is used for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
the feature characterization module, used for counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the characteristic id index code into the deep migration network for training to acquire click rate loss, and updating all parameters of the deep migration network by adopting a back propagation algorithm;
the test module is used for discretizing the test sample to obtain discrete characteristics of the test sample, and carrying out matching mapping on the discrete characteristics of the test sample based on the mapping dictionary M to obtain a characteristic id index code of the test sample;
and the prediction module, used for inputting the feature id index code of the test sample into the deep migration network for prediction, to obtain the click rate prediction of the test sample.
Preferably, the training module comprises:
The Embedding sub-module is used for inputting the characteristic id index code into an Embedding layer Embedding of the depth migration network to obtain a corresponding characterization vector;
the subnet prediction sub-module, used for inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction sub-module is used for integrating the predicted click rate of the FM network, the light-weight perceptron network and the deep perceptron network to calculate the click rate loss L (x; W, b), and the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
And the parameter updating sub-module is used for updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
Preferably, the discretization processing in the discretization module maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
Preferably, the feature characterization module includes:
the feature co-occurrence sub-module, used for creating the feature co-occurrence frequency matrix $C_{n\times n}$ by taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements;
the decomposition sub-module, used for performing matrix decomposition on the co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error, the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; $V_{n\times k}$ is updated by gradient descent;
and the feature characterization sub-module, used for taking the rows of the decomposed matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
Compared with the prior art, the invention has the following beneficial effects: it overcomes the defect that the FM model in the prior art contains no feature combinations above second order; it uses the strong transfer-learning capability of the deep migration network to let the deep perceptron network guide the learning of the lightweight perceptron network, yielding a lightweight perceptron network with better effect and better performance; and finally it integrates the FM network, the lightweight perceptron network and the deep perceptron network to train and obtain the click rate loss, optimizing the click rate prediction method and improving prediction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a model architecture of a deep migration network of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
The click rate prediction method based on the deep migration network provided by the invention comprises the following steps:
s10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
s20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network, these parameters being further updated by the back propagation algorithm during training;
s40, inputting the feature id index code into a depth migration network to obtain cross entropy loss of predicted click rate, and updating all parameters of the depth migration network by adopting a back propagation algorithm, wherein the depth migration network comprises an embedded layer Embedding for converting the feature id index code into corresponding characterization vectors, a factorized FM network for carrying out inner product on the corresponding characterization vectors to obtain FM predicted click rate, a shared perceptron network for carrying out nonlinear transformation on the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network for inputting the abstract characterization vectors to obtain lightweight predicted click rate and a deep perceptron network for inputting the abstract characterization vectors to obtain deep predicted click rate;
S50, discretizing the test sample to obtain discrete features of the test sample, and carrying out matching mapping on the discrete features of the test sample based on a mapping dictionary M to obtain feature id index codes of the test sample;
s60, inputting the characteristic id index code of the test sample into the deep migration network for prediction to obtain click rate prediction of the test sample.
In the embodiment of the invention, transfer learning is incorporated into the click rate estimation method, with the aim of improving the prediction effect of the lightweight perceptron network while keeping its low-latency advantage. Meanwhile, the GloVe technique is used to initialize the characterization vectors, which makes the training process of the deep migration network more stable.
Besides the preprocessing the data undergoes before being passed into the input component, the initial parameters of the Embedding layer in the input component are initialized, before the training stage, with the characterization vectors obtained by the GloVe technique.
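A sketch of this initialization, assuming the characterization-vector matrix produced by the GloVe sketch earlier; loading a pretrained matrix into nn.Embedding this way is ordinary PyTorch practice, and the layer remains trainable afterwards as required by S40.

```python
import torch
import torch.nn as nn

def init_embedding_from_glove(V) -> nn.Embedding:
    """Build the Embedding layer of the deep migration network and set its
    initial weights to the (n x k) characterization-vector matrix V."""
    weights = torch.as_tensor(V, dtype=torch.float32)
    emb = nn.Embedding(weights.shape[0], weights.shape[1])
    with torch.no_grad():
        emb.weight.copy_(weights)   # GloVe vectors as initialization parameters
    return emb                      # still updated by back-propagation later
```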
The invention overcomes the defect that the FM model in the prior art contains no feature combinations above second order, and uses the strong transfer-learning capability of the deep migration network to let the deep perceptron network guide the learning of the lightweight perceptron network, thereby obtaining a lightweight perceptron network with better effect and better performance; finally, the FM network, the lightweight perceptron network and the deep perceptron network are integrated to train and obtain the click rate loss, optimizing the click rate prediction method and improving prediction accuracy.
Preferably, the depth migration network comprises an embedded layer Embedding for converting the feature id index code into corresponding characterization vectors, a factorization FM network for carrying out inner product on the corresponding characterization vectors to obtain FM predicted click rate, a shared perceptron network for carrying out nonlinear transformation on the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network for inputting the abstract characterization vectors to obtain lightweight predicted click rate and a deep perceptron network for inputting the abstract characterization vectors to obtain deep predicted click rate.
Preferably, the S40 includes:
s401, inputting the characteristic id index code into an embedded layer Embedding of a depth migration network to obtain a corresponding characterization vector;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, calculating click rate loss L (x; W, b) by integrating predicted click rates of an FM network, a lightweight perceptron network and a deep perceptron network, wherein the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
Preferably, the discretization in S10 maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
Preferably, the step S30 specifically includes:
S301, taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements and creating the feature co-occurrence frequency matrix $C_{n\times n}$;
S302, performing matrix decomposition on the feature co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, which are updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error J, the parameters of $V_{n\times k}$ are updated by gradient descent;
S303, taking the rows of the updated matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
Preferably, the training samples are data marked with click category labels, the click categories comprising not-clicked and clicked, with the not-clicked category labeled 0 and the clicked category labeled 1; the test samples are data without click category labels.
The invention also discloses a click rate prediction device based on the deep migration network, which is used for implementing the above method (for the method, refer to the embodiments above). Since the device adopts all the technical solutions of all the embodiments above, it has at least all the beneficial effects brought by those technical solutions, which are not repeated here. The device comprises:
The discrete module is used for carrying out discretization processing on continuous fields of the training samples so as to obtain discrete characteristics of the training samples;
the mapping module is used for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
the feature characterization module, used for counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the characteristic id index code into the deep migration network for training to acquire click rate loss, and updating all parameters of the deep migration network by adopting a back propagation algorithm;
the test module is used for discretizing the test sample to obtain discrete characteristics of the test sample, and carrying out matching mapping on the discrete characteristics of the test sample based on the mapping dictionary M to obtain a characteristic id index code of the test sample;
and the prediction module, used for inputting the feature id index code of the test sample into the deep migration network for prediction, to obtain the click rate prediction of the test sample.
Preferably, the training module comprises:
the Embedding sub-module is used for inputting the characteristic id index code into an Embedding layer Embedding of the depth migration network to obtain a corresponding characterization vector;
the subnet prediction sub-module, used for inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction sub-module is used for integrating the predicted click rate of the FM network, the light-weight perceptron network and the deep perceptron network to calculate the click rate loss L (x; W, b), and the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
And the parameter updating sub-module is used for updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
Preferably, the discretization processing in the discretization module maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
Preferably, the feature characterization module includes:
the feature co-occurrence sub-module, used for creating the feature co-occurrence frequency matrix $C_{n\times n}$ by taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements;
the decomposition sub-module, used for performing matrix decomposition on the co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error, the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; $V_{n\times k}$ is updated by gradient descent;
and the feature characterization sub-module, used for taking the rows of the decomposed matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
It should additionally be understood that the model architecture of the deep migration network is shown in FIG. 1; the model has five components:
1. Input component
The input component requires data in the form of discrete feature id codes; the feature id codes are then passed into the coding layer to generate characterization vectors. Processing the data into feature id codes lets the coding layer quickly look up the corresponding characterization vector. Therefore, before data is passed into the input component, continuous floating-point values such as monetary amounts are discretized; specifically, long-tail-distributed floating-point values with a very wide value range can be discretized by the floor-based bucketing described above (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant and $\lfloor\cdot\rfloor$ denotes rounding down. Furthermore, natively discrete features are also processed: for example, gender = [male, female] is converted to gender = [0, 1], and age = [12, 13, …, 80] is converted to age = [0, 1, …, 68].
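A sketch of this preprocessing and of the mapping dictionary M from S20: discretized and natively discrete values are assigned consecutive feature id codes on the training data, and the same dictionary is reused on test data. Field names and values are illustrative.

```python
def build_mapping_dictionary(train_rows):
    """train_rows: list of dicts of discrete feature values, e.g.
    {"gender": "male", "age": 12, "amount_bucket": 7}.
    Returns the mapping dictionary M: (field, value) -> feature id code."""
    M = {}
    for row in train_rows:
        for field, value in row.items():
            M.setdefault((field, value), len(M))   # next unused id code
    return M

def encode(row, M):
    """Map one sample's discrete features to feature id codes via M;
    (field, value) pairs unseen during training have no entry in M."""
    return [M[(f, v)] for f, v in row.items() if (f, v) in M]
```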
Besides this preprocessing of the data before it enters the input component, the initial parameters of the Embedding layer in the input component are initialized, before the training stage, with the characterization vectors obtained by the GloVe technique. The steps for converting feature id codes into characterization vectors with GloVe are as follows:
S3.1: count how often features co-occur in the same samples across the data, finally obtaining the feature co-occurrence frequency matrix $C_{n\times n}$;
S3.2: perform the matrix decomposition $C_{n\times n}\approx V_{n\times k}V_{n\times k}^{\top}+bias$, where $V_{n\times k}$ is the decomposed matrix and bias is the bias term, and minimize the matrix-multiplication recovery error $J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-(v_i^{\top}v_j+b_i+b_j)\big)^{2}$, where $C_{i,j}$ is an element of $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; the parameters of the matrix $V_{n\times k}$ are obtained through gradient-descent updates;
S3.3: the row vectors of the matrix $V_{n\times k}$ obtained by decomposing the feature co-occurrence frequency matrix serve as the characterization vectors of the corresponding features.
2. FM network
The FM network takes the characterization vectors as input and uses inner products of characterization vectors as feature combinations. This explicit second-order feature combination is both more efficient and better at generalizing in click-rate estimation scenarios with highly sparse data. Whereas a perceptron network contains no explicit feature combinations, integrating the FM network into the deep migration network can guide the Embedding layer to learn better characterization vectors.
3. Shared perceptron network
The shared perceptron network takes the characterization vector as input, converts the original characterization vector into a more abstract characterization vector by using the complex nonlinear mapping capability of the perceptron network, and enables the subsequent lightweight perceptron network and the deep perceptron network to share the abstract characterization vector as the input of the network.
4. Deep perceptron network
The deep perceptron network takes the abstract characterization vector produced by the shared perceptron network as input and performs further nonlinear combination through more layers, giving it the capacity to represent higher-order feature combinations and hence better performance than the lightweight perceptron network. To transfer what the deep perceptron network has learned to the lightweight one, the click rate output by the deep perceptron network is used to enrich the original 0-1 click labels of the data. Training samples are data marked with click category labels, where the category is either not-clicked (label 0) or clicked (label 1); test samples carry no click category label. Compared with the original label, which only provides category membership (1 or 0), the click rate output by the deep perceptron network provides more information: not only which category is more probable, but the exact strength of that probability.
5. Light-weight perceptron network
The lightweight perceptron network takes the abstract characterization vector converted by the shared perceptron network as input. On the one hand, it exploits the information in the abstract characterization vector through a few shallow nonlinear layers to predict the click rate more accurately; on the other hand, by fitting the click rate predicted by the deep perceptron network, it learns the information the deep network has mined from the data, improving the prediction effect while keeping the lightweight network's low latency.
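The low-latency benefit suggests that at serving time only the embedding, shared and lightweight branches need to be evaluated, the FM and deep branches having served their purpose during training; this deployment choice is an assumption (the patent feeds test samples into the deep migration network without spelling out which outputs are used), sketched as follows.

```python
import torch

@torch.no_grad()
def predict_click_rate(feature_ids, embedding, branches):
    """Serving path: embedding -> shared perceptron -> lightweight branch.
    feature_ids: (batch, n_fields) id codes from the mapping dictionary M;
    embedding / branches: the modules sketched earlier (the in_dim of
    branches.shared must equal n_fields * k)."""
    v = embedding(feature_ids).flatten(start_dim=1)  # concat field vectors
    h_s = branches.shared(v)                         # abstract characterization
    z_light = branches.light(h_s).squeeze(-1)        # pre-sigmoid z_light(x)
    return torch.sigmoid(z_light)                    # p_light(x)
```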
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (8)

1. The click rate prediction method based on the deep migration network is characterized by comprising the following steps of:
s10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
s20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
s40, inputting the characteristic id index code into the depth migration network to obtain cross entropy loss of predicted click rate, and updating all parameters of the depth migration network by adopting a back propagation algorithm; comprising the following steps:
s401, inputting the characteristic id index code into an embedded layer Embedding of a depth migration network to obtain a corresponding characterization vector;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, calculating the click rate loss L(x; W, b) by integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network, where the calculation formula is:

L(x; W, b) = H(y, p_fm(x)) + H(y, p_light(x)) + H(y, p_deep(x)) + λ‖z_light(x) − z_deep(x)‖²

wherein H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, and y is the classification label value of the training data; when p is p_fm(x), H is evaluated on the click rate predicted value of the FM network, when p is p_light(x), on that of the lightweight perceptron network, and when p is p_deep(x), on that of the deep perceptron network; λ is a weight governing the prediction discrepancy between the lightweight perceptron network and the deep perceptron network and takes an empirically chosen value; z_light(x) is the model output value of the lightweight perceptron network before the sigmoid transformation, with p_light(x) = sigmoid(z_light(x)); and z_deep(x) is the model output value of the deep perceptron network before the sigmoid transformation, with p_deep(x) = sigmoid(z_deep(x));
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L(x; W, b);
S50, discretizing the test samples to obtain test sample discrete features, and matching and mapping the test sample discrete features based on the mapping dictionary M to obtain the feature id index codes of the test samples;
S60, inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples (an illustrative sketch of steps S401–S403 follows this claim).
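To make the data flow of steps S401–S403 concrete, the following is a minimal PyTorch sketch of the forward pass and the joint loss, written for illustration only. The framework choice, the class and function names (DeepMigrationNet, joint_loss), the layer widths, and the mean-pooling of field embeddings before the shared perceptron are all assumptions not specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepMigrationNet(nn.Module):
    """Illustrative sketch of the network in claim 1; names and sizes are assumptions."""
    def __init__(self, n_features: int, k: int = 16, shared_dim: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(n_features, k)   # S401: feature id -> characterization vector
        self.fm_linear = nn.Embedding(n_features, 1)   # W_fm: per-feature linear-term weights
        self.fm_bias = nn.Parameter(torch.zeros(1))    # b_fm
        self.shared = nn.Linear(k, shared_dim)         # shared perceptron: h_s = sigmoid(W_s v + b_s)
        self.deep = nn.Sequential(                     # deep perceptron: more layers
            nn.Linear(shared_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1))
        self.light = nn.Sequential(                    # lightweight perceptron: fewer layers
            nn.Linear(shared_dim, 16), nn.ReLU(),
            nn.Linear(16, 1))

    def forward(self, ids: torch.Tensor):
        # ids: (batch, n_fields) LongTensor, one feature id index code per field.
        v = self.embedding(ids)                        # (batch, n_fields, k)
        # FM pairwise term via the identity sum_{i<j}<v_i,v_j> = 0.5*((sum v)^2 - sum v^2)
        pair = 0.5 * (v.sum(1).pow(2) - v.pow(2).sum(1)).sum(1, keepdim=True)
        z_fm = self.fm_linear(ids).sum(1) + self.fm_bias + pair
        # Assumption: field embeddings are mean-pooled before the shared perceptron.
        h_s = torch.sigmoid(self.shared(v.mean(1)))    # abstract characterization vector
        return z_fm, self.light(h_s), self.deep(h_s)   # pre-sigmoid outputs z_fm, z_light, z_deep

def joint_loss(y, z_fm, z_light, z_deep, lam=0.5):
    # S403: L = H(y,p_fm) + H(y,p_light) + H(y,p_deep) + lambda*||z_light - z_deep||^2
    bce = F.binary_cross_entropy_with_logits
    return (bce(z_fm, y) + bce(z_light, y) + bce(z_deep, y)
            + lam * (z_light - z_deep).pow(2).mean())
```

Note that the migration happens purely through the λ‖z_light − z_deep‖² term: the lightweight head is pulled toward the deep head's pre-sigmoid outputs while both also fit the labels, which is presumably why claim 1 constrains the lightweight perceptron to fewer layers than the deep perceptron.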
2. The click rate prediction method based on the deep migration network of claim 1, wherein the deep migration network comprises: an Embedding layer for converting feature id index codes into the corresponding characterization vectors; a factorization machine (FM) network for taking inner products of the corresponding characterization vectors to obtain the FM predicted click rate; a shared perceptron network for nonlinearly transforming the corresponding characterization vectors to obtain the abstract characterization vectors; a lightweight perceptron network taking the abstract characterization vectors as input to obtain the lightweight predicted click rate; and a deep perceptron network taking the abstract characterization vectors as input to obtain the deep predicted click rate.
3. The click rate prediction method based on the deep migration network according to claim 1, wherein the discretization processing formula in S10 is specifically:

D = ⌊V / N⌋

wherein V is the value of the continuous feature, D is the integer value after discretization, N is a constant, ⌊·⌋ is the round-down (floor) operator, and N needs to be determined according to the specific feature value range (a one-function sketch follows this claim).
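A minimal sketch of this bucketing step, assuming the equal-width reading D = ⌊V/N⌋ reconstructed above; the bucket width shown is illustrative, since the patent states only that N depends on the field's value range.

```python
import math

def discretize(value: float, n: float) -> int:
    """Bucket a continuous field value V into the integer D = floor(V / N).

    N is a per-field constant chosen from that field's value range
    (assumed here; the patent only states that N is range-dependent)."""
    return math.floor(value / n)

# Illustrative: bucket a continuous field in steps of 5.
assert discretize(23.0, 5) == 4
```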
4. The click rate prediction method based on the deep migration network of claim 1, wherein S30 specifically comprises:
S301, counting the co-occurrence frequency of the training sample feature id index codes in different samples as matrix elements to create the feature co-occurrence frequency matrix C_{n×n};
S302, performing matrix decomposition on the feature co-occurrence frequency matrix C_{n×n} based on the matrix multiplication recovery error to obtain the parameters of a matrix V_{n×k}, and updating the parameters of the matrix V_{n×k} by gradient descent, the matrix decomposition formula being:

Ĉ_{i,j} = v_i · v_j + b_i + b_j

wherein Ĉ_{i,j} is the estimated value of the element C_{i,j} of the matrix C_{n×n}; i, j ∈ n, i and j are subscripts of different feature id index codes, n is the total number of feature id index codes, v_i and v_j are row vectors of the matrix V_{n×k}, and b_i and b_j are bias terms; the error J between the matrix elements C_{i,j} and their estimated values Ĉ_{i,j} is calculated as:

J = Σ_{i,j ∈ n} (Ĉ_{i,j} − C_{i,j})²

and, with the aim of minimizing the error J, the parameters of V_{n×k} are updated by gradient descent;
S303, taking the rows of the updated matrix V_{n×k} as the characterization vectors of the corresponding features (a numpy sketch follows this claim).
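The following numpy sketch illustrates S301–S303 under stated assumptions: the plain squared recovery error J reconstructed above (the GloVe model proper also applies a log transform and a co-occurrence weighting, which the claim's wording does not spell out), full-batch gradient descent, and illustrative sizes and learning rate.

```python
import numpy as np

def factorize_cooccurrence(C: np.ndarray, k: int = 8, lr: float = 1e-3,
                           epochs: int = 200, seed: int = 0) -> np.ndarray:
    """Factorize an n x n co-occurrence matrix C so that
    C[i, j] ~ v_i . v_j + b_i + b_j, minimizing the squared error
    J = sum_{i,j} (C_hat[i, j] - C[i, j])**2 by full-batch gradient descent.
    Returns V (n x k), whose rows are the characterization vectors."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    V = 0.01 * rng.standard_normal((n, k))
    b = np.zeros(n)
    for _ in range(epochs):
        C_hat = V @ V.T + b[:, None] + b[None, :]
        E = C_hat - C                              # residual (the factor 2 of dJ is folded into lr)
        V -= lr * (E + E.T) @ V                    # gradient of J with respect to V
        b -= lr * (E.sum(axis=1) + E.sum(axis=0))  # gradient of J with respect to the bias terms
    return V
```

The rows of the returned V then serve as the Embedding layer initialization described in S30.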
5. The click rate prediction method based on the deep migration network according to claim 1, wherein the training samples are data marked with click category labels, the click categories comprising not-clicked and clicked, with the not-clicked category label being 0 and the clicked category label being 1; the test samples are data without click category labels.
6. A click rate prediction apparatus based on a deep migration network, characterized by comprising:
a discretization module for discretizing the continuous fields of training samples to obtain training sample discrete features;
a mapping module for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete features according to the mapping relation between the training sample discrete features and the feature id index codes;
a feature characterization module for counting the co-occurrence frequency of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, factorizing the feature co-occurrence frequency matrix through a GloVe model to obtain a characterization vector matrix, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
a training module for inputting the feature id index codes into the deep migration network for training to acquire the click rate loss, and updating all parameters of the deep migration network by adopting a back propagation algorithm; comprising:
an Embedding submodule for inputting the feature id index codes into the Embedding layer of the deep migration network to obtain the corresponding characterization vectors;
a subnet prediction submodule for inputting the corresponding characterization vectors into the factorization machine (FM) network, the FM network taking inner products of the corresponding characterization vectors to obtain the FM predicted click rate p_fm(x), where the formula is:

p_fm(x) = sigmoid( W_fm · x + b_fm + Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j )

wherein x is the input feature id index code, v_i is the characterization vector of the i-th feature id index code, i, j ∈ n, i and j are subscripts of different feature id index codes, n is the total number of feature id index codes, W_fm is the weight parameter of the FM network linear regression term, b_fm is the bias parameter of the FM network linear regression term, and ⟨v_i, v_j⟩ is the inner product of the characterization vectors v_i and v_j;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation to obtain the abstract characterization vector: h_s = sigmoid(W_s · v + b_s), wherein v is the characterization vector input into the shared perceptron network, W_s is the weight parameter of the shared perceptron network, b_s is the bias parameter of the shared perceptron network, and h_s is the abstract characterization vector output by the shared perceptron network;
the abstract characterization vector h_s is input into the deep perceptron network, and the deep perceptron network predicted click rate p_deep(x) is obtained through the following feedforward calculation formulas:

h_deep^(l) = ReLU(W_deep^(l) · h_deep^(l−1) + b_deep^(l)), with h_deep^(0) = h_s

p_deep(x) = sigmoid(W_deep^(L) · h_deep^(L−1) + b_deep^(L))

wherein ReLU is the activation function, W_deep^(l) is the l-th layer weight parameter of the deep perceptron network, b_deep^(l) is the l-th layer bias parameter of the deep perceptron network, h_deep^(l) is the output vector of the l-th layer, h_s is the output vector of the shared perceptron network, and the specific number of layers L needs to be set manually;
the abstract characterization vector h_s is input into the lightweight perceptron network, and the lightweight perceptron network predicted click rate p_light(x) is obtained through the following feedforward calculation formulas:

h_light^(l) = ReLU(W_light^(l) · h_light^(l−1) + b_light^(l)), with h_light^(0) = h_s

p_light(x) = sigmoid(W_light^(L′) · h_light^(L′−1) + b_light^(L′))

wherein ReLU is the activation function, W_light^(l) is the l-th layer weight parameter of the lightweight perceptron network, b_light^(l) is the l-th layer bias parameter of the lightweight perceptron network, h_light^(l) is the output vector of the l-th layer, h_s is the output vector of the shared perceptron network, the number of layers L′ takes an empirical value, and the number of layers of the lightweight perceptron network is less than that of the deep perceptron network;
an integrated prediction submodule for integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss L(x; W, b), where the calculation formula is:

L(x; W, b) = H(y, p_fm(x)) + H(y, p_light(x)) + H(y, p_deep(x)) + λ‖z_light(x) − z_deep(x)‖²

wherein H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, and y is the classification label value of the training data; when p = p_fm(x), H is evaluated on the click rate predicted value of the FM network, when p = p_light(x), on that of the lightweight perceptron network, and when p = p_deep(x), on that of the deep perceptron network; λ is a weight governing the prediction discrepancy between the lightweight perceptron network and the deep perceptron network and takes an empirically chosen value; z_light(x) is the model output value of the lightweight perceptron network before the sigmoid transformation, with p_light(x) = sigmoid(z_light(x)); and z_deep(x) is the model output value of the deep perceptron network before the sigmoid transformation, with p_deep(x) = sigmoid(z_deep(x));
a parameter updating submodule for updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L(x; W, b);
a test module for discretizing the test samples to obtain test sample discrete features, and matching and mapping the test sample discrete features based on the mapping dictionary M to obtain the feature id index codes of the test samples; and
a prediction module for inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples (a brief training and inference sketch follows this claim).
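For the device claims, a hedged sketch of how the training module and prediction module might drive the DeepMigrationNet sketched after claim 1; the Adam optimizer, the batch shapes, and serving from the lightweight head alone are assumptions, not the patent's prescription.

```python
import torch

# Assumes DeepMigrationNet and joint_loss from the sketch after claim 1.
model = DeepMigrationNet(n_features=100_000, k=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(ids: torch.Tensor, labels: torch.Tensor) -> float:
    # ids: (batch, n_fields) LongTensor of feature id index codes;
    # labels: (batch, 1) FloatTensor of 0/1 click labels (claim 5).
    z_fm, z_light, z_deep = model(ids)
    loss = joint_loss(labels, z_fm, z_light, z_deep, lam=0.5)
    optimizer.zero_grad()
    loss.backward()        # back propagation, as in the parameter updating submodule
    optimizer.step()
    return loss.item()

@torch.no_grad()
def predict(ids: torch.Tensor) -> torch.Tensor:
    # Serving could use the lightweight head alone, which the distillation
    # term lambda*||z_light - z_deep||^2 appears designed to enable.
    _, z_light, _ = model(ids)
    return torch.sigmoid(z_light)
```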
7. The click rate prediction apparatus based on the deep migration network of claim 6, wherein the discretization processing formula in the discretization module is specifically:

D = ⌊V / N⌋

wherein V is the value of the continuous feature, D is the integer value after discretization, N is a constant, ⌊·⌋ is the round-down (floor) operator, and N needs to be determined according to the specific feature value range.
8. The click rate prediction apparatus based on the deep migration network of claim 6, wherein the feature characterization module comprises:
a feature co-occurrence submodule for counting the co-occurrence frequency of the training sample feature id index codes in different samples as matrix elements to create the feature co-occurrence frequency matrix C_{n×n};
a decomposition submodule for performing matrix decomposition on the feature co-occurrence frequency matrix C_{n×n} based on the matrix multiplication recovery error to obtain the parameters of a matrix V_{n×k}, and updating the parameters of the matrix V_{n×k} by gradient descent, the matrix decomposition formula being:

Ĉ_{i,j} = v_i · v_j + b_i + b_j

wherein Ĉ_{i,j} is the estimated value of the element C_{i,j} of the matrix C_{n×n}; i, j ∈ n, i and j are subscripts of different feature id index codes, n is the total number of feature id index codes, v_i and v_j are row vectors of the matrix V_{n×k}, and b_i and b_j are bias terms; with the aim of minimizing the error J between the matrix elements C_{i,j} and their estimated values Ĉ_{i,j}, calculated as

J = Σ_{i,j ∈ n} (Ĉ_{i,j} − C_{i,j})²

the parameters of V_{n×k} are updated by gradient descent; and
a feature characterization submodule for taking the rows of the updated matrix V_{n×k} as the characterization vectors of the corresponding features (a short initialization example follows this claim).
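Finally, a short usage note connecting the feature characterization module of claim 8 to the Embedding initialization, assuming the factorize_cooccurrence and DeepMigrationNet sketches above; the file path is illustrative.

```python
import numpy as np
import torch

# Assumes factorize_cooccurrence (sketch after claim 4) and DeepMigrationNet
# (sketch after claim 1); "cooccurrence.npy" is a hypothetical path.
C = np.load("cooccurrence.npy")                  # n x n co-occurrence counts
V = factorize_cooccurrence(C, k=16)              # characterization vectors, n x k
model = DeepMigrationNet(n_features=C.shape[0], k=16)
with torch.no_grad():                            # use V as the Embedding initialization
    model.embedding.weight.copy_(torch.from_numpy(V).float())
```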
CN201910991888.2A 2019-10-17 2019-10-17 Click rate prediction method and device based on deep migration network Active CN110738314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910991888.2A CN110738314B (en) 2019-10-17 2019-10-17 Click rate prediction method and device based on deep migration network

Publications (2)

Publication Number Publication Date
CN110738314A CN110738314A (en) 2020-01-31
CN110738314B (en) 2023-05-02

Family

ID=69269257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910991888.2A Active CN110738314B (en) 2019-10-17 2019-10-17 Click rate prediction method and device based on deep migration network

Country Status (1)

Country Link
CN (1) CN110738314B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529151A (en) * 2020-12-02 2021-03-19 华为技术有限公司 Data processing method and device
CN112632319B (en) * 2020-12-22 2023-04-11 天津大学 Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning
CN112949752B (en) * 2021-03-25 2022-09-06 支付宝(杭州)信息技术有限公司 Training method and device of business prediction system
US20240046314A1 (en) * 2022-08-03 2024-02-08 Hong Kong Applied Science and Technology Research Institute Company Limited Systems and methods for multidimensional knowledge transfer for click through rate prediction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365924A (en) * 2012-04-09 2013-10-23 北京大学 Method, device and terminal for searching information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7933762B2 (en) * 2004-04-16 2011-04-26 Fortelligent, Inc. Predictive model generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant