CN110738314B - Click rate prediction method and device based on deep migration network

Click rate prediction method and device based on deep migration network

Info

Publication number
CN110738314B
CN110738314B
Authority
CN
China
Prior art keywords
network
feature
deep
perceptron
click rate
Prior art date
Legal status
Active
Application number
CN201910991888.2A
Other languages
Chinese (zh)
Other versions
CN110738314A (en)
Inventor
郑子彬 (Zheng Zibin)
许海城 (Xu Haicheng)
Current Assignee
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN201910991888.2A
Publication of CN110738314A
Application granted
Publication of CN110738314B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0463 Neocognitrons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0251 Targeted advertisements
    • G06Q 30/0263 Targeted advertisements based upon Internet or website rating


Abstract

The invention discloses a click rate prediction method and device based on a deep migration network, the device being used to implement the method. The method comprises: discretizing continuous fields; preprocessing discrete features, converting them into feature id codes and obtaining a mapping dictionary; converting the feature id codes into characterization vectors with a GloVe model to serve as initialization parameters of the embedding layer of the deep migration network; inputting the feature id codes into the deep migration network for training; discretizing the test sample and mapping its discrete features into feature id codes with the mapping dictionary M; and inputting the feature id codes into the deep migration network to predict the click rate. The invention optimizes the click rate prediction method, improving prediction accuracy while keeping prediction latency low.

Description

Click rate prediction method and device based on deep migration network
Technical Field
The invention relates to the field of big data processing, in particular to a click rate prediction method and device based on a deep migration network.
Background
With the popularization of the Internet, everyday life is closely tied to it: people shop on Taobao and JD.com, order food on Meituan and Ele.me, and watch films on Tencent Video and iQIYI. The massive click behavior of users on the Internet accumulates a large amount of precious data for platforms such as Taobao, JD.com and Meituan; investing resources in this data produces tangible value, for example in the estimation scenarios of computational advertising or recommender systems.
The main goal of computational advertising is to integrate the information of three parties, the advertiser, the ad-slot provider and the user, so as to deliver advertisements more accurately, thereby improving the advertiser's campaign effect, the ad slot's revenue and the user's experience, achieving a multi-win situation. The most important link in this chain is accurate ad delivery, and the technology behind it is click-through-rate estimation, the field that predicts the probability of a user clicking an advertisement. Data in computational advertising is highly sparse and enormous in magnitude, so at first the simplest linear models, such as logistic regression, were widely used. However, click-rate estimation must consider several objects at once, such as users and advertisements, and combinations among features matter far more than each feature considered on its own. The Factorization Machine (FM) model developed later represents each feature with a characterization vector and expresses the combination information between features through inner products of these vectors. Recently, deep learning models have achieved great success in many fields, and neural-network-based deep models have gradually been applied to click-rate estimation to make up for the FM model's lack of feature combinations above second order. Although deep learning models improve greatly on FM in effect, their application in computational advertising, a scenario with massive data that must produce prediction results in real time, is limited by their computational cost.
In general, click-rate estimation methods have been continuously optimized for remarkable gains in effect, but the low-latency requirement that matters in actual use has gradually been neglected.
Disclosure of Invention
The main purpose of the present invention is to provide a click rate prediction method based on a deep migration network that aims to overcome the above problems.
In order to achieve the above purpose, the click rate prediction method based on the deep migration network provided by the invention comprises the following steps:
s10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
s20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
s40, inputting the feature id index code into a depth migration network to obtain cross entropy loss of predicted click rate, and updating all parameters of the depth migration network by adopting a back propagation algorithm, wherein the depth migration network comprises an embedded layer Embedding for converting the feature id index code into corresponding characterization vectors, a factorized FM network for carrying out inner product on the corresponding characterization vectors to obtain FM predicted click rate, a shared perceptron network for carrying out nonlinear transformation on the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network for inputting the abstract characterization vectors to obtain lightweight predicted click rate and a deep perceptron network for inputting the abstract characterization vectors to obtain deep predicted click rate;
S50, discretizing the test sample to obtain discrete features of the test sample, and carrying out matching mapping on the discrete features of the test sample based on a mapping dictionary M to obtain feature id index codes of the test sample;
s60, inputting the characteristic id index code of the test sample into the deep migration network for prediction to obtain click rate prediction of the test sample.
Preferably, the depth migration network includes an embedded layer Embedding, a factorized FM network, a shared perceptron network, a lightweight perceptron network, and a deep perceptron network, and the S40 includes:
s401, inputting the characteristic id index code into an embedded layer Embedding of a depth migration network to obtain a corresponding characterization vector;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
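To make S402 concrete, the following is a minimal PyTorch sketch of the FM subnetwork. The class name, the field-wise input layout and the O(nk) pairwise-interaction identity $\sum_{i<j}\langle v_i,v_j\rangle=\tfrac{1}{2}\big(\|\sum_i v_i\|^2-\sum_i\|v_i\|^2\big)$ are standard factorization-machine practice assumed for illustration, not text from the patent.

```python
import torch
import torch.nn as nn

class FMNetwork(nn.Module):
    """Factorization-machine subnet: linear regression term plus pairwise
    inner products of characterization vectors (a sketch of S402)."""

    def __init__(self, n_features: int, k: int):
        super().__init__()
        self.embedding = nn.Embedding(n_features, k)  # characterization vectors v
        self.linear = nn.Embedding(n_features, 1)     # linear-term weights W_fm
        self.bias = nn.Parameter(torch.zeros(1))      # linear-term bias b_fm

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # feature_ids: (batch, n_fields) integer id codes, one id per field
        v = self.embedding(feature_ids)               # (batch, n_fields, k)
        linear_term = self.linear(feature_ids).sum(dim=(1, 2)) + self.bias
        # sum_{i<j} <v_i, v_j> = 0.5 * (||sum_i v_i||^2 - sum_i ||v_i||^2)
        sum_sq = v.sum(dim=1).pow(2).sum(dim=1)
        sq_sum = v.pow(2).sum(dim=(1, 2))
        pair_term = 0.5 * (sum_sq - sq_sum)
        return torch.sigmoid(linear_term + pair_term)  # p_fm(x)
```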
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
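The shared, deep and lightweight perceptron networks described above can be sketched as follows; the patent fixes only the structure (a sigmoid shared layer, ReLU hidden layers, and fewer layers in the lightweight branch than in the deep branch), so all widths and depths here are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mlp(sizes, act=nn.ReLU):
    """Stack Linear+activation pairs; the final Linear emits a pre-sigmoid logit z."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))
    return nn.Sequential(*layers)

class PerceptronBranches(nn.Module):
    def __init__(self, in_dim: int, shared_dim: int = 256):
        super().__init__()
        # shared perceptron: h_s = sigmoid(W_s v + b_s)
        self.shared = nn.Sequential(nn.Linear(in_dim, shared_dim), nn.Sigmoid())
        # deep branch: more ReLU layers (assumed widths)
        self.deep = mlp([shared_dim, 400, 400, 400, 1])
        # lightweight branch: fewer layers than the deep branch
        self.light = mlp([shared_dim, 128, 1])

    def forward(self, v: torch.Tensor):
        h_s = self.shared(v)                   # abstract characterization vector
        z_deep = self.deep(h_s).squeeze(-1)    # pre-sigmoid output z_deep(x)
        z_light = self.light(h_s).squeeze(-1)  # pre-sigmoid output z_light(x)
        return z_light, z_deep                 # p = sigmoid(z) for each branch
```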
s403, calculating click rate loss L (x; W, b) by integrating predicted click rates of an FM network, a lightweight perceptron network and a deep perceptron network, wherein the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
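A sketch of the joint loss of S403 and the parameter update of S404, written directly from the formula above. It assumes the FMNetwork and PerceptronBranches sketches given earlier, uses binary cross entropy for H(y, p), and the value of λ is a placeholder for the empirically tuned weight.

```python
import torch
import torch.nn.functional as F

def click_rate_loss(y, p_fm, z_light, z_deep, lam=0.1):
    """L(x;W,b) = H(y,p_fm) + H(y,p_light) + H(y,p_deep)
                 + lam * ||z_light - z_deep||^2
    y: float 0/1 click labels; z_*: pre-sigmoid subnet outputs."""
    return (F.binary_cross_entropy(p_fm, y)
            + F.binary_cross_entropy_with_logits(z_light, y)
            + F.binary_cross_entropy_with_logits(z_deep, y)
            # matching term that lets the deep branch teach the light branch
            + lam * (z_light - z_deep).pow(2).mean())

# S404, sketched: one back-propagation step over all network parameters
# optimizer = torch.optim.Adam(all_parameters, lr=1e-3)
# loss = click_rate_loss(y, p_fm, z_light, z_deep)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```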
Preferably, the discretization in S10 maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
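Since the exact bucketing formula survives only as an image, the sketch below assumes a logarithmic form, D = ⌊N·ln(1+V)⌋, which matches the stated ingredients (a floor operation and a range-dependent constant N) and the long-tail values the description mentions; the true formula in the patent may differ.

```python
import math

def discretize(v: float, n: float = 2.0) -> int:
    """Map a continuous long-tail value V to an integer bucket D.
    Assumed form D = floor(N * ln(1 + V)); only V, D, N and the floor
    operation are actually specified by the surrounding text."""
    return math.floor(n * math.log1p(max(v, 0.0)))

# e.g. discretize(3.5) -> 3, discretize(1000.0) -> 13 with N = 2
```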
Preferably, the step S30 specifically includes:
S301, taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements and creating the feature co-occurrence frequency matrix $C_{n\times n}$;
S302, performing matrix decomposition on the feature co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, which are updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error J, the parameters of $V_{n\times k}$ are updated by gradient descent;
S303, taking the rows of the updated matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
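A NumPy sketch of S301-S303 under the simplified recovery error stated above (the full GloVe model also log-transforms and weights the co-occurrence counts); the learning rate, dimension k and epoch count are assumptions.

```python
import numpy as np

def glove_vectors(samples, n, k=16, lr=0.01, epochs=50, seed=0):
    """samples: iterable of lists of feature id index codes per sample;
    n: total number of feature id index codes.
    Returns V (n x k); its rows are the characterization vectors (S303)."""
    rng = np.random.default_rng(seed)
    C = np.zeros((n, n))
    for ids in samples:                       # S301: co-occurrence counts
        for i in ids:
            for j in ids:
                if i != j:
                    C[i, j] += 1.0
    V = rng.normal(scale=0.1, size=(n, k))    # S302: C ~ V V^T + biases
    b = np.zeros(n)
    for _ in range(epochs):
        err = (V @ V.T + b[:, None] + b[None, :]) - C
        V -= lr * 2.0 * (err + err.T) @ V / n     # dJ/dV for J = sum(err^2)
        b -= lr * 2.0 * (err.sum(axis=1) + err.sum(axis=0)) / n
    return V
```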
The invention also discloses a click rate prediction device based on the deep migration network, which is used for implementing the above method and comprises:
the discrete module is used for carrying out discretization processing on continuous fields of the training samples so as to obtain discrete characteristics of the training samples;
The mapping module is used for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
the feature characterization module, used for counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the characteristic id index code into the deep migration network for training to acquire click rate loss, and updating all parameters of the deep migration network by adopting a back propagation algorithm;
the test module is used for discretizing the test sample to obtain discrete characteristics of the test sample, and carrying out matching mapping on the discrete characteristics of the test sample based on the mapping dictionary M to obtain a characteristic id index code of the test sample;
and the prediction module, used for inputting the feature id index code of the test sample into the deep migration network for prediction, to obtain the click rate prediction of the test sample.
Preferably, the training module comprises:
The Embedding sub-module is used for inputting the characteristic id index code into an Embedding layer Embedding of the depth migration network to obtain a corresponding characterization vector;
the subnet prediction sub-module, used for inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction sub-module is used for integrating the predicted click rate of the FM network, the light-weight perceptron network and the deep perceptron network to calculate the click rate loss L (x; W, b), and the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
And the parameter updating sub-module is used for updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
Preferably, the discretization processing in the discretization module maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
Preferably, the feature characterization module includes:
the feature co-occurrence sub-module, used for creating the feature co-occurrence frequency matrix $C_{n\times n}$ by taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements;
the decomposition sub-module, used for performing matrix decomposition on the co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error, the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; $V_{n\times k}$ is updated by gradient descent;
and the feature characterization sub-module, used for taking the rows of the decomposed matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
Compared with the prior art, the invention has the following beneficial effects: it overcomes the defect that the FM model in the prior art contains no feature combinations above second order; it uses the strong transfer-learning capability of the deep migration network to let the deep perceptron network guide the learning of the lightweight perceptron network, yielding a lightweight perceptron network with better effect and better performance; and finally it integrates the FM network, the lightweight perceptron network and the deep perceptron network to train and obtain the click rate loss, optimizing the click rate prediction method and improving prediction accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a model architecture of a deep migration network of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and rear … …) are included in the embodiments of the present invention, the directional indications are merely used to explain the relative positional relationship, movement conditions, etc. between the components in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indications are correspondingly changed.
In addition, if there is a description of "first", "second", etc. in the embodiments of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, but it is necessary to base that the technical solutions can be realized by those skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be absent and not within the scope of protection claimed in the present invention.
The click rate prediction method based on the deep migration network provided by the invention comprises the following steps:
s10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
s20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network, these parameters being further updated by the back propagation algorithm during training;
s40, inputting the feature id index code into a depth migration network to obtain cross entropy loss of predicted click rate, and updating all parameters of the depth migration network by adopting a back propagation algorithm, wherein the depth migration network comprises an embedded layer Embedding for converting the feature id index code into corresponding characterization vectors, a factorized FM network for carrying out inner product on the corresponding characterization vectors to obtain FM predicted click rate, a shared perceptron network for carrying out nonlinear transformation on the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network for inputting the abstract characterization vectors to obtain lightweight predicted click rate and a deep perceptron network for inputting the abstract characterization vectors to obtain deep predicted click rate;
S50, discretizing the test sample to obtain discrete features of the test sample, and carrying out matching mapping on the discrete features of the test sample based on a mapping dictionary M to obtain feature id index codes of the test sample;
s60, inputting the characteristic id index code of the test sample into the deep migration network for prediction to obtain click rate prediction of the test sample.
In the embodiment of the invention, transfer learning is incorporated into the click rate estimation method, with the aim of improving the prediction effect of the lightweight perceptron network while keeping its low-latency advantage. Meanwhile, the GloVe technique is used to initialize the characterization vectors, which makes the training process of the deep migration network more stable.
Besides the preprocessing the data undergoes before being passed into the input component, the initial parameters of the Embedding layer in the input component are initialized, before the training stage, with the characterization vectors obtained by the GloVe technique.
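A sketch of this initialization, assuming the characterization-vector matrix produced by the GloVe sketch earlier; loading a pretrained matrix into nn.Embedding this way is ordinary PyTorch practice, and the layer remains trainable afterwards as required by S40.

```python
import torch
import torch.nn as nn

def init_embedding_from_glove(V) -> nn.Embedding:
    """Build the Embedding layer of the deep migration network and set its
    initial weights to the (n x k) characterization-vector matrix V."""
    weights = torch.as_tensor(V, dtype=torch.float32)
    emb = nn.Embedding(weights.shape[0], weights.shape[1])
    with torch.no_grad():
        emb.weight.copy_(weights)   # GloVe vectors as initialization parameters
    return emb                      # still updated by back-propagation later
```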
The invention overcomes the defect that the FM model in the prior art contains no feature combinations above second order, and uses the strong transfer-learning capability of the deep migration network to let the deep perceptron network guide the learning of the lightweight perceptron network, thereby obtaining a lightweight perceptron network with better effect and better performance; finally, the FM network, the lightweight perceptron network and the deep perceptron network are integrated to train and obtain the click rate loss, optimizing the click rate prediction method and improving prediction accuracy.
Preferably, the depth migration network comprises an embedded layer Embedding for converting the feature id index code into corresponding characterization vectors, a factorization FM network for carrying out inner product on the corresponding characterization vectors to obtain FM predicted click rate, a shared perceptron network for carrying out nonlinear transformation on the corresponding characterization vectors to obtain abstract characterization vectors, a lightweight perceptron network for inputting the abstract characterization vectors to obtain lightweight predicted click rate and a deep perceptron network for inputting the abstract characterization vectors to obtain deep predicted click rate.
Preferably, the S40 includes:
s401, inputting the characteristic id index code into an embedded layer Embedding of a depth migration network to obtain a corresponding characterization vector;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, calculating click rate loss L (x; W, b) by integrating predicted click rates of an FM network, a lightweight perceptron network and a deep perceptron network, wherein the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
Preferably, the discretization in S10 maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
Preferably, the step S30 specifically includes:
S301, taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements and creating the feature co-occurrence frequency matrix $C_{n\times n}$;
S302, performing matrix decomposition on the feature co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, which are updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error J, the parameters of $V_{n\times k}$ are updated by gradient descent;
S303, taking the rows of the updated matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
Preferably, the training samples are data marked with click category labels, the click categories comprising not-clicked and clicked, with the not-clicked category labeled 0 and the clicked category labeled 1; the test samples are data without click category labels.
The invention also discloses a click rate prediction device based on the deep migration network, which is used for implementing the above method (for the method, refer to the embodiments above). Since the device adopts all the technical solutions of all the embodiments above, it has at least all the beneficial effects brought by those technical solutions, which are not repeated here. The device comprises:
The discrete module is used for carrying out discretization processing on continuous fields of the training samples so as to obtain discrete characteristics of the training samples;
the mapping module is used for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
the feature characterization module, used for counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
the training module is used for inputting the characteristic id index code into the deep migration network for training to acquire click rate loss, and updating all parameters of the deep migration network by adopting a back propagation algorithm;
the test module is used for discretizing the test sample to obtain discrete characteristics of the test sample, and carrying out matching mapping on the discrete characteristics of the test sample based on the mapping dictionary M to obtain a characteristic id index code of the test sample;
and the prediction module, used for inputting the feature id index code of the test sample into the deep migration network for prediction, to obtain the click rate prediction of the test sample.
Preferably, the training module comprises:
the Embedding sub-module is used for inputting the characteristic id index code into an Embedding layer Embedding of the depth migration network to obtain a corresponding characterization vector;
the subnet prediction sub-module, used for inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
the integrated prediction sub-module is used for integrating the predicted click rate of the FM network, the light-weight perceptron network and the deep perceptron network to calculate the click rate loss L (x; W, b), and the calculation formula is as follows:
$L(x;W,b)=H\big(y,p_{fm}(x)\big)+H\big(y,p_{light}(x)\big)+H\big(y,p_{deep}(x)\big)+\lambda\big\|z_{light}(x)-z_{deep}(x)\big\|^{2}$

where H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, y is the classification label value of the training data, and p stands in turn for the FM network's predicted click rate $p_{fm}(x)$, the lightweight perceptron network's predicted click rate $p_{light}(x)$ and the deep network's predicted click rate $p_{deep}(x)$; λ is a weight determining the contribution of the prediction-matching error between the lightweight and deep networks and takes an empirically tuned value; $z_{light}(x)$ is the lightweight perceptron network's model output before the sigmoid transformation, with $p_{light}(x)=\mathrm{sigmoid}(z_{light}(x))$; and $z_{deep}(x)$ is the deep network's model output before the sigmoid transformation, with $p_{deep}(x)=\mathrm{sigmoid}(z_{deep}(x))$;
And the parameter updating sub-module is used for updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L (x; W, b).
Preferably, the discretization processing in the discretization module maps each continuous feature value to an integer by a floor operation (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant, $\lfloor\cdot\rfloor$ denotes rounding down, and N needs to be determined according to the specific feature's value range.
Preferably, the feature characterization module includes:
the feature co-occurrence sub-module, used for creating the feature co-occurrence frequency matrix $C_{n\times n}$ by taking the co-occurrence frequencies of the training-sample feature id index codes in different samples as matrix elements;
the decomposition sub-module, used for performing matrix decomposition on the co-occurrence frequency matrix $C_{n\times n}$ based on the matrix-multiplication recovery error to obtain the parameters of a matrix $V_{n\times k}$, updated by gradient descent; the decomposition estimates each element as

$\hat{C}_{i,j}=v_i^{\top}v_j+b_i+b_j$

where $\hat{C}_{i,j}$ is the estimate of element $C_{i,j}$ of matrix $C_{n\times n}$, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, and $b_i$ and $b_j$ are bias terms; with the goal of minimizing the error, the recovery error J between the matrix elements $C_{i,j}$ and their estimates $\hat{C}_{i,j}$ is computed as

$J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-\hat{C}_{i,j}\big)^{2}$

where $C_{i,j}$ is an element of matrix $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of matrix $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; $V_{n\times k}$ is updated by gradient descent;
and the feature characterization sub-module, used for taking the rows of the decomposed matrix $V_{n\times k}$ as the characterization vectors of the corresponding features.
It should additionally be understood that the model architecture of the deep migration network is shown in FIG. 1; the model has five components:
1. Input component
The input component requires data in the form of discrete feature id codes; the feature id codes are then passed into the coding layer to generate characterization vectors. Processing the data into feature id codes lets the coding layer quickly look up the corresponding characterization vector. Therefore, before data is passed into the input component, continuous floating-point values such as monetary amounts are discretized; specifically, long-tail-distributed floating-point values with a very wide value range can be discretized by the floor-based bucketing described above (the formula is given as an image in the original publication), where V is the value of the continuous feature, D is the integer value after discretization, N is a constant and $\lfloor\cdot\rfloor$ denotes rounding down. Furthermore, natively discrete features are also processed: for example, gender = [male, female] is converted to gender = [0, 1], and age = [12, 13, …, 80] is converted to age = [0, 1, …, 68].
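A sketch of this preprocessing and of the mapping dictionary M from S20: discretized and natively discrete values are assigned consecutive feature id codes on the training data, and the same dictionary is reused on test data. Field names and values are illustrative.

```python
def build_mapping_dictionary(train_rows):
    """train_rows: list of dicts of discrete feature values, e.g.
    {"gender": "male", "age": 12, "amount_bucket": 7}.
    Returns the mapping dictionary M: (field, value) -> feature id code."""
    M = {}
    for row in train_rows:
        for field, value in row.items():
            M.setdefault((field, value), len(M))   # next unused id code
    return M

def encode(row, M):
    """Map one sample's discrete features to feature id codes via M;
    (field, value) pairs unseen during training have no entry in M."""
    return [M[(f, v)] for f, v in row.items() if (f, v) in M]
```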
Besides this preprocessing of the data before it enters the input component, the initial parameters of the Embedding layer in the input component are initialized, before the training stage, with the characterization vectors obtained by the GloVe technique. The steps for converting feature id codes into characterization vectors with GloVe are as follows:
S3.1: count how often features co-occur in the same samples across the data, finally obtaining the feature co-occurrence frequency matrix $C_{n\times n}$;
S3.2: perform the matrix decomposition $C_{n\times n}\approx V_{n\times k}V_{n\times k}^{\top}+bias$, where $V_{n\times k}$ is the decomposed matrix and bias is the bias term, and minimize the matrix-multiplication recovery error $J=\sum_{i=1}^{n}\sum_{j=1}^{n}\big(C_{i,j}-(v_i^{\top}v_j+b_i+b_j)\big)^{2}$, where $C_{i,j}$ is an element of $C_{n\times n}$, $v_i$ and $v_j$ are row vectors of $V_{n\times k}$, and $b_i$ and $b_j$ are bias terms; the parameters of the matrix $V_{n\times k}$ are obtained through gradient-descent updates;
S3.3: the row vectors of the matrix $V_{n\times k}$ obtained by decomposing the feature co-occurrence frequency matrix serve as the characterization vectors of the corresponding features.
2. FM network
The FM network takes the characterization vectors as input and uses inner products of characterization vectors as feature combinations. This explicit second-order feature combination is both more efficient and better at generalizing in click-rate estimation scenarios with highly sparse data. Whereas a perceptron network contains no explicit feature combinations, integrating the FM network into the deep migration network can guide the Embedding layer to learn better characterization vectors.
3. Shared perceptron network
The shared perceptron network takes the characterization vector as input, converts the original characterization vector into a more abstract characterization vector by using the complex nonlinear mapping capability of the perceptron network, and enables the subsequent lightweight perceptron network and the deep perceptron network to share the abstract characterization vector as the input of the network.
4. Deep perceptron network
The deep perceptron network takes the abstract characterization vector produced by the shared perceptron network as input and performs further nonlinear combination through more layers, giving it the capacity to represent higher-order feature combinations and hence better performance than the lightweight perceptron network. To transfer what the deep perceptron network has learned to the lightweight one, the click rate output by the deep perceptron network is used to enrich the original 0-1 click labels of the data. Training samples are data marked with click category labels, where the category is either not-clicked (label 0) or clicked (label 1); test samples carry no click category label. Compared with the original label, which only provides category membership (1 or 0), the click rate output by the deep perceptron network provides more information: not only which category is more probable, but the exact strength of that probability.
5. Light-weight perceptron network
The lightweight perceptron network takes the abstract characterization vector converted by the shared perceptron network as input. On the one hand, it exploits the information in the abstract characterization vector through a few shallow nonlinear layers to predict the click rate more accurately; on the other hand, by fitting the click rate predicted by the deep perceptron network, it learns the information the deep network has mined from the data, improving the prediction effect while keeping the lightweight network's low latency.
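The low-latency benefit suggests that at serving time only the embedding, shared and lightweight branches need to be evaluated, the FM and deep branches having served their purpose during training; this deployment choice is an assumption (the patent feeds test samples into the deep migration network without spelling out which outputs are used), sketched as follows.

```python
import torch

@torch.no_grad()
def predict_click_rate(feature_ids, embedding, branches):
    """Serving path: embedding -> shared perceptron -> lightweight branch.
    feature_ids: (batch, n_fields) id codes from the mapping dictionary M;
    embedding / branches: the modules sketched earlier (the in_dim of
    branches.shared must equal n_fields * k)."""
    v = embedding(feature_ids).flatten(start_dim=1)  # concat field vectors
    h_s = branches.shared(v)                         # abstract characterization
    z_light = branches.light(h_s).squeeze(-1)        # pre-sigmoid z_light(x)
    return torch.sigmoid(z_light)                    # p_light(x)
```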
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (8)

1. The click rate prediction method based on the deep migration network is characterized by comprising the following steps of:
s10, discretizing continuous fields of the training samples to obtain discrete features of the training samples;
s20, creating a unique feature id index code for each training sample discrete feature, and creating a mapping dictionary M of the discrete feature according to the mapping relation between the training sample discrete feature and the feature id index code;
S30, counting the co-occurrence frequencies of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, converting the feature id index codes into a characterization vector matrix of the feature co-occurrence frequency matrix through a GloVe model, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
s40, inputting the characteristic id index code into the depth migration network to obtain cross entropy loss of predicted click rate, and updating all parameters of the depth migration network by adopting a back propagation algorithm; comprising the following steps:
s401, inputting the characteristic id index code into an embedded layer Embedding of a depth migration network to obtain a corresponding characterization vector;
S402, inputting the corresponding characterization vectors into the factorization FM network, which takes inner products of the characterization vectors to obtain the FM predicted click rate $p_{fm}(x)$, where the inner-product formula is as follows:

$p_{fm}(x)=\mathrm{sigmoid}\Big(W_{fm}^{\top}x+b_{fm}+\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i,v_j\rangle x_i x_j\Big)$

where x is the input feature id index code, v is the characterization vector representing a feature id index code, i, j ∈ n with i and j being subscripts of different feature id index codes, n is the total number of feature id index codes, $W_{fm}$ is the weight parameter of the FM network's linear regression term, $b_{fm}$ is the bias parameter of that linear regression term, and $\langle v_i,v_j\rangle$ is the inner product of the characterization vectors $v_i$ and $v_j$;
The corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation, yielding the abstract characterization vector: $h_s=\mathrm{sigmoid}(W_s v+b_s)$, where v is the characterization vector input to the shared perceptron network, $W_s$ is the weight parameter of the shared perceptron network, $b_s$ is the bias parameter of the shared perceptron network, and $h_s$ is the abstract characterization vector output by the shared perceptron;
The abstract characterization vector $h_s$ is input into the deep perceptron network, and the predicted click rate $p_{deep}(x)$ of the deep perceptron network is obtained through the following feedforward computation:

$h_{deep}^{(1)}=\mathrm{ReLU}\big(W_{deep}^{(1)}h_s+b_{deep}^{(1)}\big)$

$h_{deep}^{(l)}=\mathrm{ReLU}\big(W_{deep}^{(l)}h_{deep}^{(l-1)}+b_{deep}^{(l)}\big),\qquad p_{deep}(x)=\mathrm{sigmoid}\big(z_{deep}(x)\big)$

where ReLU is the activation function, $W_{deep}^{(l)}$ is the weight parameter of the l-th layer of the deep perceptron network, $b_{deep}^{(l)}$ is the bias parameter of the l-th layer, $h_{deep}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, $z_{deep}(x)$ is the output of the final layer before the sigmoid, and the specific number of layers needs to be set manually;
The abstract characterization vector $h_s$ is input into the lightweight perceptron network, and the predicted click rate $p_{light}(x)$ of the lightweight perceptron network is obtained through the following feedforward computation:

$h_{light}^{(1)}=\mathrm{ReLU}\big(W_{light}^{(1)}h_s+b_{light}^{(1)}\big)$

$h_{light}^{(l)}=\mathrm{ReLU}\big(W_{light}^{(l)}h_{light}^{(l-1)}+b_{light}^{(l)}\big),\qquad p_{light}(x)=\mathrm{sigmoid}\big(z_{light}(x)\big)$

where ReLU is the activation function, $W_{light}^{(l)}$ is the weight parameter of the l-th layer of the lightweight perceptron network, $b_{light}^{(l)}$ is the bias parameter of the l-th layer, $h_{light}^{(l)}$ is the output vector of the l-th layer, $h_s$ is the output vector of the shared perceptron network, the number of layers takes an empirical value, and the lightweight perceptron network has fewer layers than the deep perceptron network;
S403, calculating the click rate loss L(x; W, b) by integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network, where the calculation formula is:

L(x; W, b) = H(y, p_fm(x)) + H(y, p_light(x)) + H(y, p_deep(x)) + λ‖z_light(x) − z_deep(x)‖²

wherein H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, and y is the classification label value of the training data; when p is p_fm(x), H is evaluated on the click rate predicted value of the FM network, when p is p_light(x), on that of the lightweight perceptron network, and when p is p_deep(x), on that of the deep perceptron network; λ is a weight governing the prediction discrepancy between the lightweight perceptron network and the deep perceptron network and takes an empirically chosen value; z_light(x) is the model output value of the lightweight perceptron network before the sigmoid transformation, with p_light(x) = sigmoid(z_light(x)); and z_deep(x) is the model output value of the deep perceptron network before the sigmoid transformation, with p_deep(x) = sigmoid(z_deep(x));
S404, updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L(x; W, b);
S50, discretizing the test samples to obtain test sample discrete features, and matching and mapping the test sample discrete features based on the mapping dictionary M to obtain the feature id index codes of the test samples;
S60, inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples (an illustrative sketch of steps S401–S403 follows this claim).
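To make the data flow of steps S401–S403 concrete, the following is a minimal PyTorch sketch of the forward pass and the joint loss, written for illustration only. The framework choice, the class and function names (DeepMigrationNet, joint_loss), the layer widths, and the mean-pooling of field embeddings before the shared perceptron are all assumptions not specified by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepMigrationNet(nn.Module):
    """Illustrative sketch of the network in claim 1; names and sizes are assumptions."""
    def __init__(self, n_features: int, k: int = 16, shared_dim: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(n_features, k)   # S401: feature id -> characterization vector
        self.fm_linear = nn.Embedding(n_features, 1)   # W_fm: per-feature linear-term weights
        self.fm_bias = nn.Parameter(torch.zeros(1))    # b_fm
        self.shared = nn.Linear(k, shared_dim)         # shared perceptron: h_s = sigmoid(W_s v + b_s)
        self.deep = nn.Sequential(                     # deep perceptron: more layers
            nn.Linear(shared_dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1))
        self.light = nn.Sequential(                    # lightweight perceptron: fewer layers
            nn.Linear(shared_dim, 16), nn.ReLU(),
            nn.Linear(16, 1))

    def forward(self, ids: torch.Tensor):
        # ids: (batch, n_fields) LongTensor, one feature id index code per field.
        v = self.embedding(ids)                        # (batch, n_fields, k)
        # FM pairwise term via the identity sum_{i<j}<v_i,v_j> = 0.5*((sum v)^2 - sum v^2)
        pair = 0.5 * (v.sum(1).pow(2) - v.pow(2).sum(1)).sum(1, keepdim=True)
        z_fm = self.fm_linear(ids).sum(1) + self.fm_bias + pair
        # Assumption: field embeddings are mean-pooled before the shared perceptron.
        h_s = torch.sigmoid(self.shared(v.mean(1)))    # abstract characterization vector
        return z_fm, self.light(h_s), self.deep(h_s)   # pre-sigmoid outputs z_fm, z_light, z_deep

def joint_loss(y, z_fm, z_light, z_deep, lam=0.5):
    # S403: L = H(y,p_fm) + H(y,p_light) + H(y,p_deep) + lambda*||z_light - z_deep||^2
    bce = F.binary_cross_entropy_with_logits
    return (bce(z_fm, y) + bce(z_light, y) + bce(z_deep, y)
            + lam * (z_light - z_deep).pow(2).mean())
```

Note that the migration happens purely through the λ‖z_light − z_deep‖² term: the lightweight head is pulled toward the deep head's pre-sigmoid outputs while both also fit the labels, which is presumably why claim 1 constrains the lightweight perceptron to fewer layers than the deep perceptron.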
2. The click rate prediction method based on the deep migration network of claim 1, wherein the deep migration network comprises: an Embedding layer for converting feature id index codes into the corresponding characterization vectors; a factorization machine (FM) network for taking inner products of the corresponding characterization vectors to obtain the FM predicted click rate; a shared perceptron network for nonlinearly transforming the corresponding characterization vectors to obtain the abstract characterization vectors; a lightweight perceptron network taking the abstract characterization vectors as input to obtain the lightweight predicted click rate; and a deep perceptron network taking the abstract characterization vectors as input to obtain the deep predicted click rate.
3. The click rate prediction method based on the deep migration network according to claim 1, wherein the discretization processing formula in S10 is specifically:

D = ⌊V / N⌋

wherein V is the value of the continuous feature, D is the integer value after discretization, N is a constant, ⌊·⌋ is the round-down (floor) operator, and N needs to be determined according to the specific feature value range (a one-function sketch follows this claim).
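A minimal sketch of this bucketing step, assuming the equal-width reading D = ⌊V/N⌋ reconstructed above; the bucket width shown is illustrative, since the patent states only that N depends on the field's value range.

```python
import math

def discretize(value: float, n: float) -> int:
    """Bucket a continuous field value V into the integer D = floor(V / N).

    N is a per-field constant chosen from that field's value range
    (assumed here; the patent only states that N is range-dependent)."""
    return math.floor(value / n)

# Illustrative: bucket a continuous field in steps of 5.
assert discretize(23.0, 5) == 4
```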
4. The click rate prediction method based on the deep migration network of claim 1, wherein S30 specifically comprises:
S301, counting the co-occurrence frequency of the training sample feature id index codes in different samples as matrix elements to create the feature co-occurrence frequency matrix C_{n×n};
S302, performing matrix decomposition on the feature co-occurrence frequency matrix C_{n×n} based on the matrix multiplication recovery error to obtain the parameters of a matrix V_{n×k}, and updating the parameters of the matrix V_{n×k} by gradient descent, the matrix decomposition formula being:

Ĉ_{i,j} = v_i · v_j + b_i + b_j

wherein Ĉ_{i,j} is the estimated value of the element C_{i,j} of the matrix C_{n×n}; i, j ∈ n, i and j are subscripts of different feature id index codes, n is the total number of feature id index codes, v_i and v_j are row vectors of the matrix V_{n×k}, and b_i and b_j are bias terms; the error J between the matrix elements C_{i,j} and their estimated values Ĉ_{i,j} is calculated as:

J = Σ_{i,j ∈ n} (Ĉ_{i,j} − C_{i,j})²

and, with the aim of minimizing the error J, the parameters of V_{n×k} are updated by gradient descent;
S303, taking the rows of the updated matrix V_{n×k} as the characterization vectors of the corresponding features (a numpy sketch follows this claim).
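The following numpy sketch illustrates S301–S303 under stated assumptions: the plain squared recovery error J reconstructed above (the GloVe model proper also applies a log transform and a co-occurrence weighting, which the claim's wording does not spell out), full-batch gradient descent, and illustrative sizes and learning rate.

```python
import numpy as np

def factorize_cooccurrence(C: np.ndarray, k: int = 8, lr: float = 1e-3,
                           epochs: int = 200, seed: int = 0) -> np.ndarray:
    """Factorize an n x n co-occurrence matrix C so that
    C[i, j] ~ v_i . v_j + b_i + b_j, minimizing the squared error
    J = sum_{i,j} (C_hat[i, j] - C[i, j])**2 by full-batch gradient descent.
    Returns V (n x k), whose rows are the characterization vectors."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    V = 0.01 * rng.standard_normal((n, k))
    b = np.zeros(n)
    for _ in range(epochs):
        C_hat = V @ V.T + b[:, None] + b[None, :]
        E = C_hat - C                              # residual (the factor 2 of dJ is folded into lr)
        V -= lr * (E + E.T) @ V                    # gradient of J with respect to V
        b -= lr * (E.sum(axis=1) + E.sum(axis=0))  # gradient of J with respect to the bias terms
    return V
```

The rows of the returned V then serve as the Embedding layer initialization described in S30.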
5. The click rate prediction method based on the deep migration network according to claim 1, wherein the training samples are data marked with click category labels, the click categories comprising not-clicked and clicked, with the not-clicked category label being 0 and the clicked category label being 1; the test samples are data without click category labels.
6. A click rate prediction apparatus based on a deep migration network, characterized by comprising:
a discretization module for discretizing the continuous fields of training samples to obtain training sample discrete features;
a mapping module for creating a unique feature id index code for each training sample discrete feature and creating a mapping dictionary M of the discrete features according to the mapping relation between the training sample discrete features and the feature id index codes;
a feature characterization module for counting the co-occurrence frequency of the feature id index codes in different samples to create a feature co-occurrence frequency matrix, factorizing the feature co-occurrence frequency matrix through a GloVe model to obtain a characterization vector matrix, and taking the characterization vectors as initialization parameters of the Embedding layer of the deep migration network;
a training module for inputting the feature id index codes into the deep migration network for training to acquire the click rate loss, and updating all parameters of the deep migration network by adopting a back propagation algorithm; comprising:
an Embedding submodule for inputting the feature id index codes into the Embedding layer of the deep migration network to obtain the corresponding characterization vectors;
a subnet prediction submodule for inputting the corresponding characterization vectors into the factorization machine (FM) network, the FM network taking inner products of the corresponding characterization vectors to obtain the FM predicted click rate p_fm(x), where the formula is:

p_fm(x) = sigmoid( W_fm · x + b_fm + Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j )

wherein x is the input feature id index code, v_i is the characterization vector of the i-th feature id index code, i, j ∈ n, i and j are subscripts of different feature id index codes, n is the total number of feature id index codes, W_fm is the weight parameter of the FM network linear regression term, b_fm is the bias parameter of the FM network linear regression term, and ⟨v_i, v_j⟩ is the inner product of the characterization vectors v_i and v_j;
the corresponding characterization vectors are input into the shared perceptron network for nonlinear transformation to obtain the abstract characterization vector: h_s = sigmoid(W_s · v + b_s), wherein v is the characterization vector input into the shared perceptron network, W_s is the weight parameter of the shared perceptron network, b_s is the bias parameter of the shared perceptron network, and h_s is the abstract characterization vector output by the shared perceptron network;
the abstract characterization vector h_s is input into the deep perceptron network, and the deep perceptron network predicted click rate p_deep(x) is obtained through the following feedforward calculation formulas:

h_deep^(l) = ReLU(W_deep^(l) · h_deep^(l−1) + b_deep^(l)), with h_deep^(0) = h_s

p_deep(x) = sigmoid(W_deep^(L) · h_deep^(L−1) + b_deep^(L))

wherein ReLU is the activation function, W_deep^(l) is the l-th layer weight parameter of the deep perceptron network, b_deep^(l) is the l-th layer bias parameter of the deep perceptron network, h_deep^(l) is the output vector of the l-th layer, h_s is the output vector of the shared perceptron network, and the specific number of layers L needs to be set manually;
the abstract characterization vector h_s is input into the lightweight perceptron network, and the lightweight perceptron network predicted click rate p_light(x) is obtained through the following feedforward calculation formulas:

h_light^(l) = ReLU(W_light^(l) · h_light^(l−1) + b_light^(l)), with h_light^(0) = h_s

p_light(x) = sigmoid(W_light^(L′) · h_light^(L′−1) + b_light^(L′))

wherein ReLU is the activation function, W_light^(l) is the l-th layer weight parameter of the lightweight perceptron network, b_light^(l) is the l-th layer bias parameter of the lightweight perceptron network, h_light^(l) is the output vector of the l-th layer, h_s is the output vector of the shared perceptron network, the number of layers L′ takes an empirical value, and the number of layers of the lightweight perceptron network is less than that of the deep perceptron network;
an integrated prediction submodule for integrating the predicted click rates of the FM network, the lightweight perceptron network and the deep perceptron network to calculate the click rate loss L(x; W, b), where the calculation formula is:

L(x; W, b) = H(y, p_fm(x)) + H(y, p_light(x)) + H(y, p_deep(x)) + λ‖z_light(x) − z_deep(x)‖²

wherein H(y, p) is the cross entropy loss function commonly used for classification tasks, x is the input feature id index code, and y is the classification label value of the training data; when p = p_fm(x), H is evaluated on the click rate predicted value of the FM network, when p = p_light(x), on that of the lightweight perceptron network, and when p = p_deep(x), on that of the deep perceptron network; λ is a weight governing the prediction discrepancy between the lightweight perceptron network and the deep perceptron network and takes an empirically chosen value; z_light(x) is the model output value of the lightweight perceptron network before the sigmoid transformation, with p_light(x) = sigmoid(z_light(x)); and z_deep(x) is the model output value of the deep perceptron network before the sigmoid transformation, with p_deep(x) = sigmoid(z_deep(x));
a parameter updating submodule for updating all parameters of the deep migration network by adopting a back propagation algorithm according to the click rate loss L(x; W, b);
a test module for discretizing the test samples to obtain test sample discrete features, and matching and mapping the test sample discrete features based on the mapping dictionary M to obtain the feature id index codes of the test samples; and
a prediction module for inputting the feature id index codes of the test samples into the deep migration network for prediction to obtain the click rate prediction of the test samples (a brief training and inference sketch follows this claim).
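For the device claims, a hedged sketch of how the training module and prediction module might drive the DeepMigrationNet sketched after claim 1; the Adam optimizer, the batch shapes, and serving from the lightweight head alone are assumptions, not the patent's prescription.

```python
import torch

# Assumes DeepMigrationNet and joint_loss from the sketch after claim 1.
model = DeepMigrationNet(n_features=100_000, k=16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(ids: torch.Tensor, labels: torch.Tensor) -> float:
    # ids: (batch, n_fields) LongTensor of feature id index codes;
    # labels: (batch, 1) FloatTensor of 0/1 click labels (claim 5).
    z_fm, z_light, z_deep = model(ids)
    loss = joint_loss(labels, z_fm, z_light, z_deep, lam=0.5)
    optimizer.zero_grad()
    loss.backward()        # back propagation, as in the parameter updating submodule
    optimizer.step()
    return loss.item()

@torch.no_grad()
def predict(ids: torch.Tensor) -> torch.Tensor:
    # Serving could use the lightweight head alone, which the distillation
    # term lambda*||z_light - z_deep||^2 appears designed to enable.
    _, z_light, _ = model(ids)
    return torch.sigmoid(z_light)
```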
7. The click rate prediction apparatus based on the deep migration network of claim 6, wherein the discretization processing formula in the discretization module is specifically:

D = ⌊V / N⌋

wherein V is the value of the continuous feature, D is the integer value after discretization, N is a constant, ⌊·⌋ is the round-down (floor) operator, and N needs to be determined according to the specific feature value range.
8. The click rate prediction apparatus based on the deep migration network of claim 6, wherein the feature characterization module comprises:
a feature co-occurrence submodule for counting the co-occurrence frequency of the training sample feature id index codes in different samples as matrix elements to create the feature co-occurrence frequency matrix C_{n×n};
a decomposition submodule for performing matrix decomposition on the feature co-occurrence frequency matrix C_{n×n} based on the matrix multiplication recovery error to obtain the parameters of a matrix V_{n×k}, and updating the parameters of the matrix V_{n×k} by gradient descent, the matrix decomposition formula being:

Ĉ_{i,j} = v_i · v_j + b_i + b_j

wherein Ĉ_{i,j} is the estimated value of the element C_{i,j} of the matrix C_{n×n}; i, j ∈ n, i and j are subscripts of different feature id index codes, n is the total number of feature id index codes, v_i and v_j are row vectors of the matrix V_{n×k}, and b_i and b_j are bias terms; with the aim of minimizing the error J between the matrix elements C_{i,j} and their estimated values Ĉ_{i,j}, calculated as

J = Σ_{i,j ∈ n} (Ĉ_{i,j} − C_{i,j})²

the parameters of V_{n×k} are updated by gradient descent; and
a feature characterization submodule for taking the rows of the updated matrix V_{n×k} as the characterization vectors of the corresponding features (a short initialization example follows this claim).
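Finally, a short usage note connecting the feature characterization module of claim 8 to the Embedding initialization, assuming the factorize_cooccurrence and DeepMigrationNet sketches above; the file path is illustrative.

```python
import numpy as np
import torch

# Assumes factorize_cooccurrence (sketch after claim 4) and DeepMigrationNet
# (sketch after claim 1); "cooccurrence.npy" is a hypothetical path.
C = np.load("cooccurrence.npy")                  # n x n co-occurrence counts
V = factorize_cooccurrence(C, k=16)              # characterization vectors, n x k
model = DeepMigrationNet(n_features=C.shape[0], k=16)
with torch.no_grad():                            # use V as the Embedding initialization
    model.embedding.weight.copy_(torch.from_numpy(V).float())
```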
CN201910991888.2A 2019-10-17 2019-10-17 Click rate prediction method and device based on deep migration network Active CN110738314B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910991888.2A CN110738314B (en) 2019-10-17 2019-10-17 Click rate prediction method and device based on deep migration network

Publications (2)

Publication Number Publication Date
CN110738314A CN110738314A (en) 2020-01-31
CN110738314B (en) 2023-05-02

Family

ID=69269257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910991888.2A Active CN110738314B (en) 2019-10-17 2019-10-17 Click rate prediction method and device based on deep migration network

Country Status (1)

Country Link
CN (1) CN110738314B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529151A (en) * 2020-12-02 2021-03-19 华为技术有限公司 Data processing method and device
CN112632319B (en) * 2020-12-22 2023-04-11 天津大学 Method for improving overall classification accuracy of long-tail distributed speech based on transfer learning
CN112949752B (en) * 2021-03-25 2022-09-06 支付宝(杭州)信息技术有限公司 Training method and device of business prediction system
US20240046314A1 (en) * 2022-08-03 2024-02-08 Hong Kong Applied Science and Technology Research Institute Company Limited Systems and methods for multidimensional knowledge transfer for click through rate prediction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365924A (en) * 2012-04-09 2013-10-23 北京大学 Method, device and terminal for searching information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7933762B2 (en) * 2004-04-16 2011-04-26 Fortelligent, Inc. Predictive model generation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant