CN115271272B - Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Publication number
CN115271272B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211200198.9A
Other languages
Chinese (zh)
Other versions
CN115271272A (en)
Inventor
李广丽
许广鑫
吴光庭
李传秀
叶艺源
张红斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Application filed by East China Jiaotong University
Priority to CN202211200198.9A
Publication of CN115271272A
Application granted
Publication of CN115271272B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements


Abstract

The invention provides a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation. User behavior data and the advertisement data clicked by users are analyzed to construct embedded feature vectors of both; around these embedded feature vectors, a SENET network, domain feature interaction, a CIN model and a DNN model are combined to realize multi-order feature optimization and generate features that accurately describe user interest. A mixed knowledge distillation framework is then designed, and based on it a lightweight click rate prediction model with stronger real-time inference capability and excellent recommendation precision is output, so as to realize efficient and high-quality advertisement click prediction, improve the user recommendation experience, and create good economic and social benefits for Internet companies.

Description

Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
Technical Field
The invention relates to the technical field of advertisement recommendation, in particular to a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation.
Background
Information overload is becoming increasingly serious because of the sheer volume of information on the Internet. A recommendation system can effectively relieve this problem: from the historical interaction data between users and items, it analyzes user characteristics such as habits, interests and preferences, analyzes the characteristics of the items themselves, establishes the relation between users and the items to be recommended, and finally recommends to users the items they may be interested in.
Click rate prediction estimates the probability that a user clicks on an Internet advertisement or an online commodity. It is an important component of a recommendation system and plays a very important role in Internet commercial platforms. Internet advertising has huge economic benefits, and an advertisement click means a potential purchase, so click rate prediction plays a vital role in promoting social and economic development. Accurate advertisement recommendation therefore both improves the user experience and brings considerable economic benefits to Internet companies.
However, existing advertisement click rate prediction techniques have the following problems: (1) the feature representation is single: only explicit features or only implicit features are used, and the complementarity between the two is not exploited; (2) the feature optimization method is simple and does not consider multi-order feature optimization. For these two reasons the final features are weakly discriminative, which seriously restricts click rate prediction precision. Meanwhile, most existing click rate prediction techniques adopt very complex and large prediction models, such as DIFM and AutoInt, so real-time inference efficiency is low, which seriously harms the user's recommendation experience and also restricts practical deployment of the models.
Disclosure of Invention
In view of the above situation, the main objective of the present invention is to provide a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation, so as to solve the problems of simple feature optimization method, low click rate prediction accuracy and low real-time inference efficiency in the prior art.
The invention provides a click rate prediction method for multi-order feature optimization and mixed knowledge distillation, which comprises the following steps:
step one, data preprocessing:
performing feature extraction on the acquired original user behavior data and clicked advertisement data, and performing one-hot encoding conversion to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively;
step two, model training:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interaction network, and performing feature optimization based on domain symmetric matrix embedding on the obtained first-order features to generate second-order features;
inputting the first-order features into a compressed interaction network to obtain explicit high-order features, inputting the second-order features into a deep neural network to obtain implicit high-order features, performing weighted splicing on the explicit and implicit high-order features to fuse them into third-order features, and generating a click rate prediction model based on the third-order features;
step three, click rate prediction:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, performing self-distillation on each, and then combining them to construct a teacher network;
pre-training a DNN model and an FM model, performing mutual distillation between them, and combining them to construct a student network;
designing a gating network, calculating through the gating network the knowledge weights of the teacher models in the teacher network, and having the teacher network guide the click rate prediction of each student model in the student network based on these knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
step four, advertisement recommendation:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sorting them in descending order, selecting a preset number of advertisements with the highest predicted values, and recommending them to the user to complete click rate prediction.
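The ranking in step four can be illustrated with a minimal sketch. The function name `recommend_top_k`, the advertisement ids and the predicted values below are hypothetical and serve only to show the descending sort and the selection of a preset number of advertisements; they are not taken from the invention.

```python
from typing import Dict, List


def recommend_top_k(predicted: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k advertisements with the highest predicted values."""
    # Sort (ad_id, predicted_value) pairs by predicted value, largest first.
    ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]


# Hypothetical predicted click rates produced by a deployed student network.
predicted_values = {"ad_a": 0.12, "ad_b": 0.87, "ad_c": 0.45, "ad_d": 0.30}
top_ads = recommend_top_k(predicted_values, k=2)
print(top_ads)
```

If the preset number exceeds the number of candidate advertisements, the slice simply returns all candidates in descending order.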
The invention also provides a click rate prediction system of multi-order feature optimization and mixed knowledge distillation, wherein the system comprises:
a data preprocessing module for:
performing feature extraction on the acquired original user behavior data and clicked advertisement data, and performing one-hot encoding conversion to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively;
a model training module for:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interaction network, and performing feature optimization based on domain symmetric matrix embedding on the obtained first-order features to generate second-order features;
inputting the first-order features into a compressed interaction network to obtain explicit high-order features, inputting the second-order features into a deep neural network to obtain implicit high-order features, performing weighted splicing on the explicit and implicit high-order features to fuse them into third-order features, and generating a click rate prediction model based on the third-order features;
a click rate prediction module for:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, performing self-distillation on each, and then combining them to construct a teacher network;
pre-training a DNN model and an FM model, performing mutual distillation between them, and combining them to construct a student network;
designing a gating network, calculating through the gating network the knowledge weights of the teacher models in the teacher network, and having the teacher network guide the click rate prediction of each student model in the student network based on these knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
an advertisement recommendation module for:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sorting them in descending order, selecting a preset number of advertisements with the highest predicted values, and recommending them to the user to complete click rate prediction.
Compared with the prior art, the invention has the following beneficial effects:
on the one hand, the click rate prediction method of multi-order feature optimization and mixed knowledge distillation provided by the invention analyzes user behavior data and the advertisement data clicked by users, constructs embedded feature vectors of both, and, around these embedded feature vectors, combines a SENET network, domain feature interaction, a CIN model and a DNN model to realize multi-order feature optimization and generate features that accurately describe user interest;
on the other hand, a hybrid knowledge distillation framework is designed, and a lightweight click rate prediction model with stronger real-time reasoning capability and excellent recommendation precision is output based on the hybrid knowledge distillation framework, so that efficient and high-quality advertisement click prediction is realized, the user recommendation experience is improved, and good economic and social benefits are created for internet companies.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of a click rate prediction method for multi-order feature optimization and mixed knowledge distillation according to the present invention;
FIG. 2 is a flow chart of the click rate prediction model (Se-xDeepFEFM) in the present invention;
FIG. 3 is a flow diagram of the hybrid knowledge distillation framework of the present invention;
FIG. 4 is a structural diagram of a click rate prediction system for multi-order feature optimization and mixed knowledge distillation according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the accompanying drawings are illustrative only, serve to explain the present invention, and are not to be construed as limiting it.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Referring to FIG. 1 to FIG. 3, the present invention provides a click rate prediction method for multi-order feature optimization and mixed knowledge distillation, wherein the method comprises the following steps:
S101, data preprocessing:
performing feature extraction on the acquired original user behavior data and clicked advertisement data, and performing one-hot encoding conversion to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively.
In step S101, the method of performing feature extraction on the acquired original user behavior data and clicked advertisement data, and performing one-hot encoding conversion to obtain the user behavior feature embedding vector and the advertisement feature embedding vector respectively, comprises the following steps:
S1011, preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises:
extracting the corresponding discrete features from the relevant fields of age, gender and user type, and processing the discrete features by an embedding method so that semantically similar features are gathered at nearby positions in the feature space;
extracting the corresponding continuous features from the relevant fields of price and time, normalizing the continuous features, and compressing the feature values into [0, 1].
S1012, generating a user behavior feature embedding vector from the preprocessed user behavior data, and generating an advertisement feature embedding vector from the preprocessed clicked advertisement data.
The user behavior feature embedding vector and the advertisement feature embedding vector are jointly denoted as the feature embedding vector $E$.
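The preprocessing of step S101 can be sketched as follows. This is only an illustration under stated assumptions: min-max scaling is assumed as the normalization that compresses continuous values into [0, 1], the embedding is assumed to be a lookup of a one-hot vector in an embedding table, and all function names, sample values and the 2-dimensional table are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Compress continuous feature values (e.g. price) into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_vocab(categories: List[str]) -> Dict[str, int]:
    """Assign each distinct discrete value (e.g. gender) a one-hot index."""
    vocab: Dict[str, int] = {}
    for c in categories:
        vocab.setdefault(c, len(vocab))
    return vocab


def one_hot(index: int, size: int) -> List[float]:
    """Build a one-hot vector of the given size with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed(one_hot_vec: List[float], table: List[List[float]]) -> List[float]:
    """Multiply a one-hot row vector by an embedding table (a row lookup)."""
    dim = len(table[0])
    return [sum(one_hot_vec[r] * table[r][c] for r in range(len(table)))
            for c in range(dim)]


prices = [10.0, 25.0, 40.0]
normalized_prices = min_max_normalize(prices)

genders = ["f", "m", "f"]
vocab = build_vocab(genders)
# Hypothetical 2-dimensional embedding table; in practice it would be learned.
table = [[0.1, 0.2], [0.3, 0.4]]
gender_embeddings = [embed(one_hot(vocab[g], len(vocab)), table) for g in genders]
```

In a trained model the embedding table is a learned parameter, which is what allows semantically similar discrete values to end up at nearby positions in the feature space.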
S102, model training:
S1021, inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and performing feature optimization based on channel attention to generate first-order features;
S1022, constructing a domain feature interaction network, and performing feature optimization based on domain symmetric matrix embedding on the obtained first-order features to generate second-order features;
S1023, inputting the first-order features into a compressed interaction network (CIN) to output explicit high-order features, inputting the second-order features into a deep neural network to output implicit high-order features, performing weighted splicing on the explicit and implicit high-order features to fuse them into third-order features, and generating a click rate prediction model based on the third-order features.
Specifically, in step S102, the method of inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network and performing feature optimization based on channel attention to generate the first-order features comprises the following steps:
S1021a, using the SENET network, compressing the feature embedding vector $E$ through an average pooling operation to calculate a statistical vector;
S1021b, designing two fully connected layers based on the statistical vector to calculate the attention weights;
S1021c, weighting the feature embedding vector $E$ according to the attention weights to generate the first-order features.
The first-order features are expressed as:
$$V = F_{ReWeight}(A, E) = [a_1 \cdot e_1, \dots, a_i \cdot e_i, \dots, a_f \cdot e_f] = [v_1, \dots, v_i, \dots, v_f]$$
$$A = F_{ex}(Z) = \sigma_2\left(W_2\, \sigma_1(W_1 Z)\right)$$
$$Z = [z_1, \dots, z_i, \dots, z_f], \qquad z_i = F_{sq}(e_i) = \frac{1}{k} \sum_{t=1}^{k} e_i^{(t)}$$
where $V$ denotes the first-order features; $F_{ReWeight}$ denotes attention weighting of the feature embedding vector $E$; $A$ denotes the attention weights; $E$ denotes the feature embedding vector; $e_1$ and $e_i$ denote the 1st and the $i$-th feature embedding vectors in $E$; $a_1$ and $a_i$ denote the attention weights of $e_1$ and $e_i$; $v_i$ and $v_f$ denote the $i$-th and the $f$-th feature values of the first-order features; $F_{ex}$ denotes the function that calculates the attention weights; $\sigma_1$ and $\sigma_2$ denote the first and second activation functions of the fully connected layers; $W_1$ and $W_2$ denote the first and second parameters of the fully connected layers; $Z$ denotes the statistical vector; $z_i$ denotes the statistical information value calculated for the $i$-th feature embedding vector; $F_{sq}$ denotes the function that calculates the statistical information value; $k$ denotes the dimension of the feature embedding vector $E$; and $\sum_{t=1}^{k}$ denotes summation over dimensions 1 to $k$.
In addition, because the first-order features are attention-weighted, important features are highlighted and secondary features are suppressed, which lays a solid foundation for the subsequent extraction of second-order and third-order features and for click rate prediction (see FIG. 2 for the principle).
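The three SENET steps (average pooling into a statistical vector, two fully connected layers producing attention weights, and re-weighting of the embeddings) can be sketched in plain Python as follows. The choice of ReLU and sigmoid as the two activation functions, the bias-free layers, and all names and sample values are assumptions made for illustration only; the text above does not fix them.

```python
import math
from typing import List, Tuple

Vector = List[float]


def squeeze(embeddings: List[Vector]) -> Vector:
    """Average-pool each feature embedding into one statistic."""
    return [sum(e) / len(e) for e in embeddings]


def dense(x: Vector, weights: List[Vector]) -> Vector:
    """Bias-free fully connected layer; `weights` has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(x: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def senet_reweight(embeddings: List[Vector], w1: List[Vector],
                   w2: List[Vector]) -> Tuple[List[Vector], Vector]:
    """Return (re-weighted embeddings, attention weights)."""
    z = squeeze(embeddings)                       # statistical vector
    a = sigmoid(dense(relu(dense(z, w1)), w2))    # activations are assumed
    v = [[a_i * x for x in e] for a_i, e in zip(a, embeddings)]
    return v, a


# Three feature embeddings of dimension 2 (hypothetical values).
embeddings = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
# Zero weights give a uniform attention of 0.5; chosen only so that the
# example is easy to check by hand. Real weights would be learned.
w1 = [[0.0, 0.0, 0.0]]          # 3 features -> 1 hidden unit
w2 = [[0.0], [0.0], [0.0]]      # 1 hidden unit -> 3 attention weights
first_order, attention = senet_reweight(embeddings, w1, w2)
```

With learned, non-zero weights the attention values differ per feature, which is what lets important features be amplified and secondary ones suppressed.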
Further, a domain feature interaction network is constructed, and feature optimization based on domain symmetric matrix embedding is performed on the obtained first-order features according to the following formula:
$$\phi = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} v_i^{T}\, W_{F(i),F(j)}\, v_j\, x_i x_j$$
where $\phi$ denotes the output of the domain feature interaction network; $W_{F(i),F(j)}$ denotes a $k \times k$ symmetric matrix; $w_0$ denotes the base weighting parameter learnable by the domain feature interaction network; $w_i$ denotes the learnable weighting parameter of the $i$-th feature embedding vector; $n$ denotes the number of features; $x_i$ and $x_j$ denote the values of the $i$-th and the $j$-th feature embedding vectors; $F(i)$ and $F(j)$ denote the domains to which the $i$-th and the $j$-th features belong; and $v_i$ and $v_j$ denote the $i$-th and the $j$-th feature values of the first-order features.
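A minimal sketch of a domain feature interaction with symmetric matrix embedding is given below. It assumes the interaction takes the form used by field-embedded factorization machines (a base weight, a linear term, and a bilinear pairwise term with one symmetric matrix per pair of domains); this form, the symmetrization by averaging a matrix with its transpose, and all names and values are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def symmetrize(m: Matrix) -> Matrix:
    """Return (M + M^T) / 2, one simple way to keep a matrix symmetric."""
    n = len(m)
    return [[(m[r][c] + m[c][r]) / 2.0 for c in range(n)] for r in range(n)]


def bilinear(u: Vector, m: Matrix, v: Vector) -> float:
    """Compute u^T M v."""
    return sum(u[r] * m[r][c] * v[c]
               for r in range(len(u)) for c in range(len(v)))


def field_interaction(w0: float, w: Vector, x: Vector, v: List[Vector],
                      fields: List[int],
                      pair_matrices: Dict[Tuple[int, int], Matrix]) -> float:
    """Base weight + linear term + pairwise domain-aware bilinear term."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # One symmetric matrix per pair of domains (fields).
            m = pair_matrices[(fields[i], fields[j])]
            pairwise += bilinear(v[i], m, v[j]) * x[i] * x[j]
    return linear + pairwise


# Two features belonging to domains 0 and 1 (hypothetical values).
x = [1.0, 1.0]
v = [[1.0, 0.0], [0.0, 1.0]]
fields = [0, 1]
pair_matrices = {(0, 1): symmetrize([[1.0, 2.0], [0.0, 1.0]])}
output = field_interaction(0.5, [0.1, 0.2], x, v, fields, pair_matrices)
```

Here the symmetrized matrix is all ones, so the pairwise term contributes 1.0 and the linear part contributes roughly 0.8.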
Further, the second-order features are expressed as:
$$S = \mathrm{concat}(O_E, O_V)$$
where $S$ denotes the second-order features; $\mathrm{concat}(\cdot)$ denotes the splicing operation; $O_E$ denotes the output obtained by inputting the initial feature embedding vector into the domain feature interaction network; $O_V$ denotes the output obtained by inputting the first-order features into the domain feature interaction network; $s_q$ denotes the $q$-th interaction feature vector after splicing; and $m$ denotes the number of interaction feature vectors generated by the domain feature interaction network.
It should be noted here that, because the feature embedding vector and the high-order representation of the first-order feature are fused, the second-order feature contains richer semantic information, which is helpful to improve the click prediction accuracy.
Further, the first-order features are input into a compressed interaction network (CIN) to output explicit high-order features. The explicit high-order features are generated as follows:
$$X^{l}_{h,*} = \sum_{i=1}^{H_{l-1}} \sum_{j=1}^{H_0} W^{l,h}_{i,j} \left( X^{l-1}_{i,*} \circ X^{0}_{j,*} \right)$$
$$p^{l}_{h} = \sum_{d=1}^{k} X^{l}_{h,d}$$
$$p^{+} = [p^{1}, p^{2}, \dots, p^{T}]$$
where $X^{l}_{h,*}$ denotes the $h$-th high-order feature vector in the $l$-th layer high-order matrix; $X^{l-1}_{i,*}$ denotes the $i$-th high-order feature vector in the $(l-1)$-th layer high-order matrix; $X^{0}_{j,*}$ denotes the $j$-th feature value in the first-order features; $W^{l,h}$ denotes the parameter matrix with which the first-order features generate the $h$-th high-order feature of the $l$-th layer; $H_0$ denotes the number of layer-0 feature embedding vectors; $H_{l-1}$ denotes the number of feature vectors in the $(l-1)$-th layer; $p^{l}_{h}$ denotes the $h$-th feature in the $l$-th layer high-order feature vector; $X^{l}_{h,d}$ denotes the $d$-th dimension of the $h$-th feature in the $l$-th layer, with $k$ the dimension of the feature embedding vector; $p^{+}$ denotes the finally generated explicit high-order features; $T$ denotes the total number of layers of the explicit high-order features; and $\circ$ denotes the Hadamard product.
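The compressed interaction network can be sketched as below, assuming the standard CIN construction: each new feature vector is a weighted sum of Hadamard products between the previous layer's vectors and the layer-0 vectors, each layer is sum-pooled over the embedding dimension, and the pooled values of all layers are concatenated. Function names, the weight layout `weights[h][i][j]` and the sample values are hypothetical.

```python
from typing import List

Vector = List[float]


def hadamard(a: Vector, b: Vector) -> Vector:
    """Element-wise (Hadamard) product of two vectors."""
    return [x * y for x, y in zip(a, b)]


def cin_layer(prev: List[Vector], base: List[Vector],
              weights: List[List[List[float]]]) -> List[Vector]:
    """One CIN layer; weights[h][i][j] weights the pair (prev[i], base[j])."""
    dim = len(base[0])
    out: List[Vector] = []
    for w_h in weights:                      # one output vector per w_h
        acc = [0.0] * dim
        for i, x_prev in enumerate(prev):
            for j, x_base in enumerate(base):
                prod = hadamard(x_prev, x_base)
                acc = [a + w_h[i][j] * p for a, p in zip(acc, prod)]
        out.append(acc)
    return out


def sum_pool(layer: List[Vector]) -> Vector:
    """Sum each feature vector over its embedding dimension."""
    return [sum(vec) for vec in layer]


def cin(base: List[Vector],
        layer_weights: List[List[List[List[float]]]]) -> Vector:
    """Stack CIN layers and concatenate the pooled output of every layer."""
    pooled: Vector = []
    prev = base
    for weights in layer_weights:
        prev = cin_layer(prev, base, weights)
        pooled.extend(sum_pool(prev))
    return pooled


# Two layer-0 vectors of dimension 2 and a single layer with one output
# vector whose weights are all 1.0 (hypothetical values).
base = [[1.0, 2.0], [3.0, 4.0]]
layer_weights = [[[[1.0, 1.0], [1.0, 1.0]]]]
explicit_features = cin(base, layer_weights)
print(explicit_features)
```

With all-ones weights the single output vector equals the element-wise square of the sum of the two input vectors, which makes the result easy to verify by hand.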
Further, the second-order features are input into a deep neural network (DNN) to output implicit high-order features. The implicit high-order features are generated as follows:
$$a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$
where $a^{(l)}$ denotes the output of the $l$-th layer of the deep neural network; $\sigma$ denotes the activation function; $W^{(l)}$ denotes the weight of the $l$-th layer; $b^{(l)}$ denotes the bias of the $l$-th layer; and $l$ denotes the layer index of the deep neural network.
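The layer-by-layer computation of the deep neural network can be sketched as a plain feed-forward pass. ReLU is assumed as the activation function purely for illustration, since the text above does not name one, and the weights, biases and input below are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, biases)


def dnn_forward(x: Vector, layers: List[Layer]) -> Vector:
    """Apply each layer as activation(W a + b), with ReLU assumed."""
    a = x
    for weights, biases in layers:
        a = [max(0.0, sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a


# Hypothetical two-layer network applied to a 2-dimensional input.
layers = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),   # layer 1: 2 -> 2
    ([[2.0, 1.0]], [0.5]),                      # layer 2: 2 -> 1
]
implicit_features = dnn_forward([1.0, -2.0], layers)
print(implicit_features)
```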
The explicit high-order features output by the CIN and the implicit high-order features output by the DNN are then combined to complete feature fusion and generate the third-order features. The third-order features make full use of the complementarity between implicit and explicit high-order features, which helps improve feature discriminability and the final click prediction precision.
The formula for generating the click rate prediction model based on the third-order features is expressed as:
Figure 287406DEST_PATH_IMAGE084
where
Figure 869697DEST_PATH_IMAGE085
the predicted value of the click rate is shown,
Figure 998190DEST_PATH_IMAGE086
denotes the sigmoid function,
Figure 956919DEST_PATH_IMAGE087
all represent the parameters of the click-through rate prediction model,
Figure 487257DEST_PATH_IMAGE088
S103, click rate prediction:
S1031, pre-training the click rate prediction model, the AutoInt model and the DIFM model, performing self-distillation on each, and then combining them to construct a teacher network.
S1032, pre-training the DNN model and the FM model, performing mutual distillation between them, and combining them to construct a student network.
The lightweight DNN model (student model 1 in FIG. 3) and the FM model (student model 2 in FIG. 3) are pre-trained and used as student models to construct the student network. Mutual distillation between the DNN model and the FM model helps integrate the diverse information in each student model and improves the click prediction precision of each student model.
S1033, designing a gating network, calculating through the gating network the knowledge weights of the teacher models in the teacher network, and having the teacher network guide the click rate prediction of each student model in the student network based on these knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network.
The specific process of mutual distillation between the DNN model and the FM model is as follows:
[Formula images: the loss function of the FM model and the loss function of the DNN model]
wherein, for the FM model, the symbols respectively represent: the loss function of the FM model in the student network; the real label; the output of the FM model in the student network; the term in which the FM model in the student network fits the real label; the KL loss of the FM model relative to the DNN model in the student network; and the weight of that KL loss;
for the DNN model, the symbols respectively represent: the loss function of the DNN model in the student network; the output of the DNN model in the student network; the term in which the DNN model in the student network fits the real label; the KL loss of the DNN model relative to the FM model in the student network; and the weight of that KL loss.
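The mutual distillation losses described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the label-fitting term is taken to be binary cross-entropy, the KL loss is taken between Bernoulli click distributions with the peer model as the reference, and the names `bce`, `kl_bernoulli`, `mutual_distillation_losses`, `w_fm` and `w_dnn` are hypothetical and do not appear in the original text.

```python
import math

EPS = 1e-7  # numerical guard against log(0); an illustrative choice


def _clip(p: float) -> float:
    """Keep a probability strictly inside (0, 1)."""
    return min(max(p, EPS), 1.0 - EPS)


def bce(y: float, p: float) -> float:
    """Binary cross-entropy between a 0/1 label y and a predicted probability p."""
    p = _clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def kl_bernoulli(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) for two click probabilities."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_distillation_losses(y, p_fm, p_dnn, w_fm=0.5, w_dnn=0.5):
    """Return (loss_fm, loss_dnn) for one sample.

    Each student fits the real label and is additionally pulled towards
    the prediction of the other student through a weighted KL term.
    The KL direction (peer as reference) is an assumption.
    """
    loss_fm = bce(y, p_fm) + w_fm * kl_bernoulli(p_dnn, p_fm)
    loss_dnn = bce(y, p_dnn) + w_dnn * kl_bernoulli(p_fm, p_dnn)
    return loss_fm, loss_dnn


if __name__ == "__main__":
    # One clicked sample (y = 1) with differing student predictions.
    print(mutual_distillation_losses(1.0, p_fm=0.6, p_dnn=0.8))
```

When the two students agree exactly, both KL terms vanish and each loss reduces to its label-fitting term.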
Further, a DIFM model (teacher model 1 in fig. 3), an AutoInt model (teacher model 2 in fig. 3) and the Se-xDeepFEFM model (teacher model 3 in fig. 3) are pre-trained, and the three pre-trained models are each self-distilled and then combined into the teacher network. Because the teacher models in the teacher network are heterogeneous, they provide more diverse knowledge to the student models, which helps improve the click prediction accuracy of the student models. A gating mechanism (GATE) is designed to adaptively adjust the knowledge weight of each teacher model in the teacher network with respect to each student model in the student network; the larger the knowledge weight, the more valuable the knowledge that the corresponding teacher model provides to the student model during knowledge distillation, which improves the click rate prediction accuracy of the student model.
Specifically, the formulas for self-distillation of the click rate prediction model (Se-xDeepFEFM model), the AutoInt model and the DIFM model are as follows:
[Formula images: the self-distillation loss functions of the DIFM model, the AutoInt model and the click rate prediction model]
wherein, for the DIFM model, the symbols respectively represent: the loss function of the DIFM model; the output of the DIFM model in the teacher network for the unenhanced sample; the output of the DIFM model in the teacher network for the enhanced sample; the weight of the enhanced-sample fitting term; the term in which the DIFM model in the teacher network fits the real label on the unenhanced sample; and the term in which the DIFM model in the teacher network fits the real label on the enhanced sample;
for the AutoInt model, the symbols respectively represent: the loss function of the AutoInt model; the output of the AutoInt model in the teacher network for the unenhanced sample; the output of the AutoInt model in the teacher network for the enhanced sample; the weight of the enhanced-sample fitting term; the term in which the AutoInt model in the teacher network fits the real label on the unenhanced sample; and the term in which the AutoInt model in the teacher network fits the real label on the enhanced sample;
for the click rate prediction model, the symbols respectively represent: the loss function of the click rate prediction model; the output of the click rate prediction model in the teacher network for the unenhanced sample; the output of the click rate prediction model in the teacher network for the enhanced sample; the weight of the enhanced-sample fitting term; the term in which the click rate prediction model in the teacher network fits the real label on the unenhanced sample; and the term in which the click rate prediction model in the teacher network fits the real label on the enhanced sample.
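The self-distillation loss of a single teacher model, as described above, can be sketched as follows. The sketch assumes that both fitting terms are binary cross-entropy and that the loss is the unenhanced-sample term plus a weighted enhanced-sample term; how the enhanced sample is produced is not shown in this passage, so the function simply takes both predictions as inputs. The names `self_distillation_loss` and `w_enhanced` are hypothetical.

```python
import math

EPS = 1e-7  # numerical guard against log(0); an illustrative choice


def bce(y: float, p: float) -> float:
    """Binary cross-entropy between a 0/1 label y and a predicted probability p."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def self_distillation_loss(y, p_plain, p_enhanced, w_enhanced=0.5):
    """Loss of one teacher model on a sample and on its enhanced counterpart.

    p_plain:    teacher output for the unenhanced sample
    p_enhanced: teacher output for the enhanced sample
    w_enhanced: weight of the enhanced-sample fitting term (assumed scalar)
    """
    return bce(y, p_plain) + w_enhanced * bce(y, p_enhanced)


if __name__ == "__main__":
    # The same teacher (for example DIFM, AutoInt or the click rate
    # prediction model) would be evaluated on both versions of the sample.
    print(self_distillation_loss(1.0, p_plain=0.9, p_enhanced=0.7))
```

The same function would be applied separately to each of the three teacher models, each with its own weight.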
In the invention, the Se-xDeepFEFM model completes self-distillation through sample diversity. Self-distillation can compress the scale of the teacher model, which helps reduce the "generation gap" between the teacher model and the student model so that the mixed knowledge distillation framework is trained better.
The total loss function for the mixed knowledge distillation is expressed as:
[Formula image: the total loss function of the mixed knowledge distillation]
wherein the symbols respectively represent: the total loss function corresponding to the mixed knowledge distillation; the teacher model at a given index in the teacher network; the teacher network; the student network; the number of teacher models in the teacher network; and the knowledge weight of the teacher model at that index in the teacher network.
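A gate-weighted distillation loss of the kind described above can be sketched as follows. The sketch is not the formula of the invention: it assumes that the gating network produces one score per teacher model, that the scores are turned into knowledge weights by a softmax, and that the per-teacher distillation term is a KL divergence between Bernoulli click distributions. The names `softmax`, `kl_bernoulli`, `gated_distillation_loss` and `gate_scores` are hypothetical.

```python
import math

EPS = 1e-7  # numerical guard against log(0); an illustrative choice


def softmax(scores):
    """Turn raw gate scores into knowledge weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_bernoulli(p: float, q: float) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)) for two click probabilities."""
    p = min(max(p, EPS), 1.0 - EPS)
    q = min(max(q, EPS), 1.0 - EPS)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def gated_distillation_loss(teacher_probs, student_prob, gate_scores):
    """Sum over teachers of (knowledge weight) x (teacher-to-student KL)."""
    weights = softmax(gate_scores)
    return sum(w * kl_bernoulli(t, student_prob)
               for w, t in zip(weights, teacher_probs))


if __name__ == "__main__":
    # Three teachers (DIFM, AutoInt, Se-xDeepFEFM) guiding one student.
    print(gated_distillation_loss([0.7, 0.6, 0.8], 0.5, [0.2, 0.1, 1.3]))
```

A teacher with a larger gate score contributes more to the student's loss, which matches the role of the knowledge weight described in the text.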
S104, recommending advertisements.
The student network output by the mixed knowledge distillation is deployed online to obtain a plurality of predicted values; the predicted values are sorted in descending order, and a preset number of advertisements with the highest predicted values are selected and recommended to the user, completing the click rate prediction.
The teacher network and the student network are trained jointly, that is, knowledge is transferred from the teacher models in the teacher network to the student models in the student network through the GATE to realize mixed knowledge distillation. The mixed knowledge distillation framework outputs a lightweight student model, which is used to calculate the click prediction value; this preserves prediction accuracy while improving real-time prediction efficiency and strengthening the real-time inference capability of the click rate prediction model.
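The ranking step of S104 (sorting the predicted values in descending order and keeping a preset number of advertisements) can be sketched as follows. The function name `recommend_top_k`, the advertisement identifiers and the dictionary layout are hypothetical and serve only as an illustration.

```python
def recommend_top_k(predicted_ctr, k):
    """Return the ids of the k advertisements with the highest predicted click rate.

    predicted_ctr: mapping from advertisement id to predicted click probability,
                   as produced by the deployed student network.
    k:             the preset number of advertisements to recommend.
    """
    ranked = sorted(predicted_ctr.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]


if __name__ == "__main__":
    scores = {"ad_a": 0.10, "ad_b": 0.70, "ad_c": 0.40}
    print(recommend_top_k(scores, 2))
```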
Referring to fig. 4, the present invention provides a click rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system comprises:
a data pre-processing module to:
performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot coding conversion to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively;
a model training module to:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interactive network, and executing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;
inputting the first-order features into a compression interactive network to output to obtain explicit high-order features, inputting the second-order features into a deep neural network to output to obtain implicit high-order features, performing weighted splicing on the explicit high-order features and the implicit high-order features to generate third-order features in a fusion mode, and generating a click rate prediction model based on the third-order features;
a click rate prediction module to:
pre-training a click rate prediction model, an AutoInt model and a DIFM model, and then respectively carrying out self-distillation and then combining to construct a teacher network;
pre-training a DNN model and an FM model, mutually distilling, and combining to construct a student network;
designing a gating network, calculating the knowledge weights of the teacher models in the teacher network through the gating network, and having the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; wherein the teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
an advertisement recommendation module to:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sorting the predicted values in descending order, and selecting a preset number of advertisements with the highest predicted values to recommend to the user, so as to complete the click rate prediction.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above embodiments express only several implementations of the present invention; although their description is specific and detailed, they are not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (6)

1. A click rate prediction method for multi-order feature optimization and mixed knowledge distillation is characterized by comprising the following steps:
step one, data preprocessing:
performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot coding conversion to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively;
step two, model training:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET, and then performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interactive network, and executing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;
inputting the first-order features into a compression interactive network to output to obtain explicit high-order features, inputting the second-order features into a deep neural network to output to obtain implicit high-order features, performing weighted splicing on the explicit high-order features and the implicit high-order features to generate third-order features in a fusion mode, and generating a click rate prediction model based on the third-order features;
step three, click rate prediction:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, then self-distilling each of them and combining them to construct a teacher network;
pre-training a DNN model and an FM model, mutually distilling them, and combining them to construct a student network;
designing a gating network, calculating the knowledge weights of the teacher models in the teacher network through the gating network, and having the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; wherein the teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
step four, advertisement recommendation:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, arranging the predicted values in descending order, and selecting a preset number of advertisements with the highest predicted values to recommend to the user, so as to complete the click rate prediction;
in step two, in the method of inputting the first-order features into the compressed interactive network to output the explicit high-order features, the generation formulas of the explicit high-order features are as follows:
[Formula images: three formulas generating the explicit high-order features]
wherein the symbols respectively represent: the high-order feature vector at an indexed position in the high-order matrix of an indexed layer; the high-order feature vector at a second indexed position in the high-order matrix of a second indexed layer; the feature value at an indexed position in the first-order features; the parameter matrix with which the first-order features generate an indexed high-order feature of the high-order feature vectors of an indexed layer; the number of feature embedding vectors of layer 0; the number of feature embedding vectors of an indexed layer; an indexed feature in the high-order feature vectors of an indexed layer; the feature vector of an indexed dimension of an indexed feature in the high-order feature vectors of that layer; the explicit high-order features that are finally generated; the total number of layers of the explicit high-order features; and the Hadamard product;
in the method of inputting the second-order features into the deep neural network to output the implicit high-order features, the generation formula of the implicit high-order features is as follows:
[Formula image: the generation formula of the implicit high-order features]
wherein the symbols respectively represent: the neural network output of an indexed layer of the deep neural network; the activation function; the weight of that layer of the deep neural network; the offset of that layer of the deep neural network; and the layer number of the deep neural network;
in step two, the formula for generating the click rate prediction model based on the third-order features is expressed as:
[Formula image: the click rate prediction formula]
wherein the symbols respectively represent: the predicted value of the click rate; the sigmoid function operation; and the parameters of the click rate prediction model;
in step three, the formulas corresponding to the mixed knowledge distillation are expressed as follows:
[Formula images: the loss function of the FM model and the loss function of the DNN model]
wherein, for the FM model, the symbols respectively represent: the loss function of the FM model in the student network; the real label; the output of the FM model in the student network; the term in which the FM model in the student network fits the real label; the KL loss of the FM model relative to the DNN model in the student network; and the weight of that KL loss;
for the DNN model, the symbols respectively represent: the loss function of the DNN model in the student network; the output of the DNN model in the student network; the term in which the DNN model in the student network fits the real label; the KL loss of the DNN model relative to the FM model in the student network; and the weight of that KL loss;
the formulas for self-distillation of the click rate prediction model, the AutoInt model and the DIFM model are as follows:
[Formula images: the self-distillation loss functions of the DIFM model, the AutoInt model and the click rate prediction model]
wherein, for the DIFM model, the symbols respectively represent: the loss function of the DIFM model; the output of the DIFM model in the teacher network for the unenhanced sample; the output of the DIFM model in the teacher network for the enhanced sample; the weight of the enhanced-sample fitting term; the term in which the DIFM model in the teacher network fits the real label on the unenhanced sample; and the term in which the DIFM model in the teacher network fits the real label on the enhanced sample;
for the AutoInt model, the symbols respectively represent: the loss function of the AutoInt model; the output of the AutoInt model in the teacher network for the unenhanced sample; the output of the AutoInt model in the teacher network for the enhanced sample; the weight of the enhanced-sample fitting term; the term in which the AutoInt model in the teacher network fits the real label on the unenhanced sample; and the term in which the AutoInt model in the teacher network fits the real label on the enhanced sample;
for the click rate prediction model, the symbols respectively represent: the loss function of the click rate prediction model; the output of the click rate prediction model in the teacher network for the unenhanced sample; the output of the click rate prediction model in the teacher network for the enhanced sample; the weight of the enhanced-sample fitting term; the term in which the click rate prediction model in the teacher network fits the real label on the unenhanced sample; and the term in which the click rate prediction model in the teacher network fits the real label on the enhanced sample;
the total loss function for the mixed knowledge distillation is expressed as:
[Formula image: the total loss function of the mixed knowledge distillation]
wherein the symbols respectively represent: the total loss function corresponding to the mixed knowledge distillation; the teacher model at a given index in the teacher network; the teacher network; the student network; the number of teacher models in the teacher network; and the knowledge weight of the teacher model at that index in the teacher network.
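By way of illustration only, the explicit high-order feature generation recited in claim 1 can be sketched with one layer of a compressed interaction network in the form commonly used in xDeepFM-style models. The sketch is an assumption about the structure and is not a reproduction of the claimed formulas: each output vector is a weighted sum of Hadamard products between vectors of the previous layer and vectors of layer 0, and each output vector is then summed over its dimensions. The names `cin_layer`, `sum_pool` and the argument layout are hypothetical.

```python
def cin_layer(x_prev, x0, weights):
    """One compressed-interaction layer.

    x_prev:  list of vectors from the previous layer (each of dimension D)
    x0:      list of layer-0 vectors, i.e. the first-order features (dimension D)
    weights: weights[h][i][j] is the parameter applied to the Hadamard product
             of x_prev[i] and x0[j] when building output vector h
    Returns one output vector of dimension D per entry of `weights`.
    """
    dim = len(x0[0])
    outputs = []
    for w_h in weights:
        vec = [0.0] * dim
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                for d in range(dim):
                    # Hadamard product, scaled by the learned parameter
                    vec[d] += w_h[i][j] * xi[d] * xj[d]
        outputs.append(vec)
    return outputs


def sum_pool(feature_maps):
    """Sum every output vector over its dimensions (one scalar per vector)."""
    return [sum(vec) for vec in feature_maps]


if __name__ == "__main__":
    x0 = [[1.0, 2.0], [3.0, 4.0]]             # two first-order feature vectors
    w = [[[1.0, 0.0], [0.0, 1.0]]]            # a single output vector
    layer1 = cin_layer(x0, x0, w)
    print(layer1, sum_pool(layer1))
```

Stacking several such layers and concatenating their pooled outputs would give a vector playing the role of the explicit high-order features.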
2. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 1, wherein in step one, the method of performing feature extraction on the obtained original user behavior data and clicked advertisement data and performing one-hot coding conversion to obtain the user behavior feature embedding vector and the advertisement feature embedding vector respectively comprises the following steps:
preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises:
extracting corresponding discrete features from the fields relating to age, gender and user type, and processing the discrete features by an embedding method so that semantically similar features are gathered at nearby positions in the feature space;
the preprocessing further comprises:
extracting corresponding continuous features from the fields relating to price and time, normalizing the continuous features, and compressing the feature values to [0,1];
generating the user behavior feature embedding vector from the preprocessed user behavior data, and generating the advertisement feature embedding vector from the preprocessed clicked advertisement data; wherein the user behavior feature embedding vector and the advertisement feature embedding vector are together denoted by a common symbol as the feature embedding vectors.
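As a simple illustration of the preprocessing recited in claim 2, the following sketch one-hot encodes a discrete field and compresses a continuous field to [0,1]. Min-max scaling is assumed as the normalization, since the claim does not name a specific method, and the function names and sample values are hypothetical.

```python
def one_hot(value, vocabulary):
    """One-hot code a discrete field value against a fixed vocabulary."""
    return [1 if value == entry else 0 for entry in vocabulary]


def min_max_normalize(values):
    """Compress continuous feature values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(one_hot("F", ["M", "F"]))               # discrete field, e.g. gender
    print(min_max_normalize([10.0, 20.0, 30.0]))  # continuous field, e.g. price
```

The one-hot codes would then be mapped to dense vectors by an embedding table, which is the step that gathers semantically similar features at nearby positions.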
3. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 2, wherein in step two, the method of inputting the user behavior feature embedding vector and the advertisement feature embedding vector into the SENET network and then performing feature optimization based on channel attention to generate the first-order features comprises the following steps:
compressing the feature embedding vectors by an average pooling operation using the SENET network to calculate a statistical vector;
designing two fully connected layers based on the statistical vector to calculate attention weights;
weighting the feature embedding vectors according to the attention weights to generate the first-order features;
the first-order features are expressed as:
[Formula images: three formulas for the first-order features, the attention weights and the statistical vector]
wherein the symbols respectively represent: the first-order features; the attention weighting applied to the feature embedding vectors; the attention weights; the feature embedding vectors; two individual feature embedding vectors, each identified by its own index; the attention weights of those two feature embedding vectors; two feature values of the first-order features, each identified by its own index; the function that calculates the attention weights; the first activation function of the fully connected layers; the second activation function of the fully connected layers; the first parameter of the fully connected layers; the second parameter of the fully connected layers; the statistical vector; the statistical information value calculated for an indexed feature embedding vector; the function that calculates the statistical information value; the dimension of the feature embedding vectors; and the calculation running from dimension 1 to that dimension.
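The three steps recited in claim 3 (average pooling, two fully connected layers, re-weighting) can be sketched as follows. This is a minimal sketch under assumptions: both activation functions are taken to be ReLU, the fully connected layers have no bias, and the names `senet_reweight`, `matvec`, `w1` and `w2` are hypothetical; the claim itself does not fix these choices.

```python
def relu(x):
    """Rectified linear unit, assumed here for both activation functions."""
    return max(0.0, x)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def senet_reweight(embeddings, w1, w2):
    """Channel-attention re-weighting of feature embedding vectors.

    embeddings: one embedding vector per feature
    w1:         parameters of the first fully connected layer (reduction)
    w2:         parameters of the second fully connected layer (expansion)
    Returns (first_order_features, attention_weights).
    """
    # Squeeze: average pooling of every embedding into one statistic.
    stats = [sum(e) / len(e) for e in embeddings]
    # Excitation: two fully connected layers produce one weight per feature.
    hidden = [relu(h) for h in matvec(w1, stats)]
    attention = [relu(a) for a in matvec(w2, hidden)]
    # Re-weight: scale every embedding by its attention weight.
    weighted = [[a * x for x in e] for a, e in zip(attention, embeddings)]
    return weighted, attention


if __name__ == "__main__":
    emb = [[1.0, 3.0], [2.0, 2.0]]
    print(senet_reweight(emb, w1=[[0.5, 0.5]], w2=[[1.0], [0.5]]))
```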
4. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 3, wherein in step two, in the step of constructing the domain feature interaction network and performing feature optimization based on domain symmetric matrix embedding on the obtained first-order features, the following formula is applied:
[Formula image: the output of the domain feature interaction network]
wherein the symbols respectively represent: the output of the domain feature interaction network; a symmetric matrix of the indicated size; the basic weighting parameter learnable by the domain feature interaction network; the learnable weighting parameter of an indexed feature embedding vector; the number of features; the values of two feature embedding vectors, each identified by its own index; the domain features of two domains, each identified by its own index; and the feature value of the first-order features at the index of the second of those feature embedding vectors.
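For illustration, a pairwise interaction through a symmetric matrix per pair of domains, as recited in claim 4, can be sketched as follows. The sketch assumes the usual factorization-machine layout (a base term, a linear term and pairwise terms) and one symmetric matrix for every unordered pair of domains; the function name `domain_interaction_score` and all argument names are hypothetical.

```python
def domain_interaction_score(values, embeddings, domains, domain_matrices, w0, w):
    """Score = base term + linear term + pairwise domain-matrix interactions.

    values:          feature values x_i
    embeddings:      embedding vector v_i of every feature
    domains:         domain (field) id of every feature
    domain_matrices: symmetric matrix for each unordered pair of domain ids,
                     keyed by the sorted pair
    w0, w:           base weighting parameter and per-feature weighting parameters
    """
    score = w0 + sum(wi * xi for wi, xi in zip(w, values))
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            key = tuple(sorted((domains[i], domains[j])))
            m = domain_matrices[key]
            # v_i^T M v_j; M is symmetric, so the order of i and j does not matter.
            mv = [sum(m[r][c] * embeddings[j][c] for c in range(len(m[r])))
                  for r in range(len(m))]
            interaction = sum(a * b for a, b in zip(embeddings[i], mv))
            score += interaction * values[i] * values[j]
    return score


if __name__ == "__main__":
    print(domain_interaction_score(
        values=[1.0, 1.0],
        embeddings=[[1.0, 0.0], [0.0, 1.0]],
        domains=[0, 1],
        domain_matrices={(0, 1): [[1.0, 2.0], [2.0, 1.0]]},
        w0=0.5,
        w=[0.1, 0.2],
    ))
```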
5. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 4, wherein the second-order features are formulated as:
[Formula image: the second-order features]
wherein the symbols respectively represent: the second-order features; the splicing operation; the output result obtained by inputting the initial feature embedding vectors into the domain feature interaction network; the output result obtained by inputting the first-order features into the domain feature interaction network; an indexed interaction feature vector after splicing; and the number of interaction feature vectors generated by the domain feature interaction network.
6. A click-through rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system applies the click-through rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in any one of claims 1 to 5, and the system comprises:
a data pre-processing module to:
performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot coding conversion to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively;
a model training module to:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET, and then performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interactive network, and executing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;
inputting the first-order features into a compression interactive network to output to obtain explicit high-order features, inputting the second-order features into a deep neural network to output to obtain implicit high-order features, performing weighted splicing on the explicit high-order features and the implicit high-order features to generate third-order features in a fusion mode, and generating a click rate prediction model based on the third-order features;
a click rate prediction module to:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, then self-distilling each of them and combining them to construct a teacher network;
pre-training a DNN model and an FM model, mutually distilling them, and combining them to construct a student network;
designing a gating network, calculating the knowledge weights of the teacher models in the teacher network through the gating network, and having the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; wherein the teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
an advertisement recommendation module to:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sorting the predicted values in descending order, and selecting a preset number of advertisements with the highest predicted values to recommend to the user, so as to complete the click rate prediction.
CN202211200198.9A 2022-09-29 2022-09-29 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation Active CN115271272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211200198.9A CN115271272B (en) 2022-09-29 2022-09-29 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211200198.9A CN115271272B (en) 2022-09-29 2022-09-29 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Publications (2)

Publication Number Publication Date
CN115271272A (en) 2022-11-01
CN115271272B (en) 2022-12-27

Family

ID=83756968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211200198.9A Active CN115271272B (en) 2022-09-29 2022-09-29 Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Country Status (1)

Country Link
CN (1) CN115271272B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110870019A (en) * 2017-10-16 2020-03-06 因美纳有限公司 Semi-supervised learning for training deep convolutional neural network sets
CN111325579A (en) * 2020-02-25 2020-06-23 华南师范大学 Advertisement click rate prediction method
CN111563770A (en) * 2020-04-27 2020-08-21 杭州金智塔科技有限公司 Click rate estimation method based on feature differentiation learning
CN112395876A (en) * 2021-01-21 2021-02-23 华东交通大学 Knowledge distillation and multitask learning-based chapter relationship identification method and device
CN112967088A (en) * 2021-03-03 2021-06-15 上海数鸣人工智能科技有限公司 Marketing activity prediction model structure and prediction method based on knowledge distillation
CN113344615A (en) * 2021-05-27 2021-09-03 上海数鸣人工智能科技有限公司 Marketing activity prediction method based on GBDT and DL fusion model
CN113887694A (en) * 2020-07-01 2022-01-04 复旦大学 Click rate estimation model based on characteristic representation under attention mechanism
CN113962384A (en) * 2021-10-15 2022-01-21 清华大学 Automatic integrated architecture search system and method for click rate prediction model
CN114241007A (en) * 2021-12-20 2022-03-25 江南大学 Multi-target tracking method based on cross-task mutual learning, terminal equipment and medium
CN114781503A (en) * 2022-04-09 2022-07-22 东华大学 Click rate estimation method based on depth feature fusion
CN115048855A (en) * 2022-05-06 2022-09-13 南宁师范大学 Click rate prediction model, training method and application device thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130106683A1 (en) * 2011-10-31 2013-05-02 Elwha LLC, a limited liability company of the State of Delaware Context-sensitive query enrichment
US20220076136A1 (en) * 2020-09-09 2022-03-10 Peyman PASSBAN Method and system for training a neural network model using knowledge distillation

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
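DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction;Huifeng G.;《ResearchGate》;20181231;full text *
How to Measure The Operating Efficiency of Internet Group-Buying Platform?;Ping Yuan;《Procedia Computer Science》;20151231;full text *
Research on a click rate prediction model based on the fusion of shallow and deep models;鲍俊梅;《中国优秀硕士学位论文全文库 信息科技》;20220215;full text *
Research on an advertisement click rate estimation model based on deep network model compression;李致贤;《中国优秀硕士学位论文全文库 信息科技》;20220215;full text *
Correlation visual adversarial Bayesian personalized ranking recommendation model;李广丽 et al.;《工程科学与技术》;20220512;full text *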

Also Published As

Publication number Publication date
CN115271272A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
CN109241424B (en) A kind of recommended method
Wei et al. Some dependent aggregation operators with 2-tuple linguistic information and their application to multiple attribute group decision making
CN111753092B (en) Data processing method, model training method, device and electronic equipment
CN110851713A (en) Information processing method, recommendation method and related equipment
CN111708950A (en) Content recommendation method and device and electronic equipment
CN111325579A (en) Advertisement click rate prediction method
CN111127146A (en) Information recommendation method and system based on convolutional neural network and noise reduction self-encoder
CN103020712B (en) A kind of distributed sorter of massive micro-blog data and method
CN110866542A (en) Depth representation learning method based on feature controllable fusion
CN111563770A (en) Click rate estimation method based on feature differentiation learning
Van Cranenburgh et al. Choice modelling in the age of machine learning
CN110619540A (en) Click stream estimation method of neural network
CN110659411A (en) Personalized recommendation method based on neural attention self-encoder
Shen et al. A voice of the customer real-time strategy: An integrated quality function deployment approach
CN115270004B (en) Educational resource recommendation method based on field factor decomposition
CN117390141A (en) Agricultural socialization service quality user evaluation data analysis method
CN115495654A (en) Click rate estimation method and device based on subspace projection neural network
Zuo et al. A property perceived service quality evaluation method for public buildings based on multisource heterogeneous information fusion
Zhang et al. Multi-scale and multi-channel neural network for click-through rate prediction
CN115271272B (en) Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation
CN113065027A (en) Video recommendation method and device, electronic equipment and storage medium
CN115658936A (en) Personalized program recommendation method and system based on double-layer attention model
CN114565436A (en) Vehicle model recommendation system, method, device and storage medium based on time sequence modeling
Truong et al. Applied Decision Support System Using TOPSIS–AHP, and ICT Newhouse Indicators for Evaluation of Courses at University of Economics Ho Chi Minh City (UEH), Vietnam
CN115203516A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant