CN115271272B - Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation - Google Patents
- Publication number
- CN115271272B; CN202211200198.9A
- Authority
- CN
- China
- Prior art keywords
- feature
- model
- network
- order
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Human Resources & Organizations (AREA)
- Finance (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Operations Research (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Tourism & Hospitality (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a click rate prediction method and system based on multi-order feature optimization and mixed knowledge distillation. User behavior data and the advertisement data clicked by users are analyzed to construct embedded feature vectors. Around these embedded feature vectors, a SENET network, domain feature interaction, a CIN model and a DNN model are combined to realize multi-order feature optimization and to generate features that accurately describe user interest. A mixed knowledge distillation framework is then designed, which outputs a lightweight click rate prediction model with stronger real-time inference capability and high recommendation precision, so as to realize efficient, high-quality advertisement click prediction, improve the user's recommendation experience, and create economic and social benefits for Internet companies.
Description
Technical Field
The invention relates to the technical field of advertisement recommendation, in particular to a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation.
Background
The sheer volume of information on the Internet makes the problem of information overload increasingly serious. A recommendation system can effectively relieve this problem: it analyzes user characteristics such as habits, interests and preferences from the historical interaction data between users and items, analyzes the characteristics of the items themselves, establishes the relationship between users and the items to be recommended, and finally recommends to users the items they may be interested in.
Click rate prediction estimates the probability that a user clicks an Internet advertisement or an online commodity. It is an important component of a recommendation system and plays a very important role on Internet commercial platforms. Internet advertising yields large economic benefits, and an advertisement click indicates a potential purchase, so accurate advertisement recommendation both improves the user experience and brings substantial economic benefits to Internet companies.
However, existing advertisement click rate prediction techniques have the following problems: (1) the feature representation is one-sided, using only explicit features or only implicit features without exploiting the complementarity between the two; (2) the feature optimization method is simple and does not consider multi-order feature optimization. For these two reasons the final features are not discriminative enough, which seriously restricts click rate prediction precision. In addition, most existing click rate prediction techniques adopt very complex and large prediction models, such as DIFM and AutoInt, so real-time inference efficiency is low, which seriously affects the user's recommendation experience and restricts the practical deployment of the models.
Disclosure of Invention
In view of the above situation, the main objective of the present invention is to provide a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation, so as to solve the problems of simple feature optimization method, low click rate prediction accuracy and low real-time inference efficiency in the prior art.
The invention provides a click rate prediction method for multi-order feature optimization and mixed knowledge distillation, which comprises the following steps:
Step one, data preprocessing:
performing feature extraction on the acquired original user behavior data and clicked advertisement data, and applying one-hot encoding, to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively;
Step two, model training:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interaction network, and performing feature optimization based on domain symmetric matrix embedding on the first-order features to generate second-order features;
inputting the first-order features into a compressed interaction network (CIN) to obtain explicit high-order features, inputting the second-order features into a deep neural network (DNN) to obtain implicit high-order features, fusing the explicit and implicit high-order features by weighted concatenation to generate third-order features, and generating a click rate prediction model based on the third-order features;
Step three, click rate prediction:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, performing self-distillation on each of them, and then combining them to construct a teacher network;
pre-training a DNN model and an FM model, performing mutual distillation between them, and then combining them to construct a student network;
designing a gating network, calculating the teacher model knowledge weights in the teacher network through the gating network, and having the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
Step four, advertisement recommendation:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sorting the predicted values in descending order, selecting a preset number of advertisements with the highest predicted values, and recommending them to the user, thereby completing click rate prediction.
The invention also provides a click rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system comprises:
a data preprocessing module configured to:
perform feature extraction on the acquired original user behavior data and clicked advertisement data, and apply one-hot encoding, to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively;
a model training module configured to:
input the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and perform feature optimization based on channel attention to generate first-order features;
construct a domain feature interaction network, and perform feature optimization based on domain symmetric matrix embedding on the first-order features to generate second-order features;
input the first-order features into a compressed interaction network (CIN) to obtain explicit high-order features, input the second-order features into a deep neural network (DNN) to obtain implicit high-order features, fuse the explicit and implicit high-order features by weighted concatenation to generate third-order features, and generate a click rate prediction model based on the third-order features;
a click rate prediction module configured to:
pre-train the click rate prediction model, an AutoInt model and a DIFM model, perform self-distillation on each of them, and then combine them to construct a teacher network;
pre-train a DNN model and an FM model, perform mutual distillation between them, and then combine them to construct a student network;
design a gating network, calculate the teacher model knowledge weights in the teacher network through the gating network, and have the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
an advertisement recommendation module configured to:
deploy online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, sort the predicted values in descending order, select a preset number of advertisements with the highest predicted values, and recommend them to the user, thereby completing click rate prediction.
Compared with the prior art, the invention has the following beneficial effects:
On the one hand, the click rate prediction method for multi-order feature optimization and mixed knowledge distillation analyzes user behavior data and the advertisement data clicked by users, constructs embedded feature vectors from both, and, around these embedded feature vectors, combines a SENET network, domain feature interaction, a CIN model and a DNN model to realize multi-order feature optimization and generate features that accurately describe user interest;
on the other hand, a mixed knowledge distillation framework is designed, which outputs a lightweight click rate prediction model with stronger real-time inference capability and high recommendation precision, so that efficient, high-quality advertisement click prediction is realized, the user's recommendation experience is improved, and economic and social benefits are created for Internet companies.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 is a flow chart of a click rate prediction method for multi-order feature optimization and mixed knowledge distillation according to the present invention;
FIG. 2 is a flow chart of the click rate prediction model (Se-xDeepFEFM) in the present invention;
FIG. 3 is a flow diagram of the mixed knowledge distillation framework of the present invention;
FIG. 4 is a structural diagram of a click rate prediction system for multi-order feature optimization and mixed knowledge distillation according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Referring to FIG. 1 to FIG. 3, the present invention provides a click rate prediction method for multi-order feature optimization and mixed knowledge distillation, wherein the method comprises the following steps:
S101, data preprocessing:
Feature extraction is performed on the acquired original user behavior data and clicked advertisement data, and one-hot encoding is applied, to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively.
In step S101, obtaining the user behavior feature embedding vector and the advertisement feature embedding vector comprises the following steps:
S1011, preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises:
extracting the corresponding discrete features from the fields relating to age, gender and user type, and processing the discrete features by an embedding method so that semantically similar features are gathered at nearby positions in the feature space;
extracting the corresponding continuous features from the fields relating to price and time, and normalizing the continuous features so that the feature values are compressed to [0,1].
S1012, generating a user behavior feature embedding vector from the preprocessed user behavior data, and generating an advertisement feature embedding vector from the preprocessed clicked advertisement data.
The user behavior feature embedding vector and the advertisement feature embedding vector are jointly denoted as the feature embedding vector $E$.
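The preprocessing in S1011 can be illustrated with a minimal sketch. It assumes min-max scaling as the normalization to [0,1] and a fixed lookup table as the embedding; the field names, toy values and table entries are hypothetical, and in practice the embedding table would be learned during training.

```python
def min_max_normalize(values):
    """Scale a list of continuous values into [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_vocab(categories):
    """Map each distinct category to an integer index, in order of first appearance."""
    vocab = {}
    for c in categories:
        if c not in vocab:
            vocab[c] = len(vocab)
    return vocab


def one_hot(index, size):
    """One-hot vector of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(one_hot_vec, table):
    """Embedding lookup written as (one-hot vector) x (embedding table)."""
    dim = len(table[0])
    return [sum(one_hot_vec[r] * table[r][d] for r in range(len(table)))
            for d in range(dim)]


# Hypothetical toy data: a discrete "gender" field and a continuous "price" field.
genders = ["f", "m", "f"]
prices = [10.0, 30.0, 20.0]

vocab = build_vocab(genders)
table = [[0.1, 0.2], [0.3, 0.4]]      # assumed 2-dimensional embedding table
gender_embeddings = [embed(one_hot(vocab[g], len(vocab)), table) for g in genders]
price_scaled = min_max_normalize(prices)
```

Writing the lookup as a one-hot product makes the link between the one-hot encoding and the embedding vector explicit; a real implementation would index the table directly.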
S102, model training:
S1021, inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and performing feature optimization based on channel attention to generate first-order features;
S1022, constructing a domain feature interaction network, and performing feature optimization based on domain symmetric matrix embedding on the first-order features to generate second-order features;
S1023, inputting the first-order features into a compressed interaction network (CIN) to obtain explicit high-order features, inputting the second-order features into a deep neural network (DNN) to obtain implicit high-order features, fusing the explicit and implicit high-order features by weighted concatenation to generate third-order features, and generating a click rate prediction model based on the third-order features.
Specifically, in step S1021, inputting the user behavior feature embedding vector and the advertisement feature embedding vector into the SENET network and performing feature optimization based on channel attention to generate the first-order features comprises the following steps:
S1021a, using the SENET network, compressing the feature embedding vector $E$ through an average pooling operation to calculate a statistical vector;
S1021b, applying two fully connected layers to the statistical vector to calculate the attention weight;
S1021c, weighting the feature embedding vector $E$ according to the attention weight to generate the first-order features.
The first-order features are expressed as:
$$V = F_{ReWeight}(A, E) = [a_1 \cdot e_1, \ldots, a_m \cdot e_m] = [v_1, \ldots, v_m]$$
$$A = F_{ex}(Z) = \sigma_2\left(W_2\, \sigma_1(W_1 Z)\right)$$
$$Z = [z_1, \ldots, z_i, \ldots, z_m], \quad z_i = F_{sq}(e_i) = \frac{1}{D} \sum_{t=1}^{D} e_i^{(t)}$$
wherein $V$ represents the first-order features; $F_{ReWeight}(A, E)$ represents the attention weighting of the feature embedding vector $E$; $A$ represents the attention weight; $E$ represents the feature embedding vector; $e_1$ and $e_m$ represent the 1st and the $m$-th feature embedding vectors in $E$; $a_1$ and $a_m$ represent the attention weights of $e_1$ and $e_m$; $v_1$ and $v_m$ represent the 1st and the $m$-th feature values of the first-order features; $F_{ex}$ represents the function that calculates the attention weight; $\sigma_1$ and $\sigma_2$ represent the first and second activation functions of the fully connected layers; $W_1$ and $W_2$ represent the first and second parameters of the fully connected layers; $Z$ represents the statistical vector; $z_i$ represents the statistical information value calculated for the $i$-th feature embedding vector; $F_{sq}$ represents the function that calculates the statistical information value; $D$ represents the dimension of a feature embedding vector; and $\sum_{t=1}^{D}$ represents summation from dimension 1 to dimension $D$.
In addition, because the first-order features are attention-weighted, important features are highlighted and secondary features are suppressed, which lays a solid foundation for the subsequent extraction of second-order and third-order features and for click rate prediction (see FIG. 2 for the principle).
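The three SENET steps described above (average pooling, two fully connected layers, re-weighting) can be sketched as follows. This is an illustration only: the text does not name the two activation functions, so ReLU followed by sigmoid is an assumption, and the weight matrices and toy embeddings are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def senet_reweight(E, W1, W2):
    """Return (Z, A, V): statistical vector, attention weights, first-order features."""
    # Squeeze: average pooling over the D dimensions of each embedding vector.
    Z = [sum(e) / len(e) for e in E]
    # Excitation: two fully connected layers (assumed ReLU, then sigmoid).
    hidden = [max(0.0, h) for h in matvec(W1, Z)]
    A = [sigmoid(a) for a in matvec(W2, hidden)]
    # Re-weight: scale every embedding vector by its attention weight.
    V = [[a * x for x in e] for a, e in zip(A, E)]
    return Z, A, V


# Hypothetical toy input: m = 3 embedding vectors of dimension D = 2.
E = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
W1 = [[0.5, 0.5, 0.5]]            # 3 -> 1 reduction layer (assumed shape)
W2 = [[1.0], [0.0], [-1.0]]       # 1 -> 3 expansion layer (assumed shape)

Z, A, V = senet_reweight(E, W1, W2)
```

In this toy setup the first feature receives the largest attention weight and the third the smallest, which is the "highlight important, suppress secondary" effect in miniature.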
Further, a domain feature interaction network is constructed, and feature optimization based on domain symmetric matrix embedding is performed on the first-order features, corresponding to the following formula:
$$y_{FEFM} = w_0 + \sum_{i=1}^{m} w_i x_i + \sum_{i=1}^{m} \sum_{j=i+1}^{m} e_i^{\top}\, W_{F(i),F(j)}\, e_j\, x_i x_j$$
wherein $y_{FEFM}$ represents the output of the domain feature interaction network; $W_{F(i),F(j)}$ represents a symmetric matrix; $w_0$ represents the base weight parameter learnable by the domain feature interaction network; $w_i$ represents the weight parameter, learnable by the domain feature interaction network, of the $i$-th feature embedding vector; $m$ represents the number of features; $e_i$ and $e_j$ represent the values of the $i$-th and $j$-th feature embedding vectors; $F(i)$ and $F(j)$ represent the domains of the $i$-th and $j$-th features; and $x_i$ represents the $i$-th feature value of the first-order features.
Further, the formula of the second-order features is expressed as:
$$S = \mathrm{concat}\left(y_{FEFM}(E),\; y_{FEFM}(V)\right) = [s_1, \ldots, s_i, \ldots, s_n]$$
wherein $S$ represents the second-order features; $\mathrm{concat}$ represents the concatenation operation; $y_{FEFM}(E)$ represents the output obtained by inputting the initial feature embedding vector $E$ into the domain feature interaction network; $y_{FEFM}(V)$ represents the output obtained by inputting the first-order features $V$ into the domain feature interaction network; $s_i$ represents the $i$-th interaction feature vector after concatenation; and $n$ represents the number of interaction feature vectors generated by the domain feature interaction network.
It should be noted here that, because the feature embedding vector and the high-order representation of the first-order feature are fused, the second-order feature contains richer semantic information, which is helpful to improve the click prediction accuracy.
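The pairwise part of the domain feature interaction and the concatenation that yields the second-order features can be sketched as below. The sketch makes several assumptions: every feature value is taken as 1 (so it drops out of the product), the base and linear terms are omitted, one symmetric matrix is shared by all domain pairs for brevity, and the first-order features are replaced by a scaled stand-in. All names and numbers are hypothetical.

```python
def symmetrize(M):
    """Return (M + M^T) / 2 so that the domain-pair matrix is symmetric."""
    n = len(M)
    return [[(M[r][c] + M[c][r]) / 2.0 for c in range(n)] for r in range(n)]


def bilinear(u, W, v):
    """Compute the scalar u^T W v."""
    return sum(u[r] * W[r][c] * v[c] for r in range(len(u)) for c in range(len(v)))


def pairwise_interactions(vectors, domains, domain_matrices):
    """One interaction value per feature pair (i < j), using the matrix of their domain pair."""
    out = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            W = domain_matrices[(domains[i], domains[j])]
            out.append(bilinear(vectors[i], W, vectors[j]))
    return out


# Hypothetical toy input: three features, one per domain, embedding dimension 2.
domains = ["user", "ad", "context"]
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # initial feature embedding vectors
V = [[0.5 * x for x in e] for e in E]               # stand-in for the first-order features
W_shared = symmetrize([[1.0, 2.0], [0.0, 1.0]])     # becomes [[1.0, 1.0], [1.0, 1.0]]
domain_matrices = {
    ("user", "ad"): W_shared,
    ("user", "context"): W_shared,
    ("ad", "context"): W_shared,
}

from_E = pairwise_interactions(E, domains, domain_matrices)
from_V = pairwise_interactions(V, domains, domain_matrices)
second_order = from_E + from_V                      # concatenation of the two outputs
```

With three features there are three pairs per input, so the concatenated second-order vector has six entries here.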
Further, the first-order features are input into the compressed interaction network (CIN) to obtain the explicit high-order features. The generation formula of the explicit high-order features is as follows:
$$X^{k}_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{H_0} W^{k,h}_{ij} \left( X^{k-1}_{i,*} \circ X^{0}_{j,*} \right), \quad 1 \le h \le H_k$$
$$p^{k}_{h} = \sum_{d=1}^{D} X^{k}_{h,d}$$
$$p^{+} = [p^{1}, p^{2}, \ldots, p^{T}], \quad p^{k} = [p^{k}_{1}, \ldots, p^{k}_{H_k}]$$
wherein $X^{k}_{h,*}$ represents the $h$-th high-order feature vector in the $k$-th layer high-order matrix; $X^{k-1}_{i,*}$ represents the $i$-th high-order feature vector in the $(k-1)$-th layer high-order matrix; $X^{0}_{j,*}$ represents the $j$-th feature value in the first-order features; $W^{k,h}$ represents the parameter matrix with which the first-order features generate the $h$-th high-order feature of the $k$-th layer high-order feature vector; $H_0$ represents the number of layer-0 feature embedding vectors; $H_{k-1}$ represents the number of feature embedding vectors in layer $k-1$ (and $H_k$ likewise for layer $k$); $p^{k}_{h}$ represents the $h$-th feature in the $k$-th layer high-order feature vector; $X^{k}_{h,d}$ represents the $d$-th dimension of the $h$-th $D$-dimensional feature vector in layer $k$; $p^{+}$ represents the explicit high-order features finally generated; $T$ represents the total number of layers of the explicit high-order features; and $\circ$ represents the Hadamard product.
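A minimal sketch of the CIN computation described above: each layer combines the previous layer's vectors with the layer-0 vectors through Hadamard products weighted by a parameter tensor, and each layer is then sum-pooled over the embedding dimension. The weights and inputs are hypothetical toy values, indexed as `W[h][i][j]`.

```python
def cin_layer(X_prev, X0, W):
    """One CIN layer: row h is sum_ij W[h][i][j] * (X_prev[i] o X0[j])."""
    D = len(X0[0])
    X_next = []
    for W_h in W:
        row = [0.0] * D
        for i, x_prev in enumerate(X_prev):
            for j, x0 in enumerate(X0):
                w = W_h[i][j]
                for d in range(D):
                    row[d] += w * x_prev[d] * x0[d]   # Hadamard product, then weighting
        X_next.append(row)
    return X_next


def sum_pool(X):
    """Sum every feature vector over its D dimensions."""
    return [sum(row) for row in X]


# Hypothetical toy input: H_0 = 2 first-order feature vectors of dimension D = 2.
X0 = [[1.0, 2.0], [3.0, 4.0]]
W_layer1 = [[[1.0, 0.0], [0.0, 1.0]]]   # H_1 = 1, shape (H_1, H_0, H_0)
W_layer2 = [[[1.0, 1.0]]]               # H_2 = 1, shape (H_2, H_1, H_0)

X1 = cin_layer(X0, X0, W_layer1)
X2 = cin_layer(X1, X0, W_layer2)
p_plus = sum_pool(X1) + sum_pool(X2)    # concatenate the pooled outputs of all layers
```

Because layer 2 multiplies layer-1 vectors with layer-0 vectors again, the interaction order grows by one per layer, which is why the CIN output is called explicit.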
Further, the second-order features are input into the deep neural network (DNN) to obtain the implicit high-order features. The generation formula of the implicit high-order features is as follows:
$$a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right), \quad l = 1, \ldots, L$$
wherein $a^{(l)}$ represents the output of the $l$-th layer of the deep neural network, with the second-order features as the input $a^{(0)}$; $\sigma$ represents the activation function; $W^{(l)}$ represents the weight of the $l$-th layer of the deep neural network; $b^{(l)}$ represents the bias of the $l$-th layer of the deep neural network; and $L$ represents the number of layers of the deep neural network.
The explicit high-order features output by the CIN and the implicit high-order features output by the DNN are combined to complete feature fusion and generate the third-order features. The third-order features make full use of the complementarity between the implicit and explicit high-order features, which helps to improve feature discriminability and the final click prediction precision.
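The formula for generating the click rate prediction model based on the third-order features is expressed as:
$$\hat{y} = \sigma\left(w_{cin}^{\top}\, p^{+} + w_{dnn}^{\top}\, a^{(L)} + b\right)$$
wherein $\hat{y}$ represents the predicted click rate value; $\sigma$ represents the sigmoid function; $w_{cin}$, $w_{dnn}$ and $b$ all represent parameters of the click rate prediction model; $p^{+}$ represents the explicit high-order features output by the CIN; and $a^{(L)}$ represents the implicit high-order features output by the DNN.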
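The DNN branch and the final fusion can be sketched together. The sketch assumes ReLU as the hidden activation and treats the weighted concatenation as two weight vectors plus a bias inside a sigmoid; the text does not fix either choice, and all weights and inputs below are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(W, b, a):
    """Affine map W a + b for a list-of-rows weight matrix."""
    return [sum(w * x for w, x in zip(row, a)) + bias for row, bias in zip(W, b)]


def dnn_forward(x, layers):
    """Apply each (W, b) layer followed by ReLU (assumed activation)."""
    a = x
    for W, b in layers:
        a = [max(0.0, z) for z in dense(W, b, a)]
    return a


def predict_click_rate(p_plus, a_L, w_cin, w_dnn, bias):
    """Weighted fusion of explicit (CIN) and implicit (DNN) features, then sigmoid."""
    logit = (sum(w * p for w, p in zip(w_cin, p_plus))
             + sum(w * a for w, a in zip(w_dnn, a_L))
             + bias)
    return sigmoid(logit)


# Hypothetical toy values.
second_order = [1.0, 2.0]                                  # DNN input
layers = [([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0])]         # one hidden layer
a_L = dnn_forward(second_order, layers)                    # implicit high-order features

p_plus = [0.2, -0.1]                                       # explicit high-order features
y_hat = predict_click_rate(p_plus, a_L, w_cin=[1.0, 2.0], w_dnn=[0.5, 0.0], bias=0.0)
```

The sigmoid keeps the output in (0, 1), so it can be read directly as a click probability.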
S103, click rate prediction:
S1031, pre-training the click rate prediction model, the AutoInt model and the DIFM model, performing self-distillation on each of them, and then combining them to construct a teacher network.
S1032, pre-training the DNN model and the FM model, performing mutual distillation between them, and then combining them to construct the student network.
The lightweight DNN model (student model 1 in FIG. 3) and the FM model (student model 2 in FIG. 3) are pre-trained and used as student models to construct the student network. Mutual distillation between the DNN model and the FM model helps each student model absorb the diverse information held by the other, which improves the click prediction precision of both.
S1033, designing a gating network, calculating the teacher model knowledge weights in the teacher network through the gating network, and having the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network.
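The specific process of mutual distillation between the DNN model and the FM model is as follows:
$$L_{FM} = L_{label}(y, \hat{y}_{FM}) + \alpha_{FM}\, L_{KL}(\hat{y}_{FM}, \hat{y}_{DNN})$$
$$L_{DNN} = L_{label}(y, \hat{y}_{DNN}) + \alpha_{DNN}\, L_{KL}(\hat{y}_{DNN}, \hat{y}_{FM})$$
wherein $L_{FM}$ represents the loss function of the FM model in the student network; $y$ represents the real label; $\hat{y}_{FM}$ represents the output of the FM model in the student network; $L_{label}(y, \hat{y}_{FM})$ represents the FM model in the student network fitting the real label; $L_{KL}(\hat{y}_{FM}, \hat{y}_{DNN})$ represents the KL loss of the FM model relative to the DNN model in the student network; and $\alpha_{FM}$ represents the weight of $L_{KL}(\hat{y}_{FM}, \hat{y}_{DNN})$;
$L_{DNN}$ represents the loss function of the DNN model in the student network; $\hat{y}_{DNN}$ represents the output of the DNN model in the student network; $L_{label}(y, \hat{y}_{DNN})$ represents the DNN model in the student network fitting the real label; $L_{KL}(\hat{y}_{DNN}, \hat{y}_{FM})$ represents the KL loss of the DNN model relative to the FM model in the student network; and $\alpha_{DNN}$ represents the weight of $L_{KL}(\hat{y}_{DNN}, \hat{y}_{FM})$.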
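The mutual distillation between the two student models can be sketched for a single sample. The sketch assumes binary cross-entropy as the label-fitting loss and a KL divergence between Bernoulli distributions in which each student is pulled toward the other's prediction; neither choice, nor the direction of the KL term, is fixed by the text. All numbers are hypothetical.

```python
import math

EPS = 1e-12


def clip(p):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, EPS), 1.0 - EPS)


def bce(y, p):
    """Binary cross-entropy between a label y and a predicted probability p."""
    p = clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def kl_bernoulli(p, q):
    """KL( Bernoulli(p) || Bernoulli(q) )."""
    p, q = clip(p), clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_distillation_losses(y, p_fm, p_dnn, alpha_fm, alpha_dnn):
    """Each student fits the label and is also pulled toward its peer's prediction."""
    loss_fm = bce(y, p_fm) + alpha_fm * kl_bernoulli(p_dnn, p_fm)
    loss_dnn = bce(y, p_dnn) + alpha_dnn * kl_bernoulli(p_fm, p_dnn)
    return loss_fm, loss_dnn


# Hypothetical predictions of the two students for one clicked sample (y = 1).
loss_fm, loss_dnn = mutual_distillation_losses(1.0, p_fm=0.8, p_dnn=0.6,
                                               alpha_fm=0.5, alpha_dnn=0.5)
```

When both students predict the same probability the KL terms vanish and each loss reduces to plain label fitting.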
Further, a DIFM model (teacher model 1 in FIG. 3), an AutoInt model (teacher model 2 in FIG. 3) and the Se-xDeepFEFM model (teacher model 3 in FIG. 3) are pre-trained, and the three pre-trained models are each self-distilled and then combined into a teacher network. Because the teacher models in the teacher network are heterogeneous, they can provide more diverse knowledge to the student models and thus improve the click prediction precision of the student models. A gating (GATE) mechanism is also designed to adaptively adjust the knowledge weight of each teacher model in the teacher network with respect to each student model in the student network; the larger the knowledge weight, the more valuable the knowledge that the corresponding teacher model provides to the student model during knowledge distillation, which improves the click rate prediction accuracy of the student models.
Specifically, the formulas for self-distillation of the click rate prediction model (Se-xDeepFEFM model), the AutoInt model and the DIFM model are as follows:
$$L_{DIFM} = L_{label}(y, \hat{y}_{DIFM}) + L_{label}(y, \hat{y}'_{DIFM}) + \beta_{DIFM}\, L_{KL}(\hat{y}_{DIFM}, \hat{y}'_{DIFM})$$
$$L_{AutoInt} = L_{label}(y, \hat{y}_{AutoInt}) + L_{label}(y, \hat{y}'_{AutoInt}) + \beta_{AutoInt}\, L_{KL}(\hat{y}_{AutoInt}, \hat{y}'_{AutoInt})$$
$$L_{CTR} = L_{label}(y, \hat{y}_{CTR}) + L_{label}(y, \hat{y}'_{CTR}) + \beta_{CTR}\, L_{KL}(\hat{y}_{CTR}, \hat{y}'_{CTR})$$
wherein $y$ represents the real label; $L_{DIFM}$ represents the loss function of the DIFM model; $\hat{y}_{DIFM}$ represents the output of the DIFM model in the teacher network for the unenhanced sample; $\hat{y}'_{DIFM}$ represents the output of the DIFM model in the teacher network for the enhanced sample; $\beta_{DIFM}$ represents the weight of $L_{KL}(\hat{y}_{DIFM}, \hat{y}'_{DIFM})$; $L_{label}(y, \hat{y}_{DIFM})$ represents the DIFM model in the teacher network fitting the real label on the unenhanced sample; and $L_{label}(y, \hat{y}'_{DIFM})$ represents the DIFM model in the teacher network fitting the real label on the enhanced sample;
$L_{AutoInt}$ represents the loss function of the AutoInt model; $\hat{y}_{AutoInt}$ represents the output of the AutoInt model in the teacher network for the unenhanced sample; $\hat{y}'_{AutoInt}$ represents the output of the AutoInt model in the teacher network for the enhanced sample; $\beta_{AutoInt}$ represents the weight of $L_{KL}(\hat{y}_{AutoInt}, \hat{y}'_{AutoInt})$; $L_{label}(y, \hat{y}_{AutoInt})$ represents the AutoInt model in the teacher network fitting the real label on the unenhanced sample; and $L_{label}(y, \hat{y}'_{AutoInt})$ represents the AutoInt model in the teacher network fitting the real label on the enhanced sample;
$L_{CTR}$ represents the loss function of the click rate prediction model; $\hat{y}_{CTR}$ represents the output of the click rate prediction model in the teacher network for the unenhanced sample; $\hat{y}'_{CTR}$ represents the output of the click rate prediction model in the teacher network for the enhanced sample; $\beta_{CTR}$ represents the weight of $L_{KL}(\hat{y}_{CTR}, \hat{y}'_{CTR})$; $L_{label}(y, \hat{y}_{CTR})$ represents the click rate prediction model in the teacher network fitting the real label on the unenhanced sample; and $L_{label}(y, \hat{y}'_{CTR})$ represents the click rate prediction model in the teacher network fitting the real label on the enhanced sample.
In the invention, the Se-xDeepFEFM model completes self-distillation through sample diversity. Self-distillation can compress the scale of the teacher model, which helps to narrow the gap between the teacher model and the student model so that the mixed knowledge distillation framework can be trained better.
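Self-distillation through sample diversity can be sketched for one teacher model and one sample: the model predicts on the unenhanced and the enhanced version of the sample, fits the label on both, and is penalized when the two predictions disagree. Binary cross-entropy, the Bernoulli KL term and the way the three terms are combined are assumptions, and the text does not specify how samples are enhanced, so the two predictions are simply given as hypothetical numbers.

```python
import math

EPS = 1e-12


def clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def bce(y, p):
    """Binary cross-entropy between a label y and a predicted probability p."""
    p = clip(p)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def kl_bernoulli(p, q):
    """KL( Bernoulli(p) || Bernoulli(q) )."""
    p, q = clip(p), clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def self_distillation_loss(y, p_plain, p_enhanced, beta):
    """Fit the label on both views and keep the two predictions consistent."""
    fit_plain = bce(y, p_plain)            # unenhanced sample
    fit_enhanced = bce(y, p_enhanced)      # enhanced sample
    consistency = kl_bernoulli(p_plain, p_enhanced)
    return fit_plain + fit_enhanced + beta * consistency


# Hypothetical outputs of one teacher model for the two views of a clicked sample.
loss_consistent = self_distillation_loss(1.0, 0.7, 0.7, beta=1.0)
loss_divergent = self_distillation_loss(1.0, 0.7, 0.4, beta=1.0)
```

The same function would be applied to each of the three teacher models separately.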
The total loss function for the mixed knowledge distillation is expressed as:
$$L_{total} = \sum_{t=1}^{N} g_t\, L\left(\mathcal{T}_t, \mathcal{S}\right)$$
wherein $L_{total}$ represents the total loss function corresponding to the mixed knowledge distillation; $\mathcal{T}_t$ represents the $t$-th teacher model in the teacher network; $\mathcal{T}$ represents the teacher network; $\mathcal{S}$ represents the student network; $N$ represents the number of teacher models in the teacher network; and $g_t$ represents the knowledge weight of the $t$-th teacher model in the teacher network.
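The gate-weighted combination of teacher knowledge can be sketched for one student and one sample. The text does not say how the gating network computes the knowledge weights, so this sketch assumes a softmax over per-teacher scores (here supplied directly as hypothetical `gate_scores`) and a Bernoulli KL divergence as the per-teacher distillation loss.

```python
import math

EPS = 1e-12


def clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def kl_bernoulli(p, q):
    """KL( Bernoulli(p) || Bernoulli(q) )."""
    p, q = clip(p), clip(q)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def softmax(scores):
    """Numerically stable softmax; the outputs are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_distillation_loss(teacher_probs, student_prob, gate_scores):
    """Sum over teachers of (knowledge weight) x (teacher-to-student distillation loss)."""
    weights = softmax(gate_scores)
    loss = sum(g * kl_bernoulli(t, student_prob)
               for g, t in zip(weights, teacher_probs))
    return loss, weights


# Hypothetical predictions of the three teachers and one student for a sample.
teacher_probs = [0.9, 0.8, 0.85]
loss, weights = gated_distillation_loss(teacher_probs, student_prob=0.6,
                                        gate_scores=[2.0, 0.0, 1.0])
```

A teacher with a higher gate score contributes more to the student's loss, matching the statement that a larger knowledge weight means more of that teacher's knowledge is passed on.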
S104, advertisement recommendation:
The student network output by the mixed knowledge distillation is deployed online to obtain a plurality of predicted values; the predicted values are sorted in descending order, and a preset number of advertisements with the highest predicted values are selected and recommended to the user, thereby completing click rate prediction.
The teacher network and the student network are trained jointly, that is, knowledge is transferred from the teacher models in the teacher network to the student models in the student network through the GATE to realize mixed knowledge distillation. The mixed knowledge distillation framework outputs a lightweight student model, which is used to calculate the click prediction values; this preserves prediction precision while improving real-time prediction efficiency and strengthening the real-time inference capability of the click rate prediction model.
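The ranking step in S104 amounts to a descending sort followed by a cut at the preset number. A minimal sketch, with hypothetical advertisement identifiers and predicted values:

```python
def recommend_top_k(predictions, k):
    """Return the ids of the k advertisements with the highest predicted click rate."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]


# Hypothetical predicted click rates produced by the deployed student network.
predictions = {"ad_a": 0.12, "ad_b": 0.57, "ad_c": 0.33, "ad_d": 0.05}
recommended = recommend_top_k(predictions, k=2)
```

For large candidate sets a partial selection such as `heapq.nlargest` avoids sorting every candidate.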
Referring to FIG. 4, the present invention provides a click rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system comprises:
a data pre-processing module to:
perform feature extraction on the obtained original user behavior data and the clicked advertisement data, and perform one-hot coding conversion to obtain a user behavior feature embedding vector and an advertisement feature embedding vector respectively;
a model training module to:
input the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then perform feature optimization based on channel attention to generate first-order features;
construct a domain feature interactive network, and perform feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;
input the first-order features into a compression interactive network to obtain explicit high-order features, input the second-order features into a deep neural network to obtain implicit high-order features, fuse the explicit high-order features and the implicit high-order features by weighted splicing to generate third-order features, and generate a click rate prediction model based on the third-order features;
a click rate prediction module to:
pre-train the click rate prediction model, an AutoInt model and a DIFM model, apply self-distillation to each of them, and then combine them to construct a teacher network;
pre-train a DNN model and an FM model, distill them mutually, and combine them to construct a student network;
design a gate control network, calculate the knowledge weights of the teacher models in the teacher network through the gate control network, and have the teacher network guide the click rate prediction of each student model in the student network based on those knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
an advertisement recommendation module to:
deploy online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, arrange the predicted values in descending order, and select a preset number of advertisements with the highest predicted values to recommend to the user, completing the click rate prediction.
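The mutual distillation used to build the student network from the DNN and FM models can be sketched as follows. Each student loss is assumed to be a binary cross-entropy against the real label plus a weighted KL term pulling it toward the other student's prediction, in the spirit of deep mutual learning. The KL direction, the weights `alpha` and `beta`, and all function names are assumptions for illustration.

```python
import math


def _clip(p, eps=1e-7):
    return min(max(p, eps), 1.0 - eps)


def bce(y, p):
    """Binary cross-entropy between a real label y and a predicted probability p."""
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q))."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def mutual_distillation_losses(y, p_fm, p_dnn, alpha=0.5, beta=0.5):
    """Return (FM loss, DNN loss); each student also learns from its peer."""
    loss_fm = bce(y, p_fm) + alpha * bernoulli_kl(p_dnn, p_fm)
    loss_dnn = bce(y, p_dnn) + beta * bernoulli_kl(p_fm, p_dnn)
    return loss_fm, loss_dnn
```

When the two students already agree, both KL terms vanish and each loss reduces to its label-fitting term.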
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above-mentioned embodiments express only several embodiments of the present invention. Their description is specific and detailed, but it is not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (6)
1. A click rate prediction method for multi-order feature optimization and mixed knowledge distillation is characterized by comprising the following steps:
step one, data preprocessing:
performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot coding conversion to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively;
step two, model training:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interactive network, and executing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;
inputting the first-order features into a compression interactive network to obtain explicit high-order features, inputting the second-order features into a deep neural network to obtain implicit high-order features, fusing the explicit high-order features and the implicit high-order features by weighted splicing to generate third-order features, and generating a click rate prediction model based on the third-order features;
step three, predicting the click rate:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, applying self-distillation to each of them, and then combining them to construct a teacher network;
pre-training a DNN model and an FM model, distilling them mutually, and combining them to construct a student network;
designing a gate control network, calculating the knowledge weights of the teacher models in the teacher network through the gate control network, and having the teacher network guide the click rate prediction of each student model in the student network based on those knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
step four, recommending advertisements:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, arranging the predicted values in descending order, and selecting a preset number of advertisements with the highest predicted values to recommend to the user, completing the click rate prediction;
in the second step, in the method of inputting the first-order features into the compression interactive network to obtain the explicit high-order features, the generation formula of the explicit high-order features is as follows:
wherein the symbols of the formula denote, in order: two indexed high-order feature vectors, each taken from the high-order matrix of an indexed layer; an indexed feature value of the first-order features; the parameter matrix with which the first-order features generate an indexed high-order feature of an indexed layer's high-order feature vectors; the number of layer-0 feature embedding vectors; the number of feature embedding vectors of an indexed layer; an indexed feature in that layer's high-order feature vector; the feature vector, of the embedding dimension, of an indexed feature in that layer's high-order feature vector; the explicit high-order features that are ultimately generated; the total number of layers of the explicit high-order features; and the Hadamard product;
in the method for inputting the second-order feature into the deep neural network to output the implicit high-order feature, a generation formula of the implicit high-order feature is as follows:
wherein the symbols of the formula denote, in order: the neural network output of an indexed layer of the deep neural network; the activation function; the weight of that layer of the deep neural network; the bias of that layer of the deep neural network; and the number of layers of the deep neural network;
in the second step, the formula for generating the click rate prediction model based on the third-order feature is represented as:
wherein the symbols of the formula denote, in order: the predicted value of the click rate; the sigmoid function; and the parameters of the click rate prediction model;
in the third step, the formula corresponding to the mixed knowledge distillation is expressed as follows:
wherein the symbols of the formula for the FM model denote, in order: the loss function of the FM model in the student network; the real label; the output of the FM model in the student network; the term in which the FM model in the student network fits the real labels; the KL loss of the FM model relative to the DNN model in the student network; and a weighting coefficient;
and the symbols of the formula for the DNN model denote, in order: the loss function of the DNN model in the student network; the output of the DNN model in the student network; the term in which the DNN model in the student network fits the real labels; the KL loss of the DNN model relative to the FM model in the student network; and a weighting coefficient;
the formula for self-distillation of the click rate prediction model, the AutoInt model and the DIFM model is as follows:
wherein the symbols of the formula for the DIFM model denote, in order: the loss function of the DIFM model; the output of the DIFM model in the teacher network for the unenhanced sample; the output of the DIFM model in the teacher network for the enhanced sample; a weighting coefficient; the term in which the DIFM model in the teacher network fits the real label on the unenhanced sample; and the term in which the DIFM model in the teacher network fits the real label on the enhanced sample;
the symbols of the formula for the AutoInt model denote, in order: the loss function of the AutoInt model; the output of the AutoInt model in the teacher network for the unenhanced sample; the output of the AutoInt model in the teacher network for the enhanced sample; a weighting coefficient; the term in which the AutoInt model in the teacher network fits the real label on the unenhanced sample; and the term in which the AutoInt model in the teacher network fits the real label on the enhanced sample;
the symbols of the formula for the click rate prediction model denote, in order: the loss function of the click rate prediction model; the output of the click rate prediction model in the teacher network for the unenhanced sample; the output of the click rate prediction model in the teacher network for the enhanced sample; a weighting coefficient; the term in which the click rate prediction model in the teacher network fits the real label on the unenhanced sample; and the term in which the click rate prediction model in the teacher network fits the real label on the enhanced sample;
the total loss function for the mixed knowledge distillation is expressed as:
wherein the symbols of the formula denote, in order: the total loss function corresponding to the mixed knowledge distillation; an indexed teacher model in the teacher network; the teacher network; the student network; the number of teacher models in the teacher network; and the knowledge weight of the indexed teacher model in the teacher network.
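The explicit high-order feature generation recited in claim 1 resembles the compressed interaction network (CIN) layer known from xDeepFM, so the sketch below assumes that standard form: each new feature vector is a weighted sum of Hadamard products between the previous layer's vectors and the base (first-order) vectors, and the final explicit features are obtained by sum pooling each layer and concatenating. The shapes, the pooling choice and the names `cin_layer` and `cin` are assumptions and may differ from the patent's formula.

```python
import numpy as np


def cin_layer(x_prev, x0, weights):
    """One assumed CIN layer.

    x_prev:  (H_prev, D) feature vectors of the previous layer
    x0:      (m, D)      base feature vectors (the first-order features here)
    weights: (H_next, H_prev, m) one parameter matrix per output vector
    returns: (H_next, D)
    """
    # Hadamard product of every (previous, base) pair: shape (H_prev, m, D).
    pairwise = x_prev[:, None, :] * x0[None, :, :]
    # Each output vector is a weighted sum over all pairs.
    return np.einsum("hij,ijd->hd", weights, pairwise)


def cin(x0, layer_weights):
    """Stack CIN layers and concatenate the sum-pooled outputs of every layer."""
    x_prev, pooled = x0, []
    for w in layer_weights:
        x_prev = cin_layer(x_prev, x0, w)
        pooled.append(x_prev.sum(axis=1))  # sum pooling over the embedding dimension
    return np.concatenate(pooled)
```

With three base vectors of dimension four and layers producing two and five vectors, `cin` returns a vector of length seven, one pooled value per generated feature vector.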
2. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 1, wherein in the step one, performing feature extraction on the obtained original user behavior data and clicked advertisement data, and performing one-hot coding conversion to obtain the user behavior feature embedding vector and the advertisement feature embedding vector respectively, comprises the following steps:
preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises the following steps:
extracting corresponding discrete features from the relevant fields of age, gender and user type, and processing the discrete features by an embedding method so that semantically similar features are mapped to nearby positions in the feature space;
the pre-processing further comprises:
extracting corresponding continuous features from the relevant fields of price and time, normalizing the continuous features, and scaling the feature values to [0,1];
generating a user behavior feature embedding vector according to the preprocessed user behavior data, and generating an advertisement feature embedding vector according to the preprocessed clicked advertisement data; wherein the user behavior feature embedding vector and the advertisement feature embedding vector are collectively denoted as the feature embedding vector.
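The preprocessing of claim 2 can be sketched as follows, assuming min-max scaling for the continuous fields and a plain lookup table for the embedding of one-hot encoded discrete fields. The vocabulary, the table values and the function names are hypothetical; the claim itself does not specify the normalization formula or the embedding size.

```python
import numpy as np


def one_hot(value, vocabulary):
    """One-hot encode a discrete field value against a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def embed(one_hot_vec, table):
    """Multiplying a one-hot vector by an embedding table selects one row."""
    return np.asarray(one_hot_vec) @ table


def min_max_normalize(values):
    """Scale continuous values (for example price) into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate field: all values identical
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical gender field with a 2-dimensional embedding table.
gender_vocab = ["m", "f"]
gender_table = np.array([[1.0, 2.0], [3.0, 4.0]])
gender_vec = embed(one_hot("f", gender_vocab), gender_table)
prices = min_max_normalize([10, 20, 30])
```

In a trained model the embedding table would be learned, which is what lets semantically similar values end up close together.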
3. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 2, wherein in the second step, inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing the feature optimization based on channel attention to generate the first-order features, comprises the following steps:
compressing the feature embedding vector by an average pooling operation using the SENET network to calculate a statistical vector;
designing two fully connected layers based on the statistical vector to calculate attention weights;
weighting the feature embedding vector according to the attention weights to generate the first-order features;
the first-order features are represented as:
wherein the symbols of the formula denote, in order: the first-order features; the attention weighting applied to the feature embedding vector; the attention weights; the feature embedding vector; two indexed feature embedding vectors within the feature embedding vector; the attention weights of those two feature embedding vectors; and the two corresponding feature values of the first-order features;
and the remaining symbols denote, in order: the function that calculates the attention weights; the first activation function of the fully connected layers; the second activation function of the fully connected layers; the first parameter of the fully connected layers; the second parameter of the fully connected layers; the statistical vector; the calculated statistical information value corresponding to an indexed feature embedding vector; the function that calculates the statistical information value; the dimension of the feature embedding vector; and the calculation running from dimension 1 to that dimension.
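The three steps of claim 3 (average pooling, two fully connected layers, reweighting) can be sketched with NumPy. ReLU is assumed for both activation functions and biases are omitted; the claim only names a first and a second activation function, so these choices and the names `senet_reweight`, `w1` and `w2` are assumptions.

```python
import numpy as np


def senet_reweight(embeddings, w1, w2):
    """Channel-attention reweighting of field embeddings (assumed SENET form).

    embeddings: (f, d) one d-dimensional embedding per feature field
    w1:         (f, r) first fully connected layer (reduction)
    w2:         (r, f) second fully connected layer (expansion)
    returns:    (reweighted embeddings of shape (f, d), attention weights of shape (f,))
    """
    stats = embeddings.mean(axis=1)           # squeeze: average over dimensions 1..d
    hidden = np.maximum(stats @ w1, 0.0)      # first layer, ReLU assumed
    attention = np.maximum(hidden @ w2, 0.0)  # second layer, ReLU assumed
    return embeddings * attention[:, None], attention
```

Each field's embedding is scaled by a single scalar weight, so uninformative fields can be suppressed without changing the embedding dimension.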
4. The click rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in claim 3, wherein in the second step, in the step of constructing a domain feature interaction network and performing feature optimization based on domain symmetric matrix embedding on the obtained first-order features, the following formula is applied:
wherein the symbols of the formula denote, in order: the output of the domain feature interaction network; a symmetric matrix; the base weighting parameter learnable by the domain feature interaction network; the learnable weighting parameter of an indexed feature embedding vector; the number of features; the values of two indexed feature embedding vectors; the domain features of two indexed fields; and an indexed feature value of the first-order features.
5. The method as claimed in claim 4, wherein the second order features are formulated as:
wherein the symbols of the formula denote, in order: the second-order features; the splicing operation; the output result obtained by inputting the initial feature embedding vector into the domain feature interaction network; the output result obtained by inputting the first-order features into the domain feature interaction network; an indexed interaction feature vector after splicing; and the number of interaction feature vectors generated by the domain feature interaction network.
6. A click-through rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system applies the click-through rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in any one of claims 1 to 5, and the system comprises:
a data pre-processing module to:
performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot coding conversion to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively;
a model training module to:
inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing feature optimization based on channel attention to generate first-order features;
constructing a domain feature interactive network, and executing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;
inputting the first-order features into a compression interactive network to obtain explicit high-order features, inputting the second-order features into a deep neural network to obtain implicit high-order features, fusing the explicit high-order features and the implicit high-order features by weighted splicing to generate third-order features, and generating a click rate prediction model based on the third-order features;
a click rate prediction module to:
pre-training the click rate prediction model, an AutoInt model and a DIFM model, applying self-distillation to each of them, and then combining them to construct a teacher network;
pre-training a DNN model and an FM model, distilling them mutually, and combining them to construct a student network;
designing a gate control network, calculating the knowledge weights of the teacher models in the teacher network through the gate control network, and having the teacher network guide the click rate prediction of each student model in the student network based on those knowledge weights, so as to realize mixed knowledge distillation; wherein a teacher model knowledge weight represents the weight of the knowledge with which a teacher model guides each student model in the student network;
an advertisement recommendation module to:
deploying online the student network output by the mixed knowledge distillation to obtain a plurality of predicted values, arranging the predicted values in descending order, and selecting a preset number of advertisements with the highest predicted values to recommend to the user, completing the click rate prediction.
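The self-distillation applied to each teacher model before the teacher network is assembled can be illustrated as follows. The sketch assumes that a teacher fits the real label on both the unenhanced and the enhanced sample, and that a weighted consistency term ties the two outputs together. The squared-difference consistency term, the default `weight` and the function names are assumptions; the claims state only that a weighted term is present.

```python
import math


def _clip(p, eps=1e-7):
    return min(max(p, eps), 1.0 - eps)


def bce(y, p):
    """Binary cross-entropy between a real label y and a predicted probability p."""
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def self_distillation_loss(y, p_plain, p_enhanced, weight=0.5):
    """Loss of one teacher model on an unenhanced/enhanced sample pair (assumed form)."""
    consistency = (p_plain - p_enhanced) ** 2  # assumed agreement penalty
    return bce(y, p_plain) + bce(y, p_enhanced) + weight * consistency
```

The same function would be applied separately to the click rate prediction model, the AutoInt model and the DIFM model, each with its own outputs.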
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211200198.9A CN115271272B (en) | 2022-09-29 | 2022-09-29 | Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115271272A (en) | 2022-11-01 |
CN115271272B (en) | 2022-12-27 |
Family
ID=83756968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211200198.9A Active CN115271272B (en) | 2022-09-29 | 2022-09-29 | Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115271272B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||