CN115271272B

CN115271272B - Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Info

Publication number: CN115271272B
Application number: CN202211200198.9A
Authority: CN
Inventors: 李广丽; 许广鑫; 吴光庭; 李传秀; 叶艺源; 张红斌
Original assignee: East China Jiaotong University
Current assignee: East China Jiaotong University
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication date: 2022-12-27
Anticipated expiration: 2042-09-29
Also published as: CN115271272A

Abstract

The invention provides a click rate prediction method and system based on multi-order feature optimization and mixed knowledge distillation. User behavior data and the advertisement data clicked by the user are analyzed to construct embedded feature vectors, and a SENET network, domain feature interaction, a CIN model and a DNN model are combined around these embedded feature vectors to realize multi-order feature optimization and generate features that accurately describe user interest. A mixed knowledge distillation framework is then designed, and a lightweight click rate prediction model with stronger real-time inference capability and excellent recommendation precision is output based on this framework, so as to realize efficient and high-quality advertisement click prediction, improve the user recommendation experience, and create good economic and social benefits for Internet companies.

Description

Click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation

Technical Field

The invention relates to the technical field of advertisement recommendation, in particular to a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation.

Background

Because the amount of information on the Internet is so large, the problem of information overload is becoming increasingly serious. A recommendation system can effectively relieve this problem: it analyzes user characteristics such as habits, interests and preferences from the historical interaction data between users and items, analyzes the characteristics of the items from the item attributes, establishes the relationship between users and the items to be recommended, and finally recommends to users the items they may be interested in.

Click rate prediction estimates the probability that a user clicks on an Internet advertisement or an online commodity. It is an important component of a recommendation system and plays a very important role on Internet commercial platforms. Internet advertising has huge economic benefits, and an advertisement click means a potential purchase, so click rate prediction plays a vital role in promoting social and economic development. Accurate advertisement recommendation can therefore improve the user experience and bring substantial economic benefits to Internet companies.

However, existing advertisement click rate prediction technology has the following problems: (1) the feature representation is single: only explicit features or only implicit features are used, and the complementarity between the two is not exploited; (2) the feature optimization method is simple and does not consider multi-order feature optimization. As a result, the final features are weakly discriminative, which seriously restricts click rate prediction precision. Meanwhile, existing click rate prediction technology mostly adopts very complicated and large prediction models, such as DIFM and AutoInt, so real-time inference efficiency is low, which seriously affects the user's recommendation experience and also restricts the practical deployment of the models.

Disclosure of Invention

In view of the above situation, the main objective of the present invention is to provide a click rate prediction method and system for multi-order feature optimization and mixed knowledge distillation, so as to solve the problems of simple feature optimization method, low click rate prediction accuracy and low real-time inference efficiency in the prior art.

The invention provides a click rate prediction method for multi-order feature optimization and mixed knowledge distillation, which comprises the following steps:

step one, data preprocessing:

performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot coding conversion to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively;

step two, model training:

inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing feature optimization based on channel attention to generate first-order features;

constructing a domain feature interactive network, and executing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;

inputting the first-order features into a compression interactive network to output to obtain explicit high-order features, inputting the second-order features into a deep neural network to output to obtain implicit high-order features, performing weighted splicing on the explicit high-order features and the implicit high-order features to generate third-order features in a fusion mode, and generating a click rate prediction model based on the third-order features;

step three, predicting the click rate;

pre-training a click rate prediction model, an AutoInt model and a DIFM model, and then respectively carrying out self-distillation and then combining to construct a teacher network;

pre-training a DNN model and an FM model, mutually distilling, and combining to construct a student network;

designing a gate control network, calculating the teacher model knowledge weights in the teacher network through the gate control network, and having the teacher network guide the click rate prediction of each student model in the student network based on the teacher model knowledge weights, so as to realize mixed knowledge distillation; wherein the teacher model knowledge weight represents the knowledge weight with which a teacher model guides each student model in the student network;

step four, recommending advertisements;

deploying the student network output by mixed knowledge distillation online to obtain a plurality of predicted values, arranging the predicted values in descending order, and selecting a preset number of advertisements with the highest predicted values to recommend to the user, thereby completing the click rate prediction.

The invention also provides a click rate prediction system of multi-order feature optimization and mixed knowledge distillation, wherein the system comprises:

a data pre-processing module to:

a model training module to:

inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET, and then performing feature optimization based on channel attention to generate first-order features;

a click rate prediction module for;

an advertisement recommendation module for;

Compared with the prior art, the invention has the following beneficial effects:

the invention provides a click rate prediction method of multi-order feature optimization and mixed knowledge distillation, which comprises the steps of on one hand, analyzing user behavior data and advertisement data clicked by a user, constructing embedded feature vectors of the user behavior data and the advertisement data, surrounding the embedded feature vectors, combining a SENET network, domain feature interaction, a CIN model and a DNN model, realizing multi-order feature optimization, and generating features capable of accurately describing user interest;

on the other hand, a hybrid knowledge distillation framework is designed, and a lightweight click rate prediction model with stronger real-time reasoning capability and excellent recommendation precision is output based on the hybrid knowledge distillation framework, so that efficient and high-quality advertisement click prediction is realized, the user recommendation experience is improved, and good economic and social benefits are created for internet companies.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

FIG. 1 is a flow chart of a click rate prediction method for multi-order feature optimization and mixed knowledge distillation according to the present invention;

FIG. 2 is a flow chart of the click rate prediction model (Se-xDeepFEFM) in the present invention;

FIG. 3 is a flow diagram of a hybrid knowledge distillation framework of the present invention;

FIG. 4 is a structural diagram of a click rate prediction system with multi-order feature optimization and mixed knowledge distillation according to the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

These and other aspects of embodiments of the invention will be apparent with reference to the following description and attached drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the ways in which the principles of the embodiments of the invention may be practiced, but it is understood that the scope of the embodiments of the invention is not limited correspondingly. On the contrary, the embodiments of the invention include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.

Referring to fig. 1 to 3, the present invention provides a click rate prediction method for multi-order feature optimization and mixed knowledge distillation, wherein the method comprises the following steps:

S101, data preprocessing:

Performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot encoding to obtain a user behavior feature embedded vector and an advertisement feature embedded vector respectively.

In step S101, the method of performing feature extraction on the obtained original user behavior data and the clicked advertisement data, and performing one-hot encoding to obtain the user behavior feature embedded vector and the advertisement feature embedded vector respectively, includes the following steps:

S1011, preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises:

extracting corresponding discrete features from relevant fields of age, gender and user type, and processing the discrete features by an embedding method to enable semantically similar features to be gathered to close positions in a feature space;

extracting corresponding continuous features from relevant fields of price and time, normalizing the continuous features, and compressing the feature value to [0,1].

And S1012, generating a user behavior feature embedded vector according to the preprocessed user behavior data, and generating an advertisement feature embedded vector according to the preprocessed clicked advertisement data.

Wherein the user behavior feature embedded vector and the advertisement feature embedded vector are jointly denoted as the feature embedding vector $E$.
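The preprocessing in step S101 can be sketched as follows. This is a minimal illustration in plain Python, assuming min-max scaling for the normalization to [0,1] and a table lookup for the embedding; the function names, the toy vocabulary and the embedding table values are hypothetical and are not taken from the invention.

```python
def min_max_normalize(values):
    """Scale a list of continuous values into [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, vocabulary):
    """Return the one-hot code of a discrete value over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def embed(one_hot_code, table):
    """Embedding lookup: the one-hot code selects one row of the table."""
    dim = len(table[0])
    return [sum(c * row[d] for c, row in zip(one_hot_code, table)) for d in range(dim)]


# Hypothetical toy data: a discrete gender field and a continuous price field.
genders = ["female", "male", "unknown"]
gender_table = [[0.1, 0.3], [0.2, -0.4], [0.0, 0.0]]  # illustrative values only
prices = [10.0, 25.0, 40.0]

code = one_hot("male", genders)          # [0.0, 1.0, 0.0]
e_gender = embed(code, gender_table)     # row of the table selected by the code
scaled_prices = min_max_normalize(prices)  # [0.0, 0.5, 1.0]
```

In practice the embedding table would be learned during training so that semantically similar discrete values end up close together in the feature space.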

S102, model training:

S1021, inputting the user behavior feature embedded vector and the advertisement feature embedded vector into a SENET network, and then performing feature optimization based on channel attention to generate first-order features;

S1022, constructing a domain feature interaction network, and performing feature optimization based on domain symmetric matrix embedding on the acquired first-order features to generate second-order features;

S1023, inputting the first-order features into a Compressed Interaction Network (CIN) to output explicit high-order features, inputting the second-order features into a deep neural network to output implicit high-order features, performing weighted splicing on the explicit high-order features and the implicit high-order features to generate third-order features by fusion, and generating a click rate prediction model based on the third-order features.

Specifically, in step S102, the method of inputting the user behavior feature embedding vector and the advertisement feature embedding vector into a SENET network, and then performing feature optimization based on channel attention to generate the first-order features includes the following steps:

S1021a, using the SENET network, compressing the feature embedding vector $E$ through an average pooling operation to calculate a statistical vector;

S1021b, designing two fully connected layers based on the statistical vector to calculate the attention weight;

S1021c, weighting the feature embedding vector $E$ according to the attention weight to generate the first-order features.

The first-order features are expressed as:

$$V = F_{ReWeight}(A, E) = [a_1 \cdot e_1, \ldots, a_f \cdot e_f] = [v_1, \ldots, v_f]$$

$$A = F_{ex}(Z) = \sigma_2\left(W_2 \cdot \sigma_1\left(W_1 \cdot Z\right)\right)$$

$$Z = [z_1, \ldots, z_f], \quad z_i = F_{sq}(e_i) = \frac{1}{k} \sum_{t=1}^{k} e_i^{(t)}$$

wherein $V$ represents the first-order features, $F_{ReWeight}$ represents the attention weighting of the feature embedding vector $E$, $A$ represents the attention weight, $E$ represents the feature embedding vector, $e_1$ represents the first feature embedding vector in $E$, $e_f$ represents the $f$-th feature embedding vector in $E$, $a_1$ represents the attention weight of $e_1$, $a_f$ represents the attention weight of $e_f$, $v_1$ represents the first feature value of the first-order features, $v_f$ represents the $f$-th feature value of the first-order features, $F_{ex}$ represents the function that calculates the attention weight, $\sigma_1$ represents the first activation function of the fully connected layers, $\sigma_2$ represents the second activation function of the fully connected layers, $W_1$ represents the first parameter of the fully connected layers, $W_2$ represents the second parameter of the fully connected layers, $Z$ represents the statistical vector, $z_i$ represents the statistical information value calculated for the $i$-th feature embedding vector, $F_{sq}$ represents the function that calculates the statistical information value, $k$ represents the dimension of the feature embedding vector $e_i$, and $\sum_{t=1}^{k}$ represents summation from dimension 1 to dimension $k$.

Additionally, since the first-order features are subjected to attention weighting, important features are highlighted, and secondary features are suppressed, a solid foundation is laid for the extraction and click rate prediction of the subsequent second-order and third-order features (see fig. 2 for the principle).
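The three SENET steps (compress by average pooling, compute attention weights with two fully connected layers, reweight the embeddings) can be sketched as follows. This is a minimal plain-Python illustration; the choice of ReLU for both activation functions, the absence of bias terms, and all names and toy parameter values are assumptions made for the example.

```python
def relu(x):
    return max(0.0, x)


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def senet_reweight(E, W1, W2):
    """Return (Z, A, V) for a list E of feature embedding vectors.

    Z: one statistic per embedding (mean over its dimensions).
    A: one attention weight per embedding (two fully connected layers).
    V: the embeddings scaled by their attention weights.
    """
    Z = [sum(e) / len(e) for e in E]                  # compress by average pooling
    hidden = [relu(h) for h in matvec(W1, Z)]         # first fully connected layer
    A = [relu(a) for a in matvec(W2, hidden)]         # second fully connected layer
    V = [[a * x for x in e] for a, e in zip(A, E)]    # attention weighting
    return Z, A, V


# Hypothetical toy input: three feature embedding vectors of dimension 2.
E = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
W1 = [[0.5, 0.5, 0.0]]          # maps 3 statistics to 1 hidden unit
W2 = [[0.5], [1.0], [0.0]]      # maps 1 hidden unit back to 3 weights
Z, A, V = senet_reweight(E, W1, W2)
```

With these toy parameters the second embedding is amplified and the third is suppressed to zero, which mirrors how the attention weights highlight important features and suppress secondary ones.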

Further, a domain feature interaction network is constructed, and feature optimization based on domain symmetric matrix embedding is performed on the acquired first-order features, corresponding to the following formula:

$$\phi_{FEFM} = w_0 + \sum_{i=1}^{f} w_i x_i + \sum_{i=1}^{f} \sum_{j=i+1}^{f} v_i^{\top} W_{F(i),F(j)} \, v_j \, x_i x_j$$

wherein $\phi_{FEFM}$ represents the output of the domain feature interaction network, $W_{F(i),F(j)}$ represents a $k \times k$ symmetric matrix, $w_0$ represents the base weighting parameter learnable by the domain feature interaction network, $w_i$ represents the learnable weighting parameter of the $i$-th feature embedding vector, $f$ represents the number of features, $x_i$ represents the value of the $i$-th feature embedding vector, $x_j$ represents the value of the $j$-th feature embedding vector, $F(i)$ represents the $i$-th domain feature, $F(j)$ represents the $j$-th domain feature, and $v_i$ represents the $i$-th feature value of the first-order features.
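As an illustration, the domain feature interaction can be sketched under the assumption that it takes the field-embedded factorization machine form: a base weight, a linear term, and pairwise terms in which each pair of domains has its own symmetric matrix. The function name, argument layout and toy values below are hypothetical.

```python
def fefm_score(x, V, field_of, W_field, w0, w):
    """Assumed form: w0 + sum_i w_i x_i + sum_{i<j} v_i^T W_{F(i),F(j)} v_j x_i x_j.

    x:        feature values
    V:        one embedding vector per feature (here, the first-order features)
    field_of: domain index of each feature
    W_field:  dict mapping a (domain, domain) pair to a symmetric matrix
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            M = W_field[(field_of[i], field_of[j])]
            # M @ v_j, then the dot product with v_i
            Mv = [sum(M[r][c] * V[j][c] for c in range(len(V[j]))) for r in range(len(M))]
            pairwise += sum(V[i][r] * Mv[r] for r in range(len(Mv))) * x[i] * x[j]
    return w0 + linear + pairwise


# Hypothetical toy example with two features from two different domains.
x = [1.0, 1.0]
V = [[1.0, 0.0], [0.0, 1.0]]
field_of = [0, 1]
W_field = {(0, 1): [[1.0, 2.0], [2.0, 3.0]]}  # symmetric 2 x 2 matrix
score = fefm_score(x, V, field_of, W_field, w0=0.5, w=[0.25, 0.25])
```

Here the pairwise term picks out the off-diagonal entry of the domain-pair matrix (2.0), so the score is 0.5 + 0.5 + 2.0 = 3.0.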

Further, the formula of the second-order features is expressed as:

$$S = \mathrm{concat}\left(\phi_{FEFM}(E), \phi_{FEFM}(V)\right) = [s_1, \ldots, s_n]$$

wherein $S$ represents the second-order features, $\mathrm{concat}$ represents the splicing operation, $\phi_{FEFM}(E)$ represents the output obtained by inputting the initial feature embedding vector into the domain feature interaction network, $\phi_{FEFM}(V)$ represents the output obtained by inputting the first-order features into the domain feature interaction network, $s_i$ represents the $i$-th interaction feature vector after splicing, and $n$ represents the number of interaction feature vectors generated by the domain feature interaction network.

It should be noted here that, because the feature embedding vector and the high-order representation of the first-order feature are fused, the second-order feature contains richer semantic information, which is helpful to improve the click prediction accuracy.

Further, the first-order features are input into the Compressed Interaction Network (CIN) to output explicit high-order features. The generation formula of the explicit high-order features is as follows:

$$X^{l}_{h,*} = \sum_{i=1}^{H_{l-1}} \sum_{j=1}^{H_0} W^{l,h}_{i,j} \left( X^{l-1}_{i,*} \circ v_j \right), \quad 1 \le h \le H_l$$

$$p^{l}_{i} = \sum_{j=1}^{k} X^{l}_{i,j}$$

$$p^{+} = [p^{1}, p^{2}, \ldots, p^{T}]$$

wherein $X^{l}_{h,*}$ represents the $h$-th high-order feature vector in the $l$-th layer high-order matrix, $X^{l-1}_{i,*}$ represents the $i$-th high-order feature vector in the $(l-1)$-th layer high-order matrix, $v_j$ represents the $j$-th feature value in the first-order features, $W^{l,h}$ represents the parameter matrix with which the first-order features generate the $h$-th high-order feature of the $l$-th layer high-order feature vectors, $H_0$ represents the number of layer 0 feature embedding vectors, $H_{l-1}$ represents the number of $(l-1)$-th layer feature embedding vectors, $p^{l}_{i}$ represents the $i$-th feature in the $l$-th layer high-order feature vector, $X^{l}_{i,j}$ represents the $j$-th dimension of the $i$-th feature in the $l$-th layer high-order feature vector, $p^{+}$ represents the explicit high-order features that are finally generated, $T$ represents the total number of layers of the explicit high-order features, and $\circ$ represents the Hadamard product.
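A single CIN layer followed by sum pooling can be sketched as follows. This is a minimal plain-Python illustration that follows the usual description of the compressed interaction network: every output vector is a weighted sum of Hadamard products between the previous layer's vectors and the layer 0 vectors. The function names and the toy weights are hypothetical.

```python
def cin_layer(X_prev, X0, W):
    """Compute one CIN layer.

    X_prev: previous layer, a list of H_prev vectors of dimension D
    X0:     layer 0 input, a list of H0 vectors of dimension D
    W:      list of H_out parameter matrices, each H_prev x H0
    Returns a list of H_out vectors of dimension D.
    """
    D = len(X0[0])
    out = []
    for Wh in W:
        row = [0.0] * D
        for i, xi in enumerate(X_prev):
            for j, xj in enumerate(X0):
                for d in range(D):
                    # weighted Hadamard product, accumulated per dimension
                    row[d] += Wh[i][j] * xi[d] * xj[d]
        out.append(row)
    return out


def sum_pool(X):
    """Sum each feature vector over its dimensions."""
    return [sum(row) for row in X]


# Hypothetical toy example: two layer 0 vectors, one output feature map.
X0 = [[1.0, 2.0], [3.0, 4.0]]
W1 = [[[1.0, 0.0], [0.0, 1.0]]]   # keeps only the (0, 0) and (1, 1) interactions
X1 = cin_layer(X0, X0, W1)        # [[10.0, 20.0]]
p1 = sum_pool(X1)                 # [30.0]
```

Stacking several such layers and concatenating their pooled outputs gives the explicit high-order feature vector, with the interaction order growing by one at each layer.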

Further, the second-order features are input into a Deep Neural Network (DNN) to output implicit high-order features. The generation formula of the implicit high-order features is as follows:

$$a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right), \quad l = 1, \ldots, L$$

wherein $a^{(l)}$ represents the output of the $l$-th layer of the deep neural network, with the second-order features as the input $a^{(0)}$, $\sigma$ represents the activation function, $W^{(l)}$ represents the weight of the $l$-th layer of the deep neural network, $b^{(l)}$ represents the offset of the $l$-th layer of the deep neural network, and $L$ represents the number of layers of the deep neural network.

And combining the explicit high-order features output by the CIN and the implicit high-order features output by the DNN to complete feature fusion and generate third-order features, wherein the third-order features fully utilize the complementarity between the implicit high-order features and the explicit high-order features, and are beneficial to improving feature discriminability and final click prediction precision.

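The formula for generating the click rate prediction model based on the third-order features is expressed as:

$$\hat{y} = \sigma\left(W_{cin}^{\top} p^{+} + W_{dnn}^{\top} a^{(L)} + b\right)$$

wherein $\hat{y}$ represents the predicted value of the click rate, $\sigma$ represents the sigmoid function, $p^{+}$ represents the explicit high-order features output by the CIN, $a^{(L)}$ represents the implicit high-order features output by the DNN, and $W_{cin}$, $W_{dnn}$ and $b$ all represent parameters of the click rate prediction model.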
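The DNN branch and the final fusion can be sketched as follows. This is a minimal plain-Python illustration, assuming ReLU as the hidden activation and a sigmoid over a weighted sum of the explicit and implicit high-order features plus a bias; the function names and toy parameters are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, W, b, act):
    """One fully connected layer: act(W x + b)."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def dnn_forward(x, layers):
    """Apply a stack of (W, b) layers; the last output is the implicit feature."""
    a = x
    for W, b in layers:
        a = dense(a, W, b, relu)
    return a


def predict_ctr(p_explicit, a_implicit, w_cin, w_dnn, bias):
    """Weighted fusion of explicit (CIN) and implicit (DNN) features, then sigmoid."""
    z = sum(w * p for w, p in zip(w_cin, p_explicit))
    z += sum(w * a for w, a in zip(w_dnn, a_implicit))
    return sigmoid(z + bias)


# Hypothetical toy example.
hidden = dnn_forward([1.0, -1.0], [([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])])
y_hat = predict_ctr([30.0], hidden, w_cin=[0.01], w_dnn=[0.0, 0.1], bias=-0.5)
```

The sigmoid keeps the prediction strictly between 0 and 1, so it can be read directly as a click probability.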

S103, click rate prediction:

S1031, pre-training the click rate prediction model, the AutoInt model and the DIFM model, then performing self-distillation on each of them, and then combining them to construct a teacher network.

S1032, pre-training the DNN model and the FM model, and then mutually distilling and combining to construct the student network.

The lightweight DNN model (equivalent to the student model 1 in fig. 3) and the FM model (equivalent to the student model 2 in fig. 3) are pre-trained and used as student models to construct a student network. Mutual distillation is carried out between the DNN model and the FM model, so that the integration of diversity information in each student model is facilitated, and the click prediction precision of each student model is improved through mutual distillation.

S1033, designing a gate control network, calculating teacher model knowledge weight in a teacher network through the gate control network, and carrying out click rate prediction guidance on each student model in the student network by the teacher network based on the teacher model knowledge weight so as to realize mixed knowledge distillation; and the teacher model knowledge weight represents the knowledge weight of each student model in the teacher model for guiding the student network.

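The specific process of mutual distillation between the DNN model and the FM model is as follows:

$$\mathcal{L}_{FM} = \mathcal{L}(y, \hat{y}_{FM}) + \alpha \, \mathcal{L}_{KL}(\hat{y}_{FM}, \hat{y}_{DNN})$$

$$\mathcal{L}_{DNN} = \mathcal{L}(y, \hat{y}_{DNN}) + \beta \, \mathcal{L}_{KL}(\hat{y}_{DNN}, \hat{y}_{FM})$$

wherein $\mathcal{L}_{FM}$ represents the loss function of the FM model in the student network, $y$ represents the real label, $\hat{y}_{FM}$ represents the output of the FM model in the student network, $\mathcal{L}(y, \hat{y}_{FM})$ represents the FM model in the student network fitting the real label, $\mathcal{L}_{KL}(\hat{y}_{FM}, \hat{y}_{DNN})$ represents the KL loss of the FM model relative to the DNN model in the student network, and $\alpha$ represents the weight of $\mathcal{L}_{KL}(\hat{y}_{FM}, \hat{y}_{DNN})$; $\mathcal{L}_{DNN}$ represents the loss function of the DNN model in the student network, $\hat{y}_{DNN}$ represents the output of the DNN model in the student network, $\mathcal{L}(y, \hat{y}_{DNN})$ represents the DNN model in the student network fitting the real label, $\mathcal{L}_{KL}(\hat{y}_{DNN}, \hat{y}_{FM})$ represents the KL loss of the DNN model relative to the FM model in the student network, and $\beta$ represents the weight of $\mathcal{L}_{KL}(\hat{y}_{DNN}, \hat{y}_{FM})$.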
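Mutual distillation between the two student models can be sketched as follows. This is a minimal plain-Python illustration, assuming binary cross-entropy as the label-fitting term and a KL divergence between Bernoulli click probabilities as the distillation term, with the peer model's prediction taken as the reference distribution; these choices and all names are assumptions made for the example.

```python
import math


def bce(y, p, eps=1e-12):
    """Binary cross-entropy between a label y and a predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two Bernoulli click probabilities."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mutual_distillation_losses(y, p_fm, p_dnn, alpha, beta):
    """Each student fits the label and is pulled toward its peer's prediction."""
    loss_fm = bce(y, p_fm) + alpha * kl_bernoulli(p_dnn, p_fm)
    loss_dnn = bce(y, p_dnn) + beta * kl_bernoulli(p_fm, p_dnn)
    return loss_fm, loss_dnn


# Hypothetical toy example: the two students disagree on a clicked sample.
loss_fm, loss_dnn = mutual_distillation_losses(1.0, p_fm=0.6, p_dnn=0.8, alpha=0.5, beta=0.5)
```

When the two students already agree, the KL terms vanish and each loss reduces to its own label-fitting term, so the distillation only acts where the students hold different information.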

Further, a DIFM model (teacher model 1 in FIG. 3), an AutoInt model (teacher model 2 in FIG. 3) and the Se-xDeepFEFM model (teacher model 3 in FIG. 3) are pre-trained, and the three pre-trained models are each self-distilled and then combined into a teacher network. Because the teacher models in the teacher network are heterogeneous, they can provide more diverse knowledge to the student models and so improve the click prediction precision of the student models. A gating (GATE) mechanism is designed to adaptively adjust the knowledge weight of each teacher model in the teacher network for each student model in the student network: the larger the knowledge weight, the more valuable the knowledge that the corresponding teacher model provides to the student model during knowledge distillation, which improves the click rate prediction accuracy of the student models.

Specifically, the click rate prediction model (Se-xDeepFEFM model), the AutoInt model and the DIFM model are each self-distilled using unenhanced samples and enhanced samples. For each of the three models (the DIFM model, the AutoInt model and the click rate prediction model), the self-distillation formula is defined in terms of the following quantities: the loss function of the model; the output of the model in the teacher network for the unenhanced sample; the output of the model in the teacher network for the enhanced sample; a weight; the term in which the model in the teacher network fits the real label on the unenhanced sample; and the term in which the model in the teacher network fits the real label on the enhanced sample.

In the invention, the Se-xDeepFEFM model completes self-distillation through sample diversity. Self-distillation can compress the scale of the teacher model, which helps to reduce the gap between the teacher model and the student model so that the mixed knowledge distillation framework can be trained better.
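As an illustration only, self-distillation through sample diversity can be sketched as follows. The sketch assumes that the enhanced sample is produced by randomly masking input features and that the loss is the label fit on the unenhanced sample plus a weighted label fit on the enhanced sample. Neither the augmentation nor the combination rule is specified in the text above, and the names `mask_features`, `self_distillation_loss` and `lam` are hypothetical.

```python
import math
import random


def bce(y, p, eps=1e-12):
    """Binary cross-entropy between a label y and a predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mask_features(features, drop_prob, rng):
    """Assumed augmentation: zero out each feature with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else f for f in features]


def self_distillation_loss(y, p_plain, p_enhanced, lam):
    """Assumed form: label fit on the unenhanced sample plus a weighted
    label fit on the enhanced sample."""
    return bce(y, p_plain) + lam * bce(y, p_enhanced)


# Hypothetical usage: build an enhanced view of a sample with a seeded generator.
rng = random.Random(0)
enhanced = mask_features([0.2, 0.7, 0.1, 0.9], drop_prob=0.25, rng=rng)
loss = self_distillation_loss(1.0, p_plain=0.8, p_enhanced=0.7, lam=0.5)
```

The same model produces both predictions, so no separate teacher is needed at this stage; the enhanced view simply exposes the model to more diverse samples.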

The total loss function of the mixed knowledge distillation is defined in terms of the following quantities: $\mathcal{L}_{total}$ represents the total loss function corresponding to the mixed knowledge distillation, $\mathcal{T}_i$ represents the $i$-th teacher model in the teacher network, $\mathcal{T}$ represents the teacher network, $\mathcal{S}$ represents the student network, $N$ represents the number of teacher models in the teacher network, and $g_i$ represents the knowledge weight of the $i$-th teacher model in the teacher network.
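The role of the gate control network can be sketched as follows. This is a minimal plain-Python illustration, assuming that the gate turns one score per teacher model into knowledge weights with a softmax and that the distillation loss of a student model is the weighted sum of its KL divergence to each teacher's prediction; both the softmax gate and the weighted-sum form are assumptions, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw gate scores into knowledge weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two Bernoulli click probabilities."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def gated_distillation_loss(teacher_preds, student_pred, gate_scores):
    """Weight each teacher's guidance of one student by its knowledge weight."""
    weights = softmax(gate_scores)
    loss = sum(g * kl_bernoulli(t, student_pred) for g, t in zip(weights, teacher_preds))
    return loss, weights


# Hypothetical toy example: three teachers, equal gate scores.
loss, weights = gated_distillation_loss([0.8, 0.6, 0.7], 0.7, [0.0, 0.0, 0.0])
```

In a trained system the gate scores would themselves be produced by a small learnable network, so that a teacher whose knowledge is more valuable for a given student receives a larger weight.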

S104, recommending advertisements;

And performing joint training on the teacher network and the student network, namely, transmitting knowledge from the teacher model in the teacher network to the student models in the student network through the GATE to realize mixed knowledge distillation. The mixed knowledge distillation framework outputs a lightweight student model, and the lightweight student model is used for calculating the click prediction value, so that the prediction precision is ensured, the real-time prediction efficiency is improved, and the real-time reasoning capability of the click rate prediction model is enhanced.
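The final recommendation step, in which the predicted values of the deployed student model are sorted in descending order and a preset number of advertisements is selected, can be sketched as follows. The function name and the toy advertisement identifiers are hypothetical.

```python
def recommend_top_k(predictions, k):
    """Return the k advertisement ids with the highest predicted click rate.

    predictions: dict mapping an advertisement id to its predicted value
    k:           preset number of advertisements to recommend
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]


# Hypothetical predicted values produced by the student model.
predicted = {"ad_a": 0.10, "ad_b": 0.90, "ad_c": 0.50}
top_two = recommend_top_k(predicted, 2)  # ["ad_b", "ad_c"]
```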

Referring to FIG. 4, the present invention provides a click rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system comprises:

a data pre-processing module to:

a model training module to:

the click rate prediction module is used for predicting click rate;

designing a gate control network, calculating the knowledge weight of a teacher model in the teacher network through the gate control network, and performing click rate prediction guidance on each student model in the student network by the teacher network based on the knowledge weight of the teacher model so as to realize mixed knowledge distillation; wherein the teacher model knowledge weight represents a knowledge weight of each student model in a teacher model guidance student network;

an advertisement recommendation module for;

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A click rate prediction method for multi-order feature optimization and mixed knowledge distillation is characterized by comprising the following steps:

step one, data preprocessing:

step two, model training:

step three, predicting the click rate;

designing a gate control network, calculating the knowledge weight of a teacher model in the teacher network through the gate control network, and performing click rate prediction guidance on each student model in the student network by the teacher network based on the knowledge weight of the teacher model so as to realize mixed knowledge distillation; wherein the teacher model knowledge weight represents a knowledge weight of each student model in a teacher model teaching student network;

step four, recommending advertisements;

deploying a student network which is output by mixed knowledge distillation on line to obtain a plurality of predicted values, arranging the predicted values in a descending order, and selecting a preset number of advertisements with the highest predicted values to recommend the advertisements to a user to complete click rate prediction;

in step two, in the method of inputting the first-order features into a compressed interaction network to output explicit high-order features, the explicit high-order features are generated by the following formulas:

$$X^{k}_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{H_{0}} W^{k,h}_{ij} \left( X^{k-1}_{i,*} \circ X^{0}_{j,*} \right)$$

$$p^{k}_{i} = \sum_{j=1}^{D} X^{k}_{i,j}$$

$$p^{+} = \left[ p^{1}, p^{2}, \ldots, p^{T} \right]$$

wherein $X^{k}_{h,*}$ denotes the $h$-th high-order feature vector in the $k$-th layer high-order matrix, $X^{k-1}_{i,*}$ denotes the $i$-th high-order feature vector in the $(k-1)$-th layer high-order matrix, $X^{0}_{j,*}$ represents the $j$-th feature value of the first-order features, $1 \le h \le H_{k}$, $W^{k,h}$ represents the parameter matrix with which the first-order features generate the $h$-th high-order feature of the $k$-th layer high-order feature vectors, $H_{0}$ represents the number of layer-0 feature embedding vectors, $H_{k-1}$ represents the number of layer-$(k-1)$ feature embedding vectors, $p^{k}_{i}$ denotes the $i$-th feature in the $k$-th layer high-order feature vector, $X^{k}_{i,j}$ denotes the $j$-th dimension of the feature vector of the $i$-th feature in the $k$-th layer high-order feature vector, $p^{+}$ represents the explicit high-order features that are ultimately generated, $T$ represents the total number of layers of the explicit high-order features, and $\circ$ represents the Hadamard product;
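The compressed interaction step described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the names `cin_layer`, `sum_pool` and `cin`, the nested-list data layout and the toy weights are all hypothetical, and the sketch follows the commonly published form of a compressed interaction network rather than any particular implementation.

```python
def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def cin_layer(x_prev, x0, weights):
    """One compressed interaction layer.

    x_prev:  list of H_prev vectors (previous layer), each of length D
    x0:      list of m vectors (layer 0, the first-order features)
    weights: list of H_next matrices, each of shape H_prev x m
    Returns H_next vectors of length D.
    """
    dim = len(x0[0])
    out = []
    for w in weights:  # one parameter matrix per output feature vector
        vec = [0.0] * dim
        for i, xi in enumerate(x_prev):
            for j, xj in enumerate(x0):
                prod = hadamard(xi, xj)
                for d in range(dim):
                    vec[d] += w[i][j] * prod[d]
        out.append(vec)
    return out


def sum_pool(layer):
    """Sum each feature vector over its embedding dimension."""
    return [sum(vec) for vec in layer]


def cin(x0, layer_weights):
    """Stack layers and concatenate the pooled output of every layer."""
    x_prev, pooled = x0, []
    for weights in layer_weights:
        x_prev = cin_layer(x_prev, x0, weights)
        pooled.extend(sum_pool(x_prev))
    return pooled


# Toy input: two 2-dimensional feature vectors and a single one-map layer.
x0 = [[1.0, 2.0], [3.0, 4.0]]
layer_weights = [[[[1.0, 0.0], [0.0, 1.0]]]]
print(cin(x0, layer_weights))  # [30.0]
```

With the identity-like weight matrix used here, the single output vector is the sum of each input vector multiplied element-wise with itself, which makes the result easy to check by hand.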

in the method of inputting the second-order features into the deep neural network to output implicit high-order features, the implicit high-order features are generated by the following formula:

$$a^{(l)} = \sigma\left( W^{(l)} a^{(l-1)} + b^{(l)} \right), \quad l = 1, \ldots, L$$

wherein $a^{(l)}$ represents the neural network output of the $l$-th layer in the deep neural network, $\sigma$ represents the activation function, $W^{(l)}$ represents the weight of the $l$-th layer in the deep neural network, $b^{(l)}$ represents the bias of the $l$-th layer in the deep neural network, and $L$ represents the number of layers of the deep neural network;
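A layer-by-layer forward pass of the kind described above can be sketched as follows. The ReLU activation, the function names `dense` and `dnn_forward`, and the toy weights are assumptions chosen for illustration; the claims do not fix a particular activation function or layer size.

```python
def relu(x):
    """Assumed activation function for this sketch."""
    return max(0.0, x)


def dense(inputs, weights, bias, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def dnn_forward(features, layers, activation=relu):
    """Feed the input features through every (weights, bias) layer in turn."""
    out = features
    for weights, bias in layers:
        out = dense(out, weights, bias, activation)
    return out


# Toy two-layer network on a 2-dimensional input.
layers = [
    ([[1.0, 1.0], [0.5, -0.5]], [0.0, 0.0]),
    ([[2.0, 2.0]], [1.0]),
]
print(dnn_forward([1.0, -2.0], layers))  # [4.0]
```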

in step two, the click rate prediction model is generated from the third-order features according to a formula defined in terms of the following quantities: the predicted value of the click rate; the sigmoid function; and the parameters of the click rate prediction model;

in step three, the formula corresponding to the mixed knowledge distillation is defined in terms of the following quantities:

for the FM model in the student network: the loss function of the FM model, the real label, the output of the FM model, the term by which the FM model fits the real label, and a weight applied within that loss;

for the DNN model in the student network: the loss function of the DNN model, the output of the DNN model, the term by which the DNN model fits the real label, and a weight applied within that loss;

the click rate prediction model, the AutoInt model and the DIFM model are self-distilled according to a formula defined in terms of the following quantities: the loss function of the DIFM model and its weight; the loss function of the AutoInt model and its weight; the loss function of the click rate prediction model and its weight; and the term by which the click rate prediction model in the teacher network fits the real label on enhanced samples;

the total loss function of the mixed knowledge distillation is defined in terms of the following quantities: the total loss function corresponding to the mixed knowledge distillation; the indexed teacher model in the teacher network; the teacher network; the student network; the number of teacher models in the teacher network; and the knowledge weight of the indexed teacher model in the teacher network.
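The gated weighting of teacher knowledge in claim 1 can be illustrated with the sketch below. It is not the claimed loss: the softmax gate, the use of binary cross-entropy for both the real-label term and the teacher terms, and every name (`gated_distillation_loss`, `gate_logits`, `hard_weight`) are assumptions made only to show how per-teacher knowledge weights could scale the guidance a student receives from three teacher models.

```python
import math


def softmax(logits):
    """Turn gate scores into knowledge weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(target, pred, eps=1e-7):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def gated_distillation_loss(label, student_pred, teacher_preds, gate_logits,
                            hard_weight=1.0):
    """Real-label loss plus teacher guidance weighted by the gate (assumed form)."""
    gates = softmax(gate_logits)
    hard = binary_cross_entropy(label, student_pred)
    soft = sum(
        g * binary_cross_entropy(t, student_pred)
        for g, t in zip(gates, teacher_preds)
    )
    return hard_weight * hard + soft


# Three hypothetical teacher outputs for one clicked sample (label = 1).
teachers = [0.90, 0.80, 0.85]
gate_logits = [0.2, 1.0, -0.5]
print(softmax(gate_logits))
print(gated_distillation_loss(1.0, 0.9, teachers, gate_logits))
```

A student prediction close to both the real label and the teacher outputs yields a smaller value than one far from them, which is the behaviour the guidance is meant to encourage.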

2. The method for predicting click rate of multi-order feature optimization and hybrid knowledge distillation as claimed in claim 1, wherein in the step one, the steps of performing feature extraction on the obtained original user behavior data and clicked advertisement data, and performing one-hot code transformation to obtain the user behavior feature embedded vector and the advertisement feature embedded vector respectively comprise the following steps:

preprocessing the user behavior data and the clicked advertisement data, wherein the preprocessing comprises the following steps:

extracting corresponding discrete features from the relevant fields of age, gender and user type, and processing the discrete features by an embedding method so that semantically similar features lie close together in the feature space;

the pre-processing further comprises:

extracting corresponding continuous features from the relevant fields of price and time, normalizing the continuous features, and scaling the feature values to the interval [0,1];

generating a user behavior feature embedding vector according to the preprocessed user behavior data, and generating an advertisement feature embedding vector according to the preprocessed clicked advertisement data; wherein the user behavior feature embedding vector and the advertisement feature embedding vector are collectively denoted as the feature embedding vectors.
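The preprocessing in claim 2 can be illustrated with a short sketch. Min-max scaling is assumed here as one way to bring continuous values into [0,1]; the claim does not name a specific normalization. The function names and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Scale continuous values (e.g. price) into the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(value, vocabulary):
    """One-hot encode a discrete value (e.g. gender) against a fixed vocabulary."""
    return [1 if value == item else 0 for item in vocabulary]


prices = [10.0, 20.0, 40.0]
print(min_max_normalize(prices))
print(one_hot("female", ["male", "female"]))  # [0, 1]
```

The one-hot vector would then be multiplied by a learned embedding table to obtain the dense embedding vector; that table is omitted here.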

3. The method for click-through rate prediction with multi-order feature optimization and mixed knowledge distillation as claimed in claim 2, wherein in the second step, the user behavior feature embedding vector and the advertisement feature embedding vector are inputted into a SENET, and then the feature optimization based on channel attention is performed to generate the first-order features, comprising the following steps:

compressing the feature embedding vectors $E$ by an average pooling operation using the SENET network to calculate a statistical vector;

designing two fully connected layers based on the statistical vector to calculate the attention weights;

weighting the feature embedding vectors $E$ according to the attention weights to generate the first-order features;

the first-order features are represented as:

$$V = F_{ReWeight}(A, E) = \left[ a_{1} \cdot e_{1}, \ldots, a_{f} \cdot e_{f} \right] = \left[ v_{1}, \ldots, v_{f} \right]$$

$$A = F_{ex}(Z) = \sigma_{2}\left( W_{2}\, \sigma_{1}\left( W_{1} Z \right) \right)$$

$$z_{i} = F_{sq}(e_{i}) = \frac{1}{k} \sum_{t=1}^{k} e_{i}^{(t)}$$

wherein $V$ represents the first-order features, $F_{ReWeight}$ represents the attention weighting of the feature embedding vectors, $A$ represents the attention weights, $E$ represents the feature embedding vectors, $e_{i}$ represents the $i$-th feature embedding vector in $E$ ($i = 1, \ldots, f$), $a_{i}$ represents the attention weight of $e_{i}$, and $v_{i}$ represents the $i$-th feature value of the first-order features;

$F_{ex}$ represents the function that calculates the attention weights, $\sigma_{1}$ represents the activation function of the first fully connected layer, $\sigma_{2}$ represents the activation function of the second fully connected layer, $W_{1}$ represents the parameter of the first fully connected layer, $W_{2}$ represents the parameter of the second fully connected layer, $Z = \left[ z_{1}, \ldots, z_{f} \right]$ represents the statistical vector, $z_{i}$ represents the calculated $i$-th statistic, $F_{sq}$ represents the function that calculates the statistics, $k$ represents the dimension of the feature embedding vectors, and $\sum_{t=1}^{k}$ represents summation over dimensions 1 to $k$.
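The three SENET stages recited in claim 3 (average pooling, two fully connected layers, reweighting) can be sketched as follows. ReLU is assumed for both activation functions, and the names `senet_reweight`, `w1` and `w2` as well as the toy weights are hypothetical.

```python
def relu(x):
    """Assumed activation for both fully connected layers."""
    return max(0.0, x)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def senet_reweight(embeddings, w1, w2):
    """Channel-attention reweighting of feature embedding vectors.

    embeddings: list of f embedding vectors, each of dimension k
    w1: first fully connected layer (reduces f statistics to a smaller size)
    w2: second fully connected layer (restores f attention weights)
    Returns (weighted_embeddings, attention_weights).
    """
    # Squeeze: average-pool each embedding into a single statistic.
    stats = [sum(e) / len(e) for e in embeddings]
    # Excitation: two fully connected layers produce one weight per feature.
    hidden = [relu(v) for v in matvec(w1, stats)]
    attention = [relu(v) for v in matvec(w2, hidden)]
    # Reweight: scale each embedding by its attention weight.
    weighted = [[a * x for x in e] for a, e in zip(attention, embeddings)]
    return weighted, attention


embeddings = [[1.0, 3.0], [2.0, 2.0]]
w1 = [[0.5, 0.5]]      # 2 statistics -> 1 hidden unit
w2 = [[1.0], [0.5]]    # 1 hidden unit -> 2 attention weights
weighted, attention = senet_reweight(embeddings, w1, w2)
print(attention)  # [2.0, 1.0]
print(weighted)   # [[2.0, 6.0], [2.0, 2.0]]
```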

4. The method for predicting click rate of multi-order feature optimization and mixed knowledge distillation as claimed in claim 3, wherein in step two, in the step of constructing a domain feature interaction network and performing feature optimization based on domain-symmetric-matrix embedding on the obtained first-order features, a formula is applied that is defined in terms of the following quantities: the output of the domain feature interaction network; a symmetric matrix; the learnable weight parameter of an indexed feature embedding vector in the domain feature interaction network; the number of features; the values of two indexed feature embedding vectors; the domain features of two indexed fields; and an indexed feature value of the first-order features.

5. The method as claimed in claim 4, wherein the second-order features are expressed by a formula defined in terms of the following quantities: the second-order features; the splicing (concatenation) operation; the output obtained by inputting the first-order features into the domain feature interaction network; and an indexed interaction feature vector after splicing.

6. A click-through rate prediction system for multi-order feature optimization and mixed knowledge distillation, wherein the system applies the click-through rate prediction method for multi-order feature optimization and mixed knowledge distillation as claimed in any one of claims 1 to 5, and the system comprises:

a data pre-processing module to:

a model training module to:

the click rate prediction module is used for predicting click rate;

an advertisement recommendation module for;