CN113822689A

CN113822689A - Advertisement conversion rate estimation method and device, storage medium and electronic equipment

Info

Publication number: CN113822689A
Application number: CN202010626285.5A
Authority: CN
Inventors: 苏毓敏; 张波; 秦筱桦
Original assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-07-01
Filing date: 2020-07-01
Publication date: 2021-12-21

Abstract

The embodiment of the invention relates to an advertisement conversion rate estimation method and device, a storage medium and electronic equipment, which relate to the technical field of big data processing, and the method comprises the following steps: processing historical behavior data of a user and advertisement data of candidate advertisements to obtain input sequence characteristics of the user and characteristic vectors of the candidate advertisements; mining the click interest of the user according to the input sequence characteristics to obtain the global interest characteristics of the user; processing the feature vectors of the candidate advertisements to obtain deep feature vectors of the candidate advertisements, and establishing connection between the deep feature vectors and global interest features to obtain fusion feature vectors corresponding to the candidate advertisements; and predicting the conversion rate of the user to the candidate advertisement according to the fusion feature vector and the deep feature vector. The embodiment of the invention improves the accuracy of the estimation result of the candidate advertisement.

Description

Advertisement conversion rate estimation method and device, storage medium and electronic equipment

Technical Field

The embodiment of the invention relates to the technical field of deep learning, in particular to an advertisement conversion rate estimation method, an advertisement conversion rate estimation device, a computer-readable storage medium and electronic equipment.

Background

Early approaches to advertisement conversion rate estimation, such as feature-crossing algorithms, used cross features of users and goods to characterize user interests. In such algorithms, the memorization capability brought by feature crossing is efficient and interpretable, while generalization requires more manual feature engineering. With the rise of deep learning, deep advertisement conversion rate estimation models extended the early methods by increasing the depth of the network, thereby strengthening the expressive power of the model. In these works, however, the user's historical behavior is converted into low-dimensional embedded features, and the user's historical behavior is not well utilized.

In order to solve the above problems, some schemes use shallow models for advertisement conversion rate prediction, for example, models based on logistic regression, models based on decision trees, models that add gradient boosting, models that introduce graphical models, and the like.

However, the above method has the following drawbacks: when highly nonlinear user behavior data is obtained, the deep interest of the user cannot be extracted, and the accuracy of the estimation result of the advertisement conversion rate is low.

Therefore, it is desirable to provide a new advertisement conversion rate estimation method and device.

It is to be noted that the information disclosed in the above background section is only for enhancing the understanding of the background of the present invention, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The invention aims to provide an advertisement conversion rate estimation method, an advertisement conversion rate estimation device, a computer readable storage medium and an electronic device, so as to overcome the problem of low accuracy of an estimation result of the advertisement conversion rate caused by the limitations and defects of the related technology at least to a certain extent.

According to an aspect of the present disclosure, there is provided an advertisement conversion rate estimation method, including:

processing historical behavior data of a user and advertisement data of candidate advertisements to obtain input sequence characteristics of the user and characteristic vectors of the candidate advertisements;

mining the click interest of the user according to the input sequence characteristics to obtain the global interest characteristics of the user;

processing the feature vectors of the candidate advertisements to obtain deep feature vectors of the candidate advertisements, and establishing connection between the deep feature vectors and global interest features to obtain fusion feature vectors corresponding to the candidate advertisements;

and predicting the conversion rate of the user to the candidate advertisement according to the fusion feature vector and the deep feature vector.

In an exemplary embodiment of the present disclosure, the historical behavior data includes identification class data and picture class data;

the processing of the historical behavior data of the user to obtain the input sequence characteristics of the user comprises the following steps:

establishing an index according to a name space of identification data in the historical behavior data of the user to obtain a coded value of the identification data, and performing hash processing and vectorization processing on the coded value to obtain a low-dimensional dense vector;

extracting image area features in picture data in the historical behavior data of the user, and generating a key image information vector according to the image area features;

and merging the low-dimensional dense vector and the key image information vector to obtain the input sequence characteristics of the user.

In an exemplary embodiment of the present disclosure, extracting image region features in the picture-like data in the historical behavior data of the user includes:

extracting the image region characteristics from picture data in the historical behavior data of the user based on a preset picture visual attraction model;

the preset picture visual attraction model comprises a convolutional neural network, a recurrent neural network and a deep neural network, wherein the convolutional neural network is used for acquiring key visual signals attractive to the user from the picture data, and the recurrent neural network and the deep neural network are used for acquiring interest points of the user from the historical behavior data.

In an exemplary embodiment of the present disclosure, mining the click interest of the user according to the input sequence feature, and obtaining the global interest feature of the user includes:

normalizing each sub-feature in the input sequence features, and performing nonlinear mapping on each normalized sub-feature to obtain a plurality of candidate interest points; wherein each sub-feature represents a click behavior of the user;

mining the click interest of the user according to each candidate interest point and each normalized sub-feature to obtain an interest prediction result of the user;

classifying the interest prediction result to obtain a plurality of classification results, and performing normalization processing on each classification result to obtain the global interest feature.

In an exemplary embodiment of the present disclosure, processing the feature vector of the candidate advertisement to obtain a deep feature vector of the candidate advertisement includes:

and carrying out nonlinear mapping processing on the feature vectors of the candidate advertisements to obtain deep feature vectors of the candidate advertisements.

In an exemplary embodiment of the present disclosure, establishing a connection between the deep feature vector and a global interest feature to obtain a fused feature vector corresponding to the candidate advertisement includes:

calculating the weight of the candidate advertisement in the historical behavior data according to the deep feature vector and the global interest feature;

and carrying out weighted summation on the weight and the global interest characteristics to obtain a fusion characteristic vector corresponding to the candidate advertisement.
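The weight calculation and weighted summation described above can be illustrated with a minimal sketch. The scoring function is not specified at this point in the text, so the dot-product score followed by a softmax is an assumption, as are the names `fuse`, `deep_vec` and `interest_feats`.

```python
import math

def fuse(deep_vec, interest_feats):
    """Attention-style fusion (hypothetical helper, not the patent's exact method).

    deep_vec:       deep feature vector of the candidate advertisement
    interest_feats: list of global interest feature vectors, one per click
    """
    # Assumption: relevance of each interest vector is its dot product
    # with the candidate advertisement's deep feature vector.
    scores = [sum(d * h for d, h in zip(deep_vec, feat)) for feat in interest_feats]

    # Normalize the scores into weights that sum to 1 (softmax).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the interest vectors gives the fused feature vector.
    dim = len(deep_vec)
    fused = [sum(w * feat[k] for w, feat in zip(weights, interest_feats))
             for k in range(dim)]
    return weights, fused

weights, fused = fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In this toy input, the first historical click points in the same direction as the candidate advertisement, so it receives the largest weight and dominates the fused vector.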

In an exemplary embodiment of the disclosure, predicting the conversion rate of the user for the candidate advertisement according to the fused feature vector and the deep feature vector comprises:

performing element-wise processing on the fusion feature vector and the deep feature vector to obtain a hidden output vector;

and carrying out normalization processing on the hidden output vector to obtain the conversion rate of the user to the candidate advertisement.
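As a rough illustration of this prediction step, the sketch below combines the two vectors element by element and squashes the result into a probability. The element-wise product, the output weights `out_weights`, and the use of a sigmoid as the normalization are all assumptions made for illustration; the text only states that the vectors are processed element-wise and the hidden output is normalized.

```python
import math

def predict_cvr(fused_vec, deep_vec, out_weights, bias=0.0):
    """Hypothetical prediction head: element-wise combination, then sigmoid."""
    # Assumption: the element-wise processing is a Hadamard product.
    hidden = [f * d for f, d in zip(fused_vec, deep_vec)]

    # Assumption: a single linear output unit maps the hidden vector to a logit.
    logit = sum(w * h for w, h in zip(out_weights, hidden)) + bias

    # Sigmoid normalizes the logit into a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

p = predict_cvr([0.5, 0.2], [1.0, -1.0], [1.0, 1.0])
```

A learned model would typically replace the single output unit with one or more fully connected layers; the sketch only shows the data flow.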

According to an aspect of the present disclosure, there is provided an advertisement conversion rate estimation apparatus including:

the first processing module is used for processing historical behavior data of a user and advertisement data of candidate advertisements to obtain input sequence characteristics of the user and characteristic vectors of the candidate advertisements;

the global interest characteristic prediction module is used for mining the click interest of the user according to the input sequence characteristics to obtain global interest characteristics of the user;

the second processing module is used for processing the feature vectors of the candidate advertisements to obtain deep feature vectors of the candidate advertisements, and establishing connection between the deep feature vectors and global interest features to obtain fusion feature vectors corresponding to the candidate advertisements;

and the conversion rate pre-estimating module is used for pre-estimating the conversion rate of the user to the candidate advertisement according to the fusion characteristic vector and the deep characteristic vector.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the advertisement conversion ratio estimation method as described in any one of the above.

According to an aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to execute any of the advertisement conversion rate estimation methods described above via execution of the executable instructions.

According to the advertisement conversion rate estimation method provided by the embodiments of the invention, in a first aspect, historical behavior data of a user and advertisement data of candidate advertisements are processed to obtain input sequence features of the user and feature vectors of the candidate advertisements; the click interest of the user is mined according to the input sequence features to obtain global interest features of the user; the feature vectors of the candidate advertisements are then processed to obtain deep feature vectors of the candidate advertisements, and a connection is established between the deep feature vectors and the global interest features to obtain fusion feature vectors corresponding to the candidate advertisements; finally, the conversion rate of the user for the candidate advertisements is estimated according to the fusion feature vectors and the deep feature vectors. This solves the problem in the prior art that the deep interest of the user cannot be extracted from highly nonlinear user behavior data, which leads to low accuracy of the estimated advertisement conversion rate. In a second aspect, because the global interest features are obtained from the input sequence features, the fusion feature vectors are obtained from the deep feature vectors and the global interest features, and the conversion rate is estimated from the fusion feature vectors and the deep feature vectors, the relation between the candidate advertisements and the click interest of the user is strengthened, which further improves the accuracy of the estimated advertisement conversion rate. In a third aspect, after the conversion rate for each candidate advertisement is obtained, the corresponding candidate advertisements can be recommended to the user based on the conversion rate, so that both the conversion rate of the candidate advertisements and the user experience can be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 schematically illustrates a flow chart of a method of advertisement conversion rate estimation, according to an exemplary embodiment of the present invention.

Fig. 2 schematically illustrates a structural example diagram of a picture visual appeal model according to an example embodiment of the present invention.

FIG. 3 is a diagram schematically illustrating an example of the structure of an advertisement conversion ratio estimation model according to an exemplary embodiment of the present invention.

Fig. 4 schematically illustrates an exemplary diagram of a gated recurrent unit according to an exemplary embodiment of the invention.

Fig. 5 schematically shows a flowchart of a method for processing historical behavior data of a user to obtain an input sequence feature of the user according to an exemplary embodiment of the present invention.

Fig. 6 is a flowchart schematically illustrating a method for mining click interests of the user according to the input sequence features to obtain global interest features of the user, according to an exemplary embodiment of the present invention.

Fig. 7 schematically shows a block diagram of an advertisement conversion ratio estimation apparatus according to an exemplary embodiment of the present invention.

Fig. 8 schematically illustrates an electronic device for implementing the advertisement conversion rate estimation method according to an exemplary embodiment of the present invention.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.

Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

In the billion-dollar online display advertising industry, intelligent bidding (for example, targeting a conversion cost or enhancing the click-through unit price) has become an industry-recognized mode of advertisement delivery, and this bidding mode makes conversion rate (CVR) estimation more and more important. Conversion rate estimation refers to estimating, given an advertisement and a user, the probability that the user will place an order (or perform another desired action) for the advertisement.

Most existing research focuses on estimating the click-through rate (CTR) of advertisements. With the widespread application of deep learning techniques, some studies use Deep Interest Networks (DIN) to learn representations of user interest from users' historical behaviors. Most of these works require model training with various high-dimensional and extremely sparse ID (identity) features, such as user ID and advertisement ID, and therefore demand a large volume of data. In the context of CTR prediction, such data requirements are easily met, because exposure and click volumes are very large in practical application scenarios. However, the situation for CVR prediction is quite different.

Specifically, the number of positive samples (converted samples) for CVR prediction is naturally much smaller than the number of positive samples (clicked samples) for CTR prediction. Worse still, even such scarce conversion data cannot be collected directly by a Demand Side Platform (DSP) without the cooperation of advertisers. Therefore, how to model complex, nonlinear and sparse user behavior data with powerful deep learning techniques while avoiding potential over-fitting is a great challenge for CVR prediction.

Early methods of CVR estimation, such as FM (Factorization Machine), used cross features of the user and the merchandise to characterize the user's interest. The memorization brought about by feature crossing is efficient and interpretable, whereas generalization requires more manual feature engineering. With the rise of deep learning, deep CVR estimation models extended the early methods by increasing the depth of the network, thereby strengthening the expressive power of the model. In these works, however, the user's historical behavior is converted into low-dimensional embedded features, and the user's historical behavior is not well utilized.

Existing work mainly uses shallow models for CVR prediction, for example, models based on logistic regression, models based on decision trees, models that add gradient boosting, models that introduce graphical models, and the like. However, these shallow approaches may have inherent drawbacks in capturing highly nonlinear user behavior data, which has prompted the introduction of deep learning frameworks. Recently, ESMM (Entire Space Multi-Task Model) used a deep framework to extract the conversion signal directly from exposures, but the extremely sparse conversion signal is still easily swamped by the click signal.

In the present exemplary embodiment, first, an advertisement conversion rate estimation method is provided, which may be operated in a server, a server cluster or a cloud server; of course, those skilled in the art may also operate the method of the present invention on other platforms as needed, and this is not particularly limited in this exemplary embodiment. Referring to fig. 1, the advertisement conversion rate estimation method may include the following steps:

s110, processing historical behavior data of a user and advertisement data of candidate advertisements to obtain input sequence characteristics of the user and characteristic vectors of the candidate advertisements;

s120, mining the click interest of the user according to the input sequence characteristics to obtain the global interest characteristics of the user;

s130, processing the feature vectors of the candidate advertisements to obtain deep feature vectors of the candidate advertisements, and establishing connection between the deep feature vectors and global interest features to obtain fusion feature vectors corresponding to the candidate advertisements;

and S140, predicting the conversion rate of the user to the candidate advertisement according to the fusion feature vector and the deep feature vector.

In the above advertisement conversion rate estimation method, in a first aspect, the input sequence features of the user and the feature vectors of the candidate advertisements are obtained by processing the historical behavior data of the user and the advertisement data of the candidate advertisements; the click interest of the user is mined according to the input sequence features to obtain the global interest features of the user; the feature vectors of the candidate advertisements are then processed to obtain deep feature vectors of the candidate advertisements, and a connection is established between the deep feature vectors and the global interest features to obtain fusion feature vectors corresponding to the candidate advertisements; finally, the conversion rate of the user for the candidate advertisements is estimated according to the fusion feature vectors and the deep feature vectors. This solves the problem in the prior art that the deep interest of the user cannot be extracted from highly nonlinear user behavior data, which leads to low accuracy of the estimated advertisement conversion rate. In a second aspect, because the global interest features are obtained from the input sequence features, the fusion feature vectors are obtained from the deep feature vectors and the global interest features, and the conversion rate is estimated from the fusion feature vectors and the deep feature vectors, the relation between the candidate advertisements and the click interest of the user is strengthened, which further improves the accuracy of the estimated advertisement conversion rate. In a third aspect, after the conversion rate for each candidate advertisement is obtained, the corresponding candidate advertisements can be recommended to the user based on the conversion rate, so that both the conversion rate of the candidate advertisements and the user experience can be improved.

Hereinafter, the steps involved in the advertisement conversion rate estimation method according to the exemplary embodiment of the present invention will be explained and described in detail with reference to the drawings.

First, terms referred to in example embodiments of the present invention are explained.

Generalization ability refers to the ability of a machine learning algorithm to adapt to new samples. The purpose of learning is to learn the rules hidden behind the data, so that the trained network can also give appropriate output for data outside the training set that follows the same rules. The generalization ability of a model is usually evaluated by its error on a test set.

The softmax function is a very common and important function that is widely used in multi-classification scenarios. It maps a set of inputs to real numbers between 0 and 1 whose sum is normalized to 1, and can be expressed as the following equation (1):

$$S_i = \frac{e^{i}}{\sum_j e^{j}} \tag{1}$$

wherein $e^{i}$ is the (exponentiated) i-th prediction result, $\sum_j e^{j}$ is the sum over all prediction results, and $S_i$ is the probability of the i-th prediction.
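A short sketch of the softmax computation may make equation (1) concrete. The function name `softmax` and the sample scores are illustrative; subtracting the maximum score before exponentiation is a common numerical-stability step that does not change the result.

```python
import math

def softmax(scores):
    """Map raw prediction scores to probabilities that sum to 1."""
    top = max(scores)                               # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]      # exponentiate each score
    total = sum(exps)                               # denominator: sum over all results
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```

The largest input score receives the largest probability, and every output lies strictly between 0 and 1.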

AUC, Area under the ROC Curve, wherein a ROC Curve (Receiver Operating Characteristic Curve) can reflect the classification effect of the classifier to a certain extent, but is not intuitive enough, so that AUC is obtained. The AUC is actually the area under the ROC curve, and intuitively reflects the classification capability expressed by the ROC curve.
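One way to make the AUC definition tangible: the area under the ROC curve equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc` and the sample data are illustrative, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(labels, scores):
    """Pairwise-comparison AUC for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

value = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
```

Here three of the four positive/negative pairs are ordered correctly, so `value` is 0.75.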

Feature vectorization: the idea is to map each feature into a low-dimensional dense real-valued vector through training. All these vectors form a vector space, and common statistical methods can be used to measure the similarity between features.

Telepath model (picture visual attraction model): the model integrates three neural networks, namely a convolutional neural network, a recurrent neural network and a deep neural network, wherein the convolutional neural network can be used for simulating the vision of a user and acquiring the key visual signals that make a commodity picture attractive, and the recurrent neural network and the deep neural network can be used for acquiring interest information of the user based on the browsing records of the user. Specifically, referring to fig. 2, the Telepath model may include: an extraction model 210 of picture visual attraction features, a user interest capturing model 220, and a score prediction module 230. The extraction model of picture visual attraction features comprises a deep neural network and a convolutional neural network, and the user interest capturing model comprises a recurrent neural network and a deep neural network.

Next, the purpose of the exemplary embodiments of the present invention is explained. Specifically, the exemplary embodiments have two objectives. First, in order to better extract the personalized product purchasing interest hidden in user behaviors, a new conversion model based on inner/self-attention (an internal self-attention mechanism) is proposed, which obtains fine-grained personalized product purchasing interest from continuous click data. Specifically, self-attention (a self-attention mechanism) may be used to capture the global, high-level conversion interest patterns across all of the user's interactions with advertisements, so that conversion items hidden in the click history can be found and the relations among different conversion items can be captured; inner-attention (an inner attention mechanism) is then used to select the click information most relevant to the candidate advertisement. Second, in order to address data sparsity, for picture-type features, click interest is extracted from the commodity pictures by a pre-trained picture visual attraction model (Telepath) to generate a dense vectorized image representation. This dense vector is then used in place of the sparse advertisement ID features to help alleviate the data sparsity problem. For ID-type features, the data sparsity problem is alleviated by methods such as building an index through a namespace and mapping to dense vectors.

Further, the advertisement conversion rate estimation model involved in the exemplary embodiment of the present invention is explained. Specifically, referring to fig. 3, the advertisement conversion rate estimation model may include an input layer 310, a plurality of GRUs (Gated Recurrent Units) 320, an internal self-attention mechanism module 330, a first fully connected layer 340, and a second fully connected layer 350. The internal self-attention mechanism module 330 may include a self-attention mechanism module and an internal attention mechanism module. The components are connected as follows: the input layer is connected to the plurality of gated recurrent units, which are in turn connected to the internal self-attention mechanism module; the first fully connected layer is connected to the internal self-attention mechanism module; and the second fully connected layer is connected to both the internal self-attention mechanism module and the first fully connected layer.

Fig. 4 is a diagram illustrating an exemplary structure of a gated recurrent unit. Specifically, referring to fig. 4, the internal structure of the gated recurrent unit is a recurrent neural network, and the following formula (2) may be specifically referred to:

wherein x_l is the feature vector of the user's historical behavior, h_{l-1} is the hidden vector output by the previous recurrent step, h_l is the feature vector output by the current recurrent step, W_z, W_r and W are model parameter matrices, and z_l and r_l are intermediate results; σ is the sigmoid function and tanh is the hyperbolic tangent function.
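Formula (2) itself is not reproduced in this text. The symbols defined above (x_l, h_{l-1}, h_l, W_z, W_r, W, z_l, r_l, σ and tanh) match the widely used bias-free formulation of a gated recurrent unit, so one plausible reading, offered here as an assumption rather than as the original formula, is the following. The candidate state $\tilde{h}_l$, the concatenation brackets and the element-wise product $\odot$ are notation introduced for this sketch.

```latex
\begin{aligned}
z_l &= \sigma\left(W_z \cdot [h_{l-1},\, x_l]\right) \\
r_l &= \sigma\left(W_r \cdot [h_{l-1},\, x_l]\right) \\
\tilde{h}_l &= \tanh\left(W \cdot [r_l \odot h_{l-1},\, x_l]\right) \\
h_l &= (1 - z_l) \odot h_{l-1} + z_l \odot \tilde{h}_l
\end{aligned}
```

Under this reading, z_l acts as the update gate that decides how much of the previous hidden vector is kept, and r_l acts as the reset gate that decides how much of it feeds into the candidate state. Some references swap the roles of z_l and 1 - z_l in the last line; the two conventions are equivalent up to relabeling.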

It should be added that, in the advertisement conversion rate estimation model, the input layer 310 may be configured to receive input sequence characteristics;

the plurality of gated recurrent units 320 may be configured to mine the click interest of the user according to the input sequence feature to obtain a global interest feature of the user;

the first fully-connected layer 340 may be configured to process the feature vector of the candidate advertisement to obtain a deep feature vector of the candidate advertisement;

the internal self-attention mechanism module 330 may be configured to establish a connection between the deep feature vector and a global interest feature to obtain a fused feature vector corresponding to the candidate advertisement;

the second fully-connected layer 350 may be configured to predict a conversion rate of the user for the candidate advertisement according to the fused feature vector and the deep feature vector.

Hereinafter, steps S110 to S140 will be explained with reference to the advertisement conversion rate estimation model.

In step S110, processing historical behavior data of a user and advertisement data of a candidate advertisement to obtain an input sequence feature of the user and a feature vector of the candidate advertisement;

In this example embodiment, the historical behavior data of the user may be continuous click data of the user within a certain period of time, and the click data may include, for example, purchase data, browsing data, favorite data, comment data, and the like, which is not limited in this example. The historical behavior data can be divided into two main types: one type is identification (ID) data, which may include the third-level category information (cid3) of the advertisement, shop number information (shop-id), and the like; the other type is picture data, that is, the picture information included in the advertisement.

Furthermore, in order to obtain the input sequence characteristics of the user, the historical behavior data of the user needs to be processed. Referring to fig. 5, processing the historical behavior data of the user to obtain the input sequence characteristics of the user may include steps S510 to S530. Wherein:

in step S510, an index is established according to a namespace of identification data in the historical behavior data of the user to obtain a coded value of the identification data, and hash processing and vectorization processing are performed on the coded value to obtain a low-dimensional dense vector;

in step S520, extracting image region features in the image data in the historical behavior data of the user, and generating a key image information vector according to the image region features;

wherein extracting image region features in the picture data in the historical behavior data of the user comprises: extracting the image region features from the picture data in the historical behavior data of the user based on a preset picture visual attraction model; the preset picture visual attraction model comprises a convolutional neural network, a recurrent neural network and a deep neural network, wherein the convolutional neural network is used for acquiring key visual signals attractive to the user from the picture data, and the recurrent neural network and the deep neural network are used for acquiring interest points of the user from the historical behavior data.

In step S530, the low-dimensional dense vector and the key image information vector are merged to obtain the input sequence feature of the user.

Hereinafter, steps S510 to S530 are explained. Specifically, for ID data, such as the third-level category information (cid3) of the advertisement and the shop number information (shop-id), an index may be established through the corresponding namespace to obtain a corresponding coded value; the coded value is then hashed, and the hashed coded value is vectorized to generate a low-dimensional dense vector. Meanwhile, for the advertisement picture data, a pre-trained picture visual attraction model can be adopted to extract click interest from the advertisement picture data, and the resulting key image information is represented as a dense vector. Finally, the two types of features are combined to obtain a combined feature vector representation, namely the input $X_i$ at the bottom layer of the CRV estimation system architecture diagram. Taking the user's $L$ historical click behaviors yields the input sequence features $[X_1, X_2, \ldots, X_L]$.

By the method, the sparse vectors can be converted into dense vectors, part of user interest is extracted from a large amount of sparse data in advance, more important feature representation information is obtained, and the problem of data sparsity of a model is relieved.
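As a minimal sketch of steps S510 to S530, the following Python code hashes namespaced ID values into a fixed number of buckets, looks up a dense embedding for each bucket, and concatenates the result with an image vector. The bucket count, the embedding width, the use of MD5, the single shared embedding table and the random image vectors (standing in for the output of the picture visual attraction model) are illustrative assumptions that are not specified in this embodiment.

```python
import hashlib

import numpy as np

NUM_BUCKETS = 1000  # hypothetical hash range
EMBED_DIM = 8       # hypothetical embedding width

rng = np.random.default_rng(0)
# One embedding table shared by all namespaces, for brevity (an assumption).
embedding_table = rng.normal(scale=0.1, size=(NUM_BUCKETS, EMBED_DIM))


def hash_id(namespace: str, raw_id: str) -> int:
    """Index an ID inside its namespace and hash it into a fixed bucket range."""
    key = f"{namespace}:{raw_id}".encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_BUCKETS


def embed_click(cid3: str, shop_id: str, image_vec: np.ndarray) -> np.ndarray:
    """Build one input X_i: dense ID embeddings merged with the key image vector."""
    id_vecs = [
        embedding_table[hash_id("cid3", cid3)],
        embedding_table[hash_id("shop_id", shop_id)],
    ]
    return np.concatenate(id_vecs + [image_vec])


# Hypothetical history of L = 3 clicks given as (cid3, shop-id) pairs.
history = [("655", "shop_1"), ("672", "shop_9"), ("655", "shop_4")]
image_vecs = rng.normal(size=(3, 4))  # placeholder key image information vectors
X = np.stack([embed_click(c, s, v) for (c, s), v in zip(history, image_vecs)])
print(X.shape)
```

Clicks that share the same cid3 value receive the same ID embedding, which is how repeated sparse IDs are mapped to a shared dense representation in this sketch.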

It should be noted that, since the data categories included in the candidate advertisement are the same as those included in the historical behavior data, the processing method is also the same. Therefore, the candidate advertisement to be predicted is processed in the same way to obtain the feature vector $X_C$ of the candidate advertisement; the detailed processing steps are not repeated here.

In step S120, the click interest of the user is mined according to the input sequence feature, so as to obtain a global interest feature of the user.

In this exemplary embodiment, referring to fig. 6, mining the click interest of the user according to the input sequence feature to obtain the global interest feature of the user may include steps S610 to S630. Wherein:

in step S610, each sub-feature in the input sequence features is normalized (σ function), and each normalized sub-feature is nonlinearly mapped (tanh function) to obtain a plurality of candidate interest points, wherein each sub-feature represents a click behavior of the user;

in step S620, mining click interests of the user according to each candidate interest point and each normalized sub-feature, to obtain an interest prediction result of the user;

in step S630, the interest prediction result is classified to obtain a plurality of classification results, and each classification result is normalized to obtain the global interest feature.

Hereinafter, steps S610 to S630 are explained.

Firstly, each sub-feature in the input sequence features is input into a gated recurrent unit (GRU), with each sub-feature corresponding to one gated recurrent unit. Each sub-feature can be normalized through the σ function to obtain the intermediate variables $z_l$ and $r_l$, corresponding to the first two formulas in formula (2); each normalized sub-feature is then nonlinearly mapped through the tanh function to obtain a plurality of candidate interest points, corresponding to the third formula in formula (2). Secondly, the click interest of the user is predicted according to the candidate interest points and the normalized sub-features to obtain the interest prediction result $h_l$ of the user, corresponding to the last formula in formula (2). As can be seen, the input of the GRU layer at step $l$ is a certain click behavior $x_l$ of the user and the hidden vector $h_{l-1}$ output by the previous step; after the gated recurrent unit, the interest prediction result $h_l$ of the user is obtained from $[X_1, X_2, \ldots, X_L]$.
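The recurrence described above can be sketched with a standard GRU step. The code below is a minimal illustration, not a reproduction of formula (2): the parameter names (`Wz`, `Uz`, `bz`, and so on), the dimensions, the random initialization and the convention used to mix the previous state with the candidate are all assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_l, h_prev, p):
    """One GRU step: gates z_l and r_l, a tanh candidate, then the new state h_l."""
    z = sigmoid(p["Wz"] @ x_l + p["Uz"] @ h_prev + p["bz"])  # update gate z_l
    r = sigmoid(p["Wr"] @ x_l + p["Ur"] @ h_prev + p["br"])  # reset gate r_l
    h_cand = np.tanh(p["Wh"] @ x_l + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate
    # One common mixing convention (an assumption about formula (2)).
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(X, p, hidden_dim):
    """Feed the click sequence X_1..X_L through the GRU and keep every h_l."""
    h = np.zeros(hidden_dim)
    outputs = []
    for x_l in X:
        h = gru_step(x_l, h, p)
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
L, in_dim, hid = 5, 20, 16  # hypothetical sizes
params = {}
for g in "zrh":
    params["W" + g] = rng.normal(scale=0.1, size=(hid, in_dim))
    params["U" + g] = rng.normal(scale=0.1, size=(hid, hid))
    params["b" + g] = np.zeros(hid)

H = run_gru(rng.normal(size=(L, in_dim)), params, hid)
print(H.shape)
```

Keeping the hidden vector of every step, rather than only the last one, is what allows a later attention layer to look back over the whole click history.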

Secondly, using only the GRU layer strengthens the short-term memory of the model, so the user interest exhibited by the user's long-term behavior is weakened; in addition, errors may accumulate and grow over the recurrent steps. An attention layer therefore needs to be added on top of the GRU layer to solve this problem. Continuing with the internal self-attention mechanism module 330 in FIG. 3, the module can be divided into two parts: a self-attention mechanism module and an inner-attention mechanism module. For self-attention, a global/high-level conversion interest pattern can be used to capture the interactions of all users with all advertisements, which avoids the rapid decay of long-term interest. It helps to find the conversion terms hidden in the click history behavior and captures the relationships between different conversion terms. The steps are as follows:

first, preprocessing (classifying) a hidden vector (an interest prediction result) output by each loop step of the GRU layer to obtain a feature vector representation (a classification result), which can be specifically represented by the following formula (3):

where $F_k$, $k \in \{Q, K, P\}$, are different fully connected layers, each of which can be expressed as the following formula (4):

$h_{l+1} = f(W_l h_l + b_l)$; Formula (4)

where $W_l$ and $b_l$ are parameter matrices and the activation function $f$ is ReLU.

Then, the hidden vector representation of self-attention is obtained, as shown in the following formula (5):

where $T$ denotes transposition. Through the interaction of $h_Q$, $h_K$ and $h_P$ and the softmax normalization, the hidden feature vector $h_S$ (the global interest feature) is obtained, which represents the global/high-level interest expressed by the user's historical advertisement click behaviors.
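A minimal sketch of the self-attention step follows. It assumes that each of the three projections is a single ReLU fully connected layer and that the interaction takes the common form softmax(h_Q h_K^T) h_P; the exact form of formula (5) is not reproduced here, and the parameter names and dimensions are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(H, p):
    """Project GRU outputs with three FC layers, then mix them with softmax weights."""
    h_Q = relu(H @ p["W_Q"] + p["b_Q"])
    h_K = relu(H @ p["W_K"] + p["b_K"])
    h_P = relu(H @ p["W_P"] + p["b_P"])
    weights = softmax(h_Q @ h_K.T, axis=-1)  # (L, L): each step attends to all steps
    return weights @ h_P, weights            # h_S has one row per click


rng = np.random.default_rng(0)
L, hid, d = 5, 16, 12  # hypothetical sizes
p = {}
for k in "QKP":
    p["W_" + k] = rng.normal(scale=0.1, size=(hid, d))
    p["b_" + k] = np.zeros(d)

H = rng.normal(size=(L, hid))  # stand-in for the GRU hidden vectors
h_S, weights = self_attention(H, p)
print(h_S.shape, weights.sum(axis=1))
```

Because every row of `weights` spans all L clicks, early clicks can contribute to the global interest feature as directly as recent ones.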

In step S130, processing the feature vector of the candidate advertisement to obtain a deep feature vector of the candidate advertisement, and establishing a connection between the deep feature vector and a global interest feature to obtain a fused feature vector corresponding to the candidate advertisement;

in this exemplary embodiment, first, a nonlinear mapping process is performed on the feature vector of the candidate advertisement to obtain a deep feature vector of the candidate advertisement; secondly, calculating the weight of the candidate advertisement in the historical behavior data according to the deep feature vector and the global interest feature; and finally, carrying out weighted summation on the weight and the global interest characteristics to obtain a fusion characteristic vector corresponding to the candidate advertisement.

Specifically, first, continuing with FIG. 3, in the first fully connected layer 340, multiple fully connected layers are used to extract the deeper feature representation in the feature vector $X_C$ of the candidate advertisement to be predicted, thereby obtaining the deep feature vector of the candidate advertisement. The formula of each fully connected layer is as follows:

$h_C = f(W_C X_C + b_C)$;

where $W_C$ and $b_C$ are parameter matrices and the activation function $f$ is ReLU. After the multi-layer fully connected transformation, the deep feature vector representation $h_C$ of the candidate advertisement to be predicted is obtained.
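The multi-layer transformation can be sketched as repeated application of h = ReLU(W h + b). The layer widths and the random weights below are hypothetical and only illustrate the shape of the computation.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def deep_feature(x_c, layers):
    """Apply a stack of fully connected ReLU layers to the candidate-ad vector."""
    h = x_c
    for W, b in layers:
        h = relu(W @ h + b)
    return h


rng = np.random.default_rng(0)
dims = [20, 32, 16]  # hypothetical widths: input, hidden, output
layers = [
    (rng.normal(scale=0.1, size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
    for i in range(len(dims) - 1)
]
x_c = rng.normal(size=dims[0])   # stand-in for the feature vector X_C
h_c = deep_feature(x_c, layers)  # deep feature vector h_C
print(h_c.shape)
```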

Secondly, after the global interest feature representation of the user's historical click behavior is obtained, a connection needs to be established with the candidate advertisement to be predicted, for which the inner-attention part is used. This part uses the inner-attention mechanism to select the click information most relevant to the candidate advertisement; that is, the weight of the candidate advertisement in the historical behavior data needs to be calculated, as shown in the following formula (6) and formula (7).

$\alpha(1, 2, \ldots, L) = \mathrm{softmax}(S_I^l)$; Formula (7)

where $h_C$ is the deep feature vector of the candidate advertisement, $h_S$ is the output vector of the self-attention part (the global interest feature), $v$, $W_C$ and $W_S$ are parameter matrices, and $T$ denotes transposition; $S_I^l$ denotes an intermediate variable; and $\alpha(1, 2, \ldots, L)$ denotes the weights of the candidate advertisement in the historical behavior data.

Finally, the weights and the global interest features are weighted and summed to obtain the fused feature vector $u_a$ corresponding to the candidate advertisement, as shown in the following formula (8).
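The inner-attention computation can be sketched as follows. Since formulas (6) and (8) are not reproduced in this text, the sketch assumes an additive score of the form v^T tanh(W_C h_C + W_S h_S^l) for each click l, followed by the softmax of formula (7) and a weighted sum over the rows of h_S. The score form and all dimensions are assumptions.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def inner_attention(h_C, H_S, v, W_C, W_S):
    """Score each historical click against the candidate ad, then fuse by weight."""
    # Assumed additive score: S^l = v^T tanh(W_C h_C + W_S h_S^l), one per click.
    scores = np.tanh(W_C @ h_C + H_S @ W_S.T) @ v
    alpha = softmax(scores)  # weights over the L clicks
    u_a = alpha @ H_S        # weighted sum of the global interest features
    return u_a, alpha


rng = np.random.default_rng(0)
L, d_c, d_s, k = 5, 16, 16, 8  # hypothetical sizes
h_C = rng.normal(size=d_c)     # deep feature vector of the candidate ad
H_S = rng.normal(size=(L, d_s))  # global interest features, one row per click
v = rng.normal(size=k)
W_C = rng.normal(scale=0.1, size=(k, d_c))
W_S = rng.normal(scale=0.1, size=(k, d_s))

u_a, alpha = inner_attention(h_C, H_S, v, W_C, W_S)
print(alpha.round(3), u_a.shape)
```

Because the weights are non-negative and sum to one, the fused vector is a convex combination of the per-click interest features, so clicks more related to the candidate advertisement dominate the result.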

In step S140, a conversion rate of the user to the candidate advertisement is estimated according to the fused feature vector and the deep feature vector.

In this exemplary embodiment, first, the fused feature vector and the deep feature vector are processed element-wise to obtain a hidden output vector; secondly, the hidden output vector is normalized to obtain the conversion rate of the user for the candidate advertisement.

Specifically, continuing with FIG. 3, as illustrated in the second fully connected layer 350, after the fused feature vector $u_a$ is obtained, in order to further strengthen its relation with the feature $h_c$ of the candidate advertisement itself, the two can be processed element-wise to obtain the hidden output vector $h_a$, as shown in the following formula (9):

$h_a = F(\mathrm{concat}(h_c,\ u_a,\ h_c \odot u_a,\ h_c - u_a))$; Formula (9)

Finally, the conversion rate $\Pr(C = 1 \mid X_i, H_i)$ of the candidate advertisement is predicted through a fully connected layer whose activation function is softmax, as shown in the following formula (10):

$\Pr(C = 1 \mid X_i, H_i) = \sigma(h_a)$; Formula (10)

That is, given the candidate advertisement $X_i$ to be predicted and the historical click behavior $H_i$ of the user, this is the probability that the candidate advertisement will eventually be converted. The aim of predicting the conversion rate is thereby fulfilled.
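The prediction head of formulas (9) and (10) can be sketched as below. The sketch assumes that the layer F is a single linear layer mapping the concatenated vector to a scalar and that the two input vectors have the same width; both are simplifying assumptions, and the weights are random placeholders.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_cvr(h_c, u_a, W, b):
    """Concatenate h_c, u_a, their element-wise product and difference, then squash."""
    features = np.concatenate([h_c, u_a, h_c * u_a, h_c - u_a])
    h_a = W @ features + b  # F reduced to one linear layer (an assumption)
    return sigmoid(h_a)     # Pr(C = 1 | X_i, H_i)


rng = np.random.default_rng(0)
d = 16                      # hypothetical shared width of h_c and u_a
h_c = rng.normal(size=d)    # deep feature vector of the candidate ad
u_a = rng.normal(size=d)    # fused feature vector
W = rng.normal(scale=0.1, size=4 * d)
b = 0.0

prob = predict_cvr(h_c, u_a, W, b)
print(float(prob))
```

The element-wise product and difference give the final layer explicit interaction terms between the candidate advertisement and the fused interest, instead of leaving them to be learned from the raw concatenation alone.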

In addition, the effect of the model also needs to be evaluated on a real e-commerce data set. The experiments are divided into 3 groups: DIN (Deep Interest Network) and Wide & Deep (the Wide & Deep model) are used as control groups, and GSIA (GRU-self-inner-attention, the internal self-attention model) proposed by the present invention is used as the experimental group. The results are shown in Table 1 below:

TABLE 1

As can be seen from Table 1 above, in offline AUC the GSIA model improved by 0.0184 over DIN and by 0.0185 over the Wide & Deep model, which is a considerable improvement. GSIA also has the largest difference between the predicted means of the positive and negative samples, which shows that it distinguishes positive and negative samples best.
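The two evaluation quantities mentioned above can be computed as in the following sketch. The labels and scores are toy values invented purely to illustrate the arithmetic; they are not experimental data from this embodiment, and the function names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_gap(labels, scores):
    """Difference between the mean predicted score of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


# Toy example only: 2 converted and 3 non-converted impressions.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1]
print(auc(labels, scores), mean_gap(labels, scores))
```

In this toy example, 5 of the 6 positive/negative pairs are ordered correctly, so the AUC is 5/6; a larger mean gap indicates that the model separates the two classes more clearly.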

In the embodiment of the invention, by using self-attention, inner-attention, GRU and a deep network structure, the capability of capturing the long-term interest of the user and the generalization capability are improved, the influence of data sparsity is reduced, and the accuracy of advertisement conversion rate estimation is improved, which brings better user experience and advertisement revenue.

The embodiment of the invention also provides an advertisement conversion rate estimation device. Referring to fig. 7, the advertisement conversion rate estimation device may include a first processing module 710, a global interest feature prediction module 720, a second processing module 730, and a conversion rate estimation module 740. Wherein:

the first processing module 710 may be configured to process historical behavior data of a user and advertisement data of a candidate advertisement to obtain an input sequence feature of the user and a feature vector of the candidate advertisement;

the global interest feature prediction module 720 may be configured to mine the click interest of the user according to the input sequence feature to obtain a global interest feature of the user;

the second processing module 730 may be configured to process the feature vector of the candidate advertisement to obtain a deep feature vector of the candidate advertisement, and establish a connection between the deep feature vector and the global interest feature to obtain a fused feature vector corresponding to the candidate advertisement;

the conversion rate estimation module 740 may be configured to estimate a conversion rate of the user for the candidate advertisement according to the fused feature vector and the deep feature vector.

The specific details of each module in the advertisement conversion rate estimation device have been described in detail in the corresponding advertisement conversion rate estimation method, and therefore are not described herein again.
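The division into four modules can be sketched as a small pipeline in which each module is a replaceable callable. The class name, the field names and the stub lambdas below are hypothetical placeholders; they only illustrate how data flows from module 710 to module 740 and do not implement the actual models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class ConversionRateEstimator:
    first_processing: Callable[[Any, Any], Tuple[Any, Any]]   # module 710
    global_interest: Callable[[Any], Any]                     # module 720
    second_processing: Callable[[Any, Any], Tuple[Any, Any]]  # module 730
    conversion_rate: Callable[[Any, Any], float]              # module 740

    def estimate(self, history, candidate_ad) -> float:
        X, x_c = self.first_processing(history, candidate_ad)  # step S110
        h_s = self.global_interest(X)                           # step S120
        h_c, u_a = self.second_processing(x_c, h_s)             # step S130
        return self.conversion_rate(u_a, h_c)                   # step S140


# Stub modules operating on plain numbers, only to show the wiring.
estimator = ConversionRateEstimator(
    first_processing=lambda hist, ad: (list(hist), ad),
    global_interest=lambda X: sum(X) / len(X),
    second_processing=lambda x_c, h_s: (x_c, h_s),
    conversion_rate=lambda u_a, h_c: max(0.0, min(1.0, u_a * h_c)),
)
rate = estimator.estimate([0.2, 0.4, 0.6], 0.5)
print(rate)
```

Passing the modules in as callables reflects the statement that the division into modules is not mandatory: two stages can be merged, or one stage split, without changing the `estimate` flow.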

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."

An electronic device 800 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.

As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, a bus 830 connecting various system components (including the memory unit 820 and the processing unit 810), and a display unit 840.

Wherein the storage unit stores program code that is executable by the processing unit 810 to cause the processing unit 810 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 810 may perform step S110 as shown in fig. 1: processing historical behavior data of a user and advertisement data of candidate advertisements to obtain input sequence characteristics of the user and characteristic vectors of the candidate advertisements; step S120: mining the click interest of the user according to the input sequence characteristics to obtain the global interest characteristics of the user; step S130: processing the feature vectors of the candidate advertisements to obtain deep feature vectors of the candidate advertisements, and establishing connection between the deep feature vectors and global interest features to obtain fusion feature vectors corresponding to the candidate advertisements; step S140: and predicting the conversion rate of the user to the candidate advertisement according to the fusion feature vector and the deep feature vector.

The storage unit 820 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.

The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 800 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present invention.

In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.

A program product for implementing the above method according to an embodiment of the invention may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

1. An advertisement conversion rate estimation method is characterized by comprising the following steps:

2. The advertisement conversion rate estimation method according to claim 1, wherein the historical behavior data includes identification class data and picture class data;

3. The advertisement conversion rate estimation method according to claim 2, wherein extracting image region features in the picture-like data in the historical behavior data of the user comprises:

4. The advertisement conversion rate estimation method of claim 1, wherein mining click interests of the user according to the input sequence features to obtain global interest features of the user comprises:

5. The method of claim 1, wherein the processing the feature vector of the candidate advertisement to obtain the deep feature vector of the candidate advertisement comprises:

6. The method of claim 1, wherein the establishing a connection between the deep feature vector and a global interest feature to obtain a fused feature vector corresponding to the candidate advertisement comprises:

7. The method of claim 1, wherein predicting the conversion rate of the user for the candidate advertisement according to the fused feature vector and the deep feature vector comprises:

8. An advertisement conversion rate estimation device, comprising:

9. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the advertisement conversion rate estimation method of any one of claims 1-7.

10. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the advertisement conversion rate estimation method of any of claims 1-7 via execution of the executable instructions.