CN112819523B - Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network - Google Patents

Info

Publication number
CN112819523B
Authority
CN
China
Prior art keywords
user
model
neural network
bayesian neural
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110125002.3A
Other languages
Chinese (zh)
Other versions
CN112819523A (en
Inventor
项亮
方同星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuming Artificial Intelligence Technology Co ltd
Original Assignee
Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuming Artificial Intelligence Technology Co ltd filed Critical Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority to CN202110125002.3A priority Critical patent/CN112819523B/en
Publication of CN112819523A publication Critical patent/CN112819523A/en
Application granted granted Critical
Publication of CN112819523B publication Critical patent/CN112819523B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • G06Q30/0244Optimization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0202Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Computational Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)

Abstract

A marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network comprises a data preprocessing step, a data set dividing step, a model building step, and a step of predicting clicks on a marketing campaign. In building the prediction model, Bayesian inference is used to introduce prediction uncertainty into the Bayesian neural network, which makes the model more robust. Features are crossed by combining inner and outer products to extract high-dimensional implicit features. The invention thereby extends the application of deep learning to computational advertising and recommendation systems and significantly improves the accuracy of user click behavior prediction.

Description

Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network
Technical Field
The invention relates to the technical field of artificial intelligence in Internet marketing, in particular to a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network.
Background
Online advertising marketing delivers advertisements to targeted customers through a network platform so as to maximize the reach of the marketing message among the intended audience. Algorithms commonly used in computational advertising and recommendation systems include linear models such as logistic regression (LR), as well as factorization machines (FM).
These algorithms are easy to interpret and simple to implement. However, because they are simple, their expressive power is limited, and they have difficulty extracting higher-order interactions between features, which limits their overall performance.
In addition, following the success of deep learning in fields such as natural language processing (NLP) and computer vision (CV), deep learning models are increasingly being applied to computational advertising and recommendation systems.
Although deep learning models offer advantages over traditional algorithms, such as automatic feature extraction and end-to-end learning, they have the following clear disadvantages when applied to computational advertising and recommendation systems:
(1) Most recommendation system data sets form a large, high-dimensional sparse matrix, that is, a matrix of 0s and 1s, which is difficult for gradient-descent-based deep learning models to handle. A large sparse matrix also increases computational cost and computation time. Effectively extracting feature interactions while reducing the feature dimension therefore places higher demands on feature engineering and algorithm design.
(2) Preventing overfitting is an important issue in deep learning. Early stopping, weight decay, L1/L2 regularization, dropout and similar methods are commonly used to reduce the risk of overfitting. However, precise targeting and delivery in advertising marketing often also require a measure of the model's uncertainty, because an overconfident algorithmic decision often fails to yield good returns in actual ad delivery. How to add an uncertainty measure to the network architecture, so that algorithmic decisions are more reliable and overfitting is effectively prevented, is therefore one of the key problems to be solved in applying deep learning to computational advertising and recommendation systems.
(3) Conventional deep learning models cross and combine features directly through multiple fully connected layers, but this approach is not targeted:
first, the fully connected layer does not cross features between different feature fields;
second, the operation of the fully connected layer is not designed specifically for feature crossing.
Therefore, deep learning models that can represent different data patterns for specific business scenarios need to be developed.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provide a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network, which comprises a data preprocessing step S1, a data set dividing step S2 and a model building step S3;
the data preprocessing step S1 includes the following steps:
step S11: acquiring original information of a user, and extracting original feature information from it; the original feature information comprises a user ID, the home location (attribution) of the user's mobile phone number, a task batch number, the DPIs accessed by the user on the same day, the user's DPI access frequency, the user access time and/or the user access duration; the task batch number denotes the user's original information within a date period, the user's DPI access frequency, DPI access time and/or access duration are measured per task batch number, and the DPIs accessed by the user and the home location of the user's mobile phone number are categorical features;
step S12: processing the categorical features; performing one-hot encoding on the home-location feature of the user's mobile phone number and on the DPIs accessed by the user; wherein the one-hot encoding comprises:
expanding, per task batch number, each distinct DPI accessed by users into a separate feature, and expanding the DPI access frequency into relationship features between each DPI and the user's access frequency for that DPI, over all distinct DPIs accessed within the task batch number;
step S13: processing the continuous features; mapping the access time and access duration data, which are on different scales, to a unified interval, and adjusting the data distribution to approximate a Gaussian distribution;
step S14: performing dimension reduction treatment on the high-dimensional characteristics by adopting principal component analysis;
the data set dividing step S2 includes the steps of:
step S21: after preprocessing, the home-location feature and the feature indicating whether the user accessed a DPI on the same day are treated as sparse features, and the user's DPI access frequency is defined as a continuous feature;
step S22: forming the training set from the historical data at time points 1, 2, …, t-1 preceding the time point t+1 to be predicted; and using the data at time point t as a local validation set;
the model building step S3 includes the steps of:
step S31: providing an initial Bayesian neural network model, taking the categorical features in the training set data as M1-dimensional feature information at the input layer of the Bayesian neural network, and feeding the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, reducing it to M2-dimensional feature information; wherein M2 is smaller than M1, and the Bayesian neural network comprises an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer and an output layer;
step S32: combining the M2-dimensional features after dimension reduction with the M3-dimensional continuous features to form M-dimensional features, and performing inner product and outer product operations on the M-dimensional features in the product layer so that the feature information of the M-dimensional features interacts;
step S33: in the factorization layer, factorizing the weight matrix of the M-dimensional features by a factorization method;
step S34: inputting the information of the M-dimensional features into the fully connected layer for training to obtain a trained Bayesian neural network model, wherein the Bayesian neural network model is a user prediction model with two output-layer neurons; and validating the user prediction model using the local validation set data.
Further, the marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network further comprises a step S35, wherein model evaluation index processing and tuning processing are performed on the user prediction model.
Further, the model evaluation index includes a logarithmic loss function, a relative information gain (RIG), and an AUC value.
Further, the model evaluation index is an AUC value, and if the AUC value is smaller than a predetermined threshold, model tuning is performed on the user prediction model.
Further, the model tuning process comprises one or more of the following:
(1) adding batch normalization to address the internal covariate shift problem of the data;
(2) adding to the network a function that puts some neurons in a dormant state during training;
(3) adjusting the learning rate, generally through strategies such as exponential decay during training;
(4) training multiple times and averaging the results, to mitigate insufficient generalization caused by large data variance;
(5) adding L1 or L2 regularization, which penalizes the loss function to reduce the risk of overfitting;
(6) optimizing the hyperparameters.
Furthermore, the hyperparameters are optimized using a Bayesian optimization strategy.
Further, the continuous feature is processed using a RankGauss method.
Further, the marketing prediction method combining the inner/outer product feature interaction and the bayesian neural network further comprises the step of performing anomaly detection and processing on the original information of the user after the step S11.
Further, the marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network further comprises a model prediction step S4, in which the users selected for precision marketing at the time point t+1 to be predicted are obtained from the user prediction model.
Further, the hierarchical node distribution of the Bayesian neural network model takes one of the following forms: increasing (Increasing), constant (Constant), diamond (Diamond) or decreasing (Decreasing).
According to the above technical scheme, the marketing prediction method combining inner/outer product feature interaction and the Bayesian neural network uses Bayesian inference to introduce prediction uncertainty into the Bayesian neural network, which makes the model more robust. Features are crossed by combining inner and outer products to extract high-dimensional implicit features. Combining inner/outer product feature interaction with the Bayesian neural network model thereby extends the application of deep learning to computational advertising and recommendation systems and significantly improves the accuracy of user click behavior prediction.
Drawings
FIG. 1 is a diagram showing the overall network structure according to an embodiment of the present invention
FIG. 2 is a schematic diagram showing four forms of node distribution of a Bayesian neural network in an embodiment of the present invention
FIG. 3 is a flow chart of a marketing prediction method combining inner/outer product feature interactions and Bayesian neural networks in an embodiment of the present invention
FIG. 4 is a schematic diagram showing the operations of the inner product (A) and the outer product (B) in the embodiment of the present invention
FIG. 5 is a schematic diagram illustrating the factorization operation of the weight matrix according to an embodiment of the present invention
FIG. 6 is a diagram showing the comparison of conventional deep learning network weights (left) and Bayesian network weights (right)
Detailed Description
The following is a further detailed description of embodiments of the invention in conjunction with the accompanying figures 1-6.
In the following detailed description of the embodiments, in order to show the structures of the invention clearly for ease of explanation, the structures in the drawings are not drawn to scale and are partially enlarged, deformed, or simplified; the drawings should therefore not be construed as limiting the invention.
It should be noted that, in the following embodiments of the present invention, the marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network is built on the overall structure of the Bayesian neural network model. Referring to fig. 1, fig. 1 is a schematic diagram of the overall network structure according to an embodiment of the invention. As shown in fig. 1, the Bayesian neural network includes an input layer (Input layer), an embedding layer (Embedding layer), a product layer (Product layer), a factorization layer (Factorization layer), a fully connected layer (Fully connected layer), and an output layer (Output layer).
The input layer receives the feature data after preprocessing and data set division; the data is then processed by the embedding layer, the product layer and the factorization layer, and passed to the fully connected layer and the output layer.
In the marketing prediction method combining inner/outer product feature interaction and the Bayesian neural network, the categorical features in the input original feature information are first one-hot encoded (One-hot encoding) and grouped into different fields according to their nature (for example, age and gender associated with a user ID, DPI access time and/or user access duration); then, after embedding in the embedding layer, feature information interacts through inner products or outer products between features; next, a Gaussian prior is placed on the network parameters, and the posterior distribution is approximated by variational inference, minimizing the Kullback-Leibler divergence, to obtain the updated network weights; finally, the Bayesian neural network model is obtained.
Compared with conventional techniques that use operator data for data marketing, the invention, through its design of feature engineering and algorithm, effectively extracts feature interactions while reducing the feature dimension. It also adds an uncertainty measure to the network architecture, which makes algorithmic decisions more reliable and effectively prevents overfitting, yielding a deep learning model (the Bayesian neural network model) that can represent different data patterns.
Referring to fig. 2, fig. 2 is a schematic diagram of four forms of hierarchical node distribution of the Bayesian neural network according to an embodiment of the present invention. As shown in fig. 2, the hierarchical node distribution of the Bayesian neural network model takes one of the following forms: increasing (Increasing), constant (Constant), diamond (Diamond) or decreasing (Decreasing). The choice among the four forms can be made according to business needs and is not described further here.
Referring to fig. 3, fig. 3 is a flowchart of the marketing prediction method combining inner/outer product feature interaction and the Bayesian neural network according to an embodiment of the invention. As shown in fig. 3, the method includes a data preprocessing step S1, a data set dividing step S2, a model building step S3, and a model prediction step S4.
In the embodiment of the present invention, the data preprocessing step is very important, and the data preprocessing step S1 includes the steps of:
step S11: acquiring original information of a user, and extracting original feature information from it; the original feature information comprises a user ID, the home location (attribution) of the user's mobile phone number, a task batch number, the user's DPI access frequency, the user access time and/or the user access duration; the task batch number denotes the user's original information within a date period, the user's DPI access frequency, DPI access time and/or access duration are measured per task batch number, and the DPIs accessed by the user and the home location of the user's mobile phone number are categorical features.
Table 1 below describes the raw data before preprocessing; taking one batch of data as an example, the raw data has the form shown in Table 1:
Table 1:
Preferably, in the embodiment of the present invention, step S11 may further include anomaly detection and processing, categorical feature processing, continuous feature processing, dimension reduction and the like on the user's original information data.
Anomaly detection and processing: depending on business requirements, missing values, excessively large values and the like in the original data need to be deleted or filled. Because the number of users is typically in the millions, values may go missing during data acquisition. If only a small amount is missing, the affected records can generally be removed directly; if it cannot be determined whether the missing data will affect the final model training, the missing values can be filled with the mean, mode, median, or the like.
In addition, excessively large values may be encountered during data acquisition, for example a user accessing a DPI tens of thousands of times in one day. Such values generally do not help the model generalize in actual modeling, so they can be removed or replaced using a filling method.
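The anomaly handling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the choice of the median as the fill statistic, and the cap of 100 visits per day are all assumptions made for the example.

```python
from statistics import median

def fill_missing(values, strategy=median):
    """Replace None entries with a statistic (here the median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

def clip_upper(values, cap):
    """Cap excessively large values at a hypothetical upper bound."""
    return [min(v, cap) for v in values]

# Hypothetical daily DPI visit counts for five users; None marks a missing value.
daily_dpi_visits = [3, None, 5, 12000, 4]
filled = fill_missing(daily_dpi_visits)
cleaned = clip_upper(filled, cap=100)
print(cleaned)
```

Passing `statistics.mean` or `statistics.mode` as `strategy` gives the other two fill options mentioned in the text.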
Step S12: processing the categorical features; performing one-hot encoding on the home-location feature of the user's mobile phone number and on the DPIs accessed by the user; the one-hot encoding includes expanding, per task batch number, each distinct DPI accessed by users into a separate feature, and expanding the user's DPI access frequency within the task batch number into relationship features between each DPI and the user's access frequency for that DPI, over all distinct DPIs accessed.
Specifically, the DPIs accessed by the user and the home-location feature of the user's mobile phone number are first one-hot encoded and expanded into columns. Taking the accessed DPIs as an example, if a user accesses a certain DPI, that DPI is recorded as 1 for the user and the remaining DPIs as 0; thus, if there are 10 distinct DPIs in total, 10 feature columns are formed, and in each user's row the columns of the accessed DPIs are 1 and the remaining columns are 0.
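A small sketch of this expansion, assuming the accessed DPIs of one task batch are available as a mapping from user ID to a set of DPI identifiers (the data layout and the names `visits` and `one_hot_dpi` are hypothetical):

```python
def one_hot_dpi(visits):
    """Expand accessed DPIs into one 0/1 column per distinct DPI.

    visits: dict mapping user ID -> set of DPI identifiers accessed in one batch.
    Returns the ordered column names and one 0/1 row per user.
    """
    columns = sorted({dpi for dpis in visits.values() for dpi in dpis})
    rows = {
        user: [1 if column in dpis else 0 for column in columns]
        for user, dpis in visits.items()
    }
    return columns, rows

visits = {"u1": {"dpi_a"}, "u2": {"dpi_b", "dpi_c"}, "u3": set()}
columns, rows = one_hot_dpi(visits)
for user, row in rows.items():
    print(user, dict(zip(columns, row)))
```

A user who accessed several DPIs gets several 1s in the row, so with many distinct DPIs the resulting matrix is wide and mostly zero, which is the sparsity the later dimension-reduction steps address.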
Step S13: processing the continuous features; that is, the access time and access duration data, which are on different scales, are mapped to a unified interval, and the data distribution is adjusted to approximate a Gaussian distribution.
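The disclosure names RankGauss as one way to process the continuous features. The sketch below shows one common rank-based variant: each value is replaced by the standard normal quantile of its rank. The exact rank-to-probability mapping `(rank + 0.5) / n` is an assumption, and ties are ranked in input order rather than averaged.

```python
from statistics import NormalDist

def rank_gauss(values):
    """Map values to approximately Gaussian scores based on their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, i in enumerate(order):
        u = (rank + 0.5) / n          # strictly inside (0, 1)
        out[i] = std_normal.inv_cdf(u)
    return out

# Hypothetical access durations in seconds, on a heavily skewed scale.
durations = [10, 200, 35, 4000, 1]
z = rank_gauss(durations)
print([round(v, 3) for v in z])
```

Because only ranks are used, an extreme value such as 4000 ends up at a moderate quantile instead of dominating the scale.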
Step S14: and adopting Principal Component Analysis (PCA) to perform dimension reduction treatment on the high-dimensional characteristics.
Specifically, as can be seen from the processing of the categorical features above, one-hot encoding generally produces a high-dimensional sparse matrix. For training the Bayesian neural network, this means that the derivative cannot be taken at many positions when the error is backpropagated, which is clearly unfavorable for training.
High-dimensional features also increase computational overhead, so dimension reduction must first be performed on them. As is clear to those skilled in the art, principal component analysis (PCA) achieves dimension reduction by finding the projection directions along which the variance of the original data is largest; it reduces the feature dimension while losing as little of the information contained in the original features as possible, which allows the collected data to be analyzed comprehensively.
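To make the maximum-variance projection concrete, the following sketch finds the first principal component by power iteration on the sample covariance matrix. It is an illustration under simplifying assumptions (a single component, a fixed start vector, toy data); in practice a library PCA routine would normally be used.

```python
def first_principal_component(rows, iters=200):
    """Return the direction of largest variance and the projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeated multiplication converges to the top eigenvector,
    # provided the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Toy data lying on the line y = 2x: one direction carries all the variance.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(rows)
print(direction, scores)
```

Here the two original columns collapse to a single score per row with no loss of information, because the data varies along one direction only.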
After the preprocessing in the above steps, the data format is as shown in Table 2 below:
Next, the data set dividing step S2 may be performed. In an embodiment of the present invention, the home-location feature and the feature indicating whether the user accessed a DPI on the same day may be treated as sparse features, and the user's DPI access frequency may be defined as a continuous feature. Click-through rate (CTR) problems generally have a pronounced time-series character: what needs to be predicted is the user's behavior at the next time point. Therefore, the earlier historical data in the time series is generally used as training data, and the data at the time point immediately before the one to be predicted is used for local validation (validation data).
The data set dividing step S2 specifically includes the following steps:
step S21: after preprocessing, the home-location feature and the feature indicating whether the user accessed a DPI on the same day are treated as sparse features, and the user's DPI access frequency is defined as a continuous feature;
step S22: forming the training set from the historical data at time points 1, 2, …, t-1 preceding the time point t+1 to be predicted; and using the data at time point t as a local validation set.
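Step S22 amounts to a split by time index rather than a random split. A minimal sketch, assuming each record carries its time point as the first element (the record layout and the name `split_by_time` are hypothetical):

```python
def split_by_time(records, t):
    """Split records into training data (time points 1..t-1) and validation data (time point t).

    records: list of (time_index, features, label) tuples.
    """
    train = [r for r in records if 1 <= r[0] <= t - 1]
    valid = [r for r in records if r[0] == t]
    return train, valid

records = [
    (1, [0.1], 0),
    (2, [0.4], 1),
    (3, [0.3], 0),
    (4, [0.9], 1),
]
train, valid = split_by_time(records, t=4)
print(len(train), len(valid))
```

Splitting this way keeps every validation record later in time than every training record, which mirrors how the model is used to predict time point t+1.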
Then, training and verification of the user predictive model may be performed by performing a model building step S3. In an embodiment of the present invention, the user prediction model is a bayesian neural network model.
Specifically, step S3 may include:
step S31: providing an initial Bayesian neural network model, taking the categorical features in the training set data as M1-dimensional feature information at the input layer of the Bayesian neural network, and feeding the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, reducing it to M2-dimensional feature information; wherein M2 is less than M1.
As shown in fig. 1, the Bayesian neural network includes an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer, and an output layer. Assume that, after preprocessing, the original features form N different fields (numbered Field 1, Field 2, Field 3, …, Field N).
Because of the one-hot encoding step, the features of the N fields (Feature 1, Feature 2, Feature 3, …, Feature N) form a high-dimensional sparse matrix. An embedding layer is therefore added to the network structure: the features are embedded (embedding), so that the sparse feature information is extracted and reduced in dimension in a single step, yielding a layer of low-dimensional vectors.
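The effect of the embedding layer on a one-hot input can be illustrated as a matrix product that reduces to a row lookup. The sizes M1 = 6 and M2 = 2, the random initialization, and the function names are assumptions for the example; in the actual network the table entries would be learned.

```python
import random

def make_embedding(num_categories, dim, seed=0):
    """Create a (num_categories x dim) embedding table with small random values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_categories)]

def embed_one_hot(one_hot, table):
    """Multiply a one-hot row vector by the embedding table."""
    dim = len(table[0])
    return [
        sum(one_hot[i] * table[i][k] for i in range(len(table)))
        for k in range(dim)
    ]

table = make_embedding(num_categories=6, dim=2)   # M1 = 6 reduced to M2 = 2
x = [0, 0, 0, 1, 0, 0]                            # one-hot vector for category 3
dense = embed_one_hot(x, table)
print(dense, table[3])
```

Because only one entry of `x` is non-zero, the product simply selects row 3 of the table, so the sparse M1-dimensional input becomes an M2-dimensional vector.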
Step S32: combining the M2-dimensional features after dimension reduction with the M3-dimensional continuous features to form M-dimensional features, and performing inner product and outer product operations on the M-dimensional features in the product layer so that the feature information of the M-dimensional features interacts.
In the feature dimension reduction steps S31 and S32, the output layer of the decoder section uses a sigmoid function as an activation function, and the output value of the output layer of the decoder section is a value between 0 and 1; the activation functions of the remaining other layers all use a ReLU activation function.
Referring to fig. 4, fig. 4 is a schematic diagram of the inner product (A) and outer product (B) operations according to the embodiment of the invention. Unlike prior-art deep learning models, which feed the features directly into fully connected layers, the invention additionally performs inner product and outer product operations on the embedded features so that feature information interacts.
As can be seen from fig. 4, the outer product operation yields a matrix; if only the diagonal entries of that matrix are retained, their sum is the result of the inner product operation, so the inner product can be regarded as a special case of the outer product. In this way, the relationship between two different fields can be measured.
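The relationship between the two operations can be checked numerically. In this sketch `f_i` and `f_j` stand for the embedding vectors of two fields (hypothetical values); the inner product is a scalar, the outer product is a matrix, and the sum of the diagonal of the outer product equals the inner product.

```python
def inner(u, v):
    """Inner product of two equal-length vectors: a single scalar."""
    return sum(a * b for a, b in zip(u, v))

def outer(u, v):
    """Outer product of two vectors: a len(u) x len(v) matrix."""
    return [[a * b for b in v] for a in u]

f_i = [1.0, 2.0, 3.0]
f_j = [4.0, 5.0, 6.0]

p_inner = inner(f_i, f_j)
p_outer = outer(f_i, f_j)
trace = sum(p_outer[k][k] for k in range(len(f_i)))
print(p_inner, trace)
```

The outer product keeps all pairwise element interactions (9 values here) while the inner product keeps only the matched ones, which is why the outer product is more expressive but also more expensive.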
It will be clear to those skilled in the art that the number of model parameters generally rises after inner or outer product operations are applied to the features. To reduce computational cost, a factorization method may be used to convert a large weight matrix into the product of a small weight matrix and its transpose. That is, step S33 is performed: in the factorization layer, the weight matrix of the M-dimensional features is factorized by a factorization method.
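One way such a factorization can save computation is sketched below. It assumes that the weight matrix W weights the pairwise inner products of N field embeddings and that W is written as θθᵀ with a small N x r matrix θ; how the patent applies the factorization in detail is not stated in the text, so the setting, the names and the rank r = 1 are assumptions.

```python
def weighted_pairwise_full(W, feats):
    """sum over i, j of W[i][j] * <f_i, f_j>, using the full N x N weight matrix."""
    n = len(feats)
    return sum(
        W[i][j] * sum(a * b for a, b in zip(feats[i], feats[j]))
        for i in range(n)
        for j in range(n)
    )

def weighted_pairwise_factorized(theta, feats):
    """Same quantity with W = theta * theta^T, without ever forming W."""
    n, r, dim = len(feats), len(theta[0]), len(feats[0])
    total = 0.0
    for k in range(r):
        # Weighted sum of the field embeddings for factor k.
        s = [sum(theta[i][k] * feats[i][d] for i in range(n)) for d in range(dim)]
        total += sum(x * x for x in s)
    return total

feats = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]   # N = 3 field embeddings
theta = [[0.2], [-0.4], [1.0]]                  # N x r with r = 1
W = [
    [sum(theta[i][k] * theta[j][k] for k in range(len(theta[0]))) for j in range(3)]
    for i in range(3)
]
full = weighted_pairwise_full(W, feats)
fact = weighted_pairwise_factorized(theta, feats)
print(full, fact)
```

The factorized form stores N x r weights instead of N x N and avoids the double loop over field pairs, while producing the same value.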
Referring to fig. 5, fig. 5 is a schematic diagram of the factorization of the weight matrix according to an embodiment of the invention. After the above steps are completed, step S34 may be executed: the information of the M-dimensional features is input into the fully connected layer for training to obtain a trained Bayesian neural network model, which is a user prediction model with two output-layer neurons; the user prediction model is then validated using the local validation set data.
Referring to fig. 6, which compares the weights of a conventional deep learning network (left) with those of the Bayesian neural network (right), one difference between the two models is that each connection weight in the Bayesian neural network is not a constant but a distribution, and that distribution is obtained by Bayesian inference.
In an embodiment of the present invention, the algorithmic description of the bayesian network may be as follows:
(1) sampling from $\mathcal{N}(\mu, \log(1 + e^{\rho}))$ to obtain the initial weights $\omega$ of the network;
(2) calculating $\log q(\omega \mid \theta)$, $\log p(\omega)$ and $\log p(y \mid \omega, x)$ respectively;
(3) calculating the loss function $L$;
(4) updating the network parameters: $\theta' = \theta - \alpha \nabla_{\theta} L$.
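The four steps above can be sketched for a single logistic layer. This is a minimal illustration only, under several assumptions that are not stated in the text: the loss takes the usual variational form L = log q(ω|θ) − log p(ω) − log p(y|ω, x), the prior p(ω) is a standard normal, θ = (μ, ρ) with σ = log(1 + e^ρ), and the gradients are written out by hand for this small model. The function name `train` and all default values are hypothetical.

```python
import numpy as np

def softplus(r):
    return np.logaddexp(0.0, r)              # log(1 + e^r), numerically stable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, steps=500, alpha=0.005, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros(x.shape[1])                # theta = (mu, rho)
    rho = np.full(x.shape[1], -3.0)
    losses = []
    for _ in range(steps):
        # (1) sample weights: w = mu + sigma * eps, with sigma = log(1 + e^rho)
        sigma = softplus(rho)
        eps = rng.standard_normal(mu.shape)
        w = mu + sigma * eps
        # (2) the three log terms (additive constants dropped; prior assumed N(0, 1))
        z = x @ w
        log_q = np.sum(-np.log(sigma) - 0.5 * eps ** 2)
        log_prior = np.sum(-0.5 * w ** 2)
        log_lik = np.sum(y * z - np.logaddexp(0.0, z))
        # (3) assumed loss: L = log q(w | theta) - log p(w) - log p(y | w, x)
        losses.append(log_q - log_prior - log_lik)
        # (4) hand-derived gradients of L, then theta' = theta - alpha * grad
        g = x.T @ (y - sigmoid(z))           # gradient of log_lik with respect to w
        grad_mu = w - g
        grad_sigma = (w - g) * eps - 1.0 / sigma
        grad_rho = grad_sigma * sigmoid(rho) # chain rule: d sigma / d rho
        mu -= alpha * grad_mu
        rho -= alpha * grad_rho
    return mu, softplus(rho), losses
```

Calling `train(x, y)` with a feature matrix `x` and 0/1 labels `y` returns the learned mean and standard deviation of each weight together with the per-step loss. A full network would normally rely on automatic differentiation rather than hand-written gradients.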
Further, after model training is completed, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network further includes step S35, in which model evaluation index processing and tuning processing are performed on the user prediction model.
The model evaluation indices may generally include the logarithmic loss function (Log loss), the relative information gain (Relative Information Gain, RIG) and the AUC (Area Under the ROC Curve). Generally, the closer the AUC value is to 1, the better the classification performance of the user prediction model.
For example, after the data have been processed according to the above steps and the model has been trained, the training effect can be judged by the AUC value on the local validation set. If the AUC value is smaller than a preset threshold, that is, if the effect is poor, model tuning is performed on the user prediction model. For a deep learning algorithm, tuning can generally be carried out from the following aspects:
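The three indices can be computed as in the sketch below. The RIG definition used here, one minus the ratio of the model's log loss to the entropy of the empirical click rate, is a commonly used form and is an assumption, since the text does not define it. The labels and predicted probabilities are hypothetical.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    # Mean negative Bernoulli log-likelihood; clipping avoids log(0).
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def relative_information_gain(y, p):
    # Assumed definition: 1 - log_loss(model) / log_loss(base-rate predictor).
    base_rate = np.full(len(y), np.mean(y))
    return 1.0 - log_loss(y, p) / log_loss(y, base_rate)

def auc(y, p):
    # Fraction of (positive, negative) pairs ranked correctly; ties count as half.
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8])

print(log_loss(y_true, y_prob))
print(relative_information_gain(y_true, y_prob))
print(auc(y_true, y_prob))   # 3 of the 4 positive/negative pairs are ordered correctly
```

A RIG of 0 means the model is no better than always predicting the average click rate, and higher values are better.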
(1) Adding batch normalization (Batch Normalization) to address the internal covariate shift (Internal Covariate Shift) of the data.
(2) Adding Dropout to the network, that is, putting part of the neurons into a dormant state during training.
(3) Adjusting the learning rate, generally through strategies such as exponential decay.
(4) Running several sub-trainings and averaging their results, to reduce the risk of overfitting during training.
(5) Adding L1 or L2 regularization, which applies a penalty to the loss function to reduce the risk of overfitting.
(6) Optimizing the hyperparameters.
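Two of the items above, the exponential-decay learning rate and Dropout, are simple enough to sketch directly. The function names, the default decay constants and the use of inverted Dropout scaling are illustrative assumptions, not details given in the text.

```python
import numpy as np

def exponential_decay(lr0, step, decay_rate=0.96, decay_steps=100):
    # Learning rate shrinks by a factor of decay_rate every decay_steps steps.
    return lr0 * decay_rate ** (step / decay_steps)

def dropout(x, drop_prob, rng, training=True):
    # During training, each unit is zeroed ("dormant") with probability drop_prob;
    # survivors are rescaled so the expected activation is unchanged.
    if not training or drop_prob == 0.0:
        return x
    keep = rng.random(x.shape) >= drop_prob
    return x * keep / (1.0 - drop_prob)

rng = np.random.default_rng(0)
activations = np.ones((4, 5))
dropped = dropout(activations, drop_prob=0.5, rng=rng)
unchanged = dropout(activations, drop_prob=0.5, rng=rng, training=False)
```

At inference time `training=False` leaves the activations untouched, which is why the rescaling is applied during training.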
For hyperparameter optimization, grid search (Grid Search) or random search (Random Search) can generally be adopted; however, both methods consume considerable computing resources and are inefficient. In an embodiment of the invention, a Bayesian optimization (Bayesian Optimization) strategy is employed. Bayesian optimization uses Gaussian process regression to compute the posterior probability distribution from the previous n data points, which gives the mean and variance of each hyperparameter at each candidate value; it then balances the mean against the variance and finally selects a better set of hyperparameters according to the joint probability distribution among the hyperparameters.
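One way to picture this procedure is the one-dimensional sketch below: a Gaussian process with an RBF kernel is fitted to previously evaluated hyperparameter values, and the next value is chosen by an upper-confidence-bound rule that trades off posterior mean against posterior standard deviation. The kernel, the acquisition rule, the constant `kappa` and the example scores are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-8):
    # Gaussian process regression with unit prior variance and a constant mean.
    y_mean = y_obs.mean()
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)           # K^-1 k_*
    mean = y_mean + solved.T @ (y_obs - y_mean)
    var = 1.0 - np.sum(k_on * solved, axis=0)
    return mean, np.maximum(var, 0.0)

def propose_next(x_obs, y_obs, candidates, kappa=2.0):
    # Upper confidence bound: a high mean exploits, a high variance explores.
    mean, var = gp_posterior(x_obs, y_obs, candidates)
    return candidates[np.argmax(mean + kappa * np.sqrt(var))]

# Hypothetical history: a hyperparameter scaled to [0, 1] and the validation AUC it gave.
x_obs = np.array([0.1, 0.4, 0.9])
y_obs = np.array([0.70, 0.78, 0.72])
next_value = propose_next(x_obs, y_obs, np.linspace(0.0, 1.0, 101))
```

In practice the proposed value would be evaluated by training the model, appended to the history, and the loop repeated; several hyperparameters would be handled with a multi-dimensional kernel.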
After all the above processing steps are finished, the features can be fed into the user prediction model, so that users with higher willingness can be screened out before advertisement delivery and marketing advertisements can be delivered to them precisely.
That is, the invention may further include a model prediction step S4, in which the users selected for the precision-marketing task at the time point t+1 to be predicted are obtained according to the user prediction model.
The results show that the click rate of the high-willingness users selected by the user prediction model is about 10 times that of the low-willingness users. With the user prediction model, a large number of low-willingness users can be removed directly from the delivery targets, which saves considerable marketing cost and increases the profit margin.
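Because the weights of a Bayesian neural network are distributions, a prediction can be formed by averaging over several weight samples. The sketch below uses a single logistic layer as a stand-in for the full network; the function names, the Monte Carlo averaging, the top-fraction selection rule and all values are assumptions for illustration rather than the procedure stated in the text.

```python
import numpy as np

def predict_click_prob(x, mu, sigma, n_samples=100, seed=0):
    # Draw weight vectors from N(mu, sigma^2) and average the predicted probabilities.
    rng = np.random.default_rng(seed)
    w = mu + sigma * rng.standard_normal((n_samples, mu.size))
    probs = 1.0 / (1.0 + np.exp(-(x @ w.T)))       # shape (n_users, n_samples)
    return probs.mean(axis=1), probs.std(axis=1)

def select_users(mean_prob, top_fraction=0.1):
    # Keep the users with the highest predicted willingness.
    k = max(1, int(round(top_fraction * len(mean_prob))))
    return np.argsort(-mean_prob)[:k]

# Hypothetical features for three users and hypothetical learned weight distributions.
x_users = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 0.0]])
mu = np.array([1.0, 0.0])
sigma = np.array([0.1, 0.1])

mean_prob, spread = predict_click_prob(x_users, mu, sigma)
chosen = select_users(mean_prob, top_fraction=1 / 3)
```

The per-user spread gives a rough indication of how uncertain the model is about each prediction, which a point-estimate network does not provide.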
The foregoing description is only of the preferred embodiments of the present invention, and the embodiments are not intended to limit the scope of the invention, so that all changes made in the equivalent structures of the present invention described in the specification and the drawings are included in the scope of the invention.

Claims (10)

1. The marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network is characterized by comprising a data preprocessing step S1, a data set dividing step S2 and a model building step S3;
the data preprocessing step S1 includes the following steps:
step S11: acquiring original information of a user, and extracting original characteristic information from the original information of the user; the original characteristic information comprises a user ID, a user mobile phone number attribution, a task batch number, DPI accessed by a user on the same day, DPI access frequency of the user, user access time and/or user access duration; the task batch number represents original information of a user in a date time period, DPI access frequency, DPI access time and/or user access duration of the user are/is measured by taking each task batch number as a measurement unit, and the attribution characteristics of the DPI access and the mobile phone number of the user are category characteristics;
step S12: processing the category characteristics; performing One-hot encoding processing on the home feature of the user mobile phone number and the DPI accessed by the user; wherein, the One-hot encoding process comprises:
sequentially expanding all different user access DPIs as independent features according to the task batch number, and expanding DPI access frequency into a relationship feature of DPI and user access DPI frequency according to all different user access DPIs in the task batch number;
step S13: processing the continuous features; the access time and access time length data of different dimensions are mapped to a unified interval, and the data distribution is adjusted to be approximate to Gaussian distribution;
step S14: performing dimension reduction treatment on the high-dimensional characteristics by adopting principal component analysis;
the data set dividing step S2 includes the steps of:
step S21: after preprocessing, the attribution feature and the feature of whether the user accesses the DPI on the same day are regarded as sparse features, and the user access DPI frequency is defined as continuous features;
step S22: forming training set data according to historical data of time points 1, 2, …, t-1 before a time point t+1 to be predicted; the data corresponding to the time point t is used as a verification set;
the model building step S3 includes the steps of:
step S31: providing a Bayesian neural network initial model, taking class features in the training set data as M1-dimensional feature information of an input layer of the Bayesian neural network, inputting the M1-dimensional feature information into an embedded layer of the Bayesian neural network for information extraction and dimension reduction, and reducing the M1-dimensional feature information to M2-dimensional feature information; wherein, M2 is smaller than M1, the Bayesian neural network comprises an input layer, an embedded layer, a product layer, a factorization layer, a full connection layer and an output layer;
step S32: adding the M2-dimensional features after dimension reduction with M3-dimensional continuous features to form M-dimensional features, and performing multiplication operation of inner products and outer products on the M-dimensional features in a product layer to enable feature information of the M-dimensional features to be interacted;
step S33: in the factorization layer, factorizing a weight matrix of the M-dimensional feature by adopting a factorization method;
step S34: inputting the information of the M-dimensional characteristics into the full-connection layer for training to obtain a trained Bayesian neural network model, wherein the Bayesian neural network model is a user prediction model with two output layer neurons; and validating the user prediction model by adopting local validation set data.
2. The marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network according to claim 1, further comprising step S35 of performing model evaluation index processing and tuning processing on the user prediction model.
3. The marketing prediction method combining inner/outer product feature interactions and bayesian neural networks according to claim 2, wherein the model evaluation metrics include employing log-loss functions, relative information gain RIG, and AUC values.
4. The marketing prediction method combining inner/outer product feature interactions and bayesian neural networks according to claim 3, wherein the model evaluation index is an AUC value, and the model tuning process is performed on the user prediction model if the AUC value is smaller than a predetermined threshold.
5. The marketing prediction method combining inner/outer product feature interactions and bayesian neural networks according to claim 2, wherein the model tuning process comprises one or more of the following:
increasing batch normalization to solve the problem of internal covariate offset of data;
adding a function of enabling part of neurons to be in a dormant state in the training process into a network;
the learning rate is adjusted, and the learning rate in the training process is adjusted through an exponential decay strategy;
running a plurality of sub-trainings and averaging them, to mitigate the insufficient generalization capability caused by larger data variance;
adding L1 or L2 regularization, and applying punishment to the loss function to reduce the risk of overfitting;
optimizing the hyperparameters.
6. The marketing prediction method combining inner/outer product feature interactions and Bayesian neural networks according to claim 5, wherein the hyperparameter optimization adopts a Bayesian optimization strategy.
7. The marketing prediction method combining inner/outer product feature interactions and bayesian neural networks according to claim 1, wherein the continuous feature processing is utilizing a RankGauss method.
8. The marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network according to claim 1, further comprising an anomaly detection and processing step for the original information of the user after step S11.
9. The marketing prediction method combining inner/outer product feature interaction and a bayesian neural network according to claim 1, further comprising a model prediction step S4, wherein a task of accurately marketing the user selected at a time point t+1 to be predicted is obtained according to the user prediction model.
10. The marketing prediction method combining inner/outer product feature interactions and Bayesian neural networks according to claim 1, wherein the layer-wise node distribution pattern of the Bayesian neural network model comprises: increasing (Increasing), constant (Constant), diamond (Diamond) or decreasing (Decreasing).
CN202110125002.3A 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network Active CN112819523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110125002.3A CN112819523B (en) 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Publications (2)

Publication Number Publication Date
CN112819523A CN112819523A (en) 2021-05-18
CN112819523B true CN112819523B (en) 2024-03-26

Family

ID=75860166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110125002.3A Active CN112819523B (en) 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Country Status (1)

Country Link
CN (1) CN112819523B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240025B (en) * 2021-05-19 2022-08-12 电子科技大学 Image classification method based on Bayesian neural network weight constraint
CN113344615B (en) * 2021-05-27 2023-12-05 上海数鸣人工智能科技有限公司 Marketing campaign prediction method based on GBDT and DL fusion model
TWI773507B (en) * 2021-09-01 2022-08-01 國立陽明交通大學 Algorithm and device for predicting system reliability

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019018533A1 (en) * 2017-07-18 2019-01-24 Neubay Inc Neuro-bayesian architecture for implementing artificial general intelligence
CN109831801A (en) * 2019-01-04 2019-05-31 东南大学 The node B cache algorithm of user's behavior prediction based on deep learning neural network
CN110619540A (en) * 2019-08-13 2019-12-27 浙江工业大学 Click stream estimation method of neural network
CN110956497A (en) * 2019-11-27 2020-04-03 桂林电子科技大学 Method for predicting repeated purchasing behavior of user of electronic commerce platform
CN112149352A (en) * 2020-09-23 2020-12-29 上海数鸣人工智能科技有限公司 Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
CN112258223A (en) * 2020-10-13 2021-01-22 上海数鸣人工智能科技有限公司 Marketing advertisement click prediction method based on decision tree

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090138386A1 (en) * 2007-11-26 2009-05-28 Wachovia Corporation Interactive statement
KR101312927B1 (en) * 2011-06-03 2013-10-01 한국과학기술원 Advertisement providing system

Also Published As

Publication number Publication date
CN112819523A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
US10360517B2 (en) Distributed hyperparameter tuning system for machine learning
CN112819523B (en) Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN113344615B (en) Marketing campaign prediction method based on GBDT and DL fusion model
US10963802B1 (en) Distributed decision variable tuning system for machine learning
CN112910690A (en) Network traffic prediction method, device and equipment based on neural network model
CN114202061A (en) Article recommendation method, electronic device and medium based on generation of confrontation network model and deep reinforcement learning
CN110619540A (en) Click stream estimation method of neural network
CN112258223B (en) Marketing advertisement click prediction method based on decision tree
CN113255844B (en) Recommendation method and system based on graph convolution neural network interaction
CN113591971B (en) User individual behavior prediction method based on DPI time sequence word embedded vector
CN111611488A (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111178986A (en) User-commodity preference prediction method and system
CN113254795A (en) Training method and device for recommendation model
CN113761193A (en) Log classification method and device, computer equipment and storage medium
CN112581177B (en) Marketing prediction method combining automatic feature engineering and residual neural network
CN115310004A (en) Graph nerve collaborative filtering recommendation method fusing project time sequence relation
CN113010774B (en) Click rate prediction method based on dynamic deep attention model
CN113360772A (en) Interpretable recommendation model training method and device
CN112927012A (en) Marketing data processing method and device and marketing model training method and device
CN113158577A (en) Discrete data characterization learning method and system based on hierarchical coupling relation
Wang Forecast model of TV show rating based on convolutional neural network
CN112149806B (en) Access control strategy generation method and device based on machine learning
CN115292587B (en) Recommendation method and system based on knowledge distillation and causal reasoning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200436 room 406, 1256 and 1258 Wanrong Road, Jing'an District, Shanghai

Applicant after: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

Address before: Room 1601-026, 238 JIANGCHANG Third Road, Jing'an District, Shanghai, 200436

Applicant before: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

GR01 Patent grant