CN112819523A - Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Info

Publication number: CN112819523A
Application number: CN202110125002.3A
Authority: CN
Inventors: 项亮; 方同星
Original assignee: Shanghai Shuming Artificial Intelligence Technology Co ltd
Current assignee: Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority date: 2021-01-29
Filing date: 2021-01-29
Publication date: 2021-05-18
Anticipated expiration: 2041-01-29
Also published as: CN112819523B

Abstract

A marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network comprises a data preprocessing step, a data set dividing step, a model building step, and a step of predicting clicks on marketing campaigns. In building the prediction model, Bayesian inference is used to introduce prediction uncertainty into the neural network, which makes the Bayesian neural network model more robust. Features are crossed by a combined inner/outer product method to extract high-dimensional latent features. The method thereby extends the application of deep learning to computational advertising and recommendation system problems and significantly improves the accuracy of user click behavior prediction.

Description

Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Technical Field

The invention relates to the technical field of artificial intelligence marketing in the Internet, in particular to a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network.

Background

Online advertising uses network marketing to reach the widest possible audience and delivers advertisements to targeted customers through network platforms. In computational advertising and recommendation systems, commonly used algorithms include linear models such as Logistic Regression (LR), Factorization Machines (FM), and the like.

These algorithms are easy to interpret and simple to implement; however, because they are simple, their expressive capacity is limited. They therefore have difficulty extracting high-order interaction information among features, which limits their overall performance.

In addition, with the successful application of deep learning algorithms in many fields, such as Natural Language Processing (NLP) and Computer Vision (CV), deep learning models are increasingly applied in mainstream computational advertising and recommendation systems.

Although deep learning models offer automatic feature extraction and end-to-end learning, which many traditional algorithms lack, they also have the following obvious disadvantages when applied to computational advertising and recommendation systems:

First, most recommendation system data sets form a large, high-dimensional sparse matrix, i.e., a matrix consisting of 0s and 1s, which is difficult for gradient-descent-based deep learning models to train on; a large sparse matrix also consumes substantial computing power and computation time. How to reduce the feature dimension and effectively extract feature interaction information therefore places higher demands on feature engineering and algorithm design.

Second, preventing overfitting is a very important problem in deep learning. In general, the risk of overfitting can be reduced by early stopping, weight decay, L1/L2 regularization, Dropout, and the like. However, for precise targeting and delivery in advertising marketing, the uncertainty of the model must also be measured, because over-confident algorithmic decisions do not yield good returns in actual advertisement delivery. How to add an uncertainty measure to the network architecture, so that algorithmic decisions are more reliable and overfitting is effectively prevented, is therefore one of the key technologies for applying deep learning to computational advertising and recommendation system problems.

Third, a traditional deep learning model crosses and combines features directly through several fully connected layers, but this approach is not well targeted:

first, the fully connected layers do not cross features across different feature fields;

second, the operation of a fully connected layer is not designed specifically for feature crossing.

Therefore, a deep learning model capable of characterizing the different data patterns of specific services needs to be developed.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provides a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network, which comprises a data preprocessing step S1, a data set dividing step S2 and a model establishing step S3;

the data preprocessing step S1 includes the steps of:

step S11: acquiring original information of a user, and extracting original feature information from it; the original feature information comprises a user ID, the home location (attribution) of the user's mobile phone number, a task batch number, the DPIs (deep packet inspection) accessed by the user on the same day, the user's DPI access frequency, the user access time and/or the user access duration; the task batch number denotes the original information of a user within a date period; the DPI access frequency, the user access time and/or the user access duration are measured per task batch number; and the DPIs accessed by the user on the same day and the home location of the user's mobile phone number are category features;

step S12: processing the category features; performing one-hot encoding on the home-location feature of the user's mobile phone number and on the DPIs accessed by the user; wherein the one-hot encoding comprises:

expanding, for each task batch number in turn, every distinct DPI accessed by users into an independent feature, and expanding the DPI access frequencies within the task batch number into features relating each distinct DPI to the user's access frequency for that DPI;

step S13: processing the continuous features; mapping access time and access duration data of different dimensions to a uniform interval, and adjusting the data distribution to approximate Gaussian distribution;

step S14: performing dimensionality reduction on the high-dimensional feature by adopting principal component analysis;

the data set dividing step S2 includes the steps of:

step S21: after preprocessing, treating the home-location feature and the feature indicating whether the user accessed each DPI on the same day as sparse features, and defining the user's DPI access frequency as a continuous feature;

step S22: where the time point to be predicted is t+1, forming the training set from the historical data at time points 1, 2, …, t-1, and taking the data corresponding to time point t as a local validation set;

the model building step S3 includes the steps of:

step S31: providing an initial Bayesian neural network model, taking the category features in the training set data as M1-dimensional feature information at the input layer of the Bayesian neural network, and feeding the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, so that it is reduced to M2-dimensional feature information; wherein M2 is less than M1, and the Bayesian neural network comprises an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer and an output layer;

step S32: appending M3-dimensional continuous features to the reduced M2-dimensional features to form M-dimensional features, and performing inner product and outer product operations on the M-dimensional features in the product layer so that the feature information of the M-dimensional features interacts;

step S33: in the factorization layer, factorizing the weight matrix of the M-dimensional features by adopting a factorization method;

step S34: inputting the information of the M-dimensional features into the fully connected layer for training to obtain a trained Bayesian neural network model, wherein the Bayesian neural network model is a user prediction model with two output-layer neurons; and validating the user prediction model with the data in the local validation set.

Further, the marketing prediction method combining the inner/outer product feature interaction and the bayesian neural network further includes step S35, where the model evaluation index processing and the tuning processing are performed on the user prediction model.

Further, the model evaluation index includes a logarithmic loss function, a relative information gain RIG and an AUC value.

Further, the model evaluation index is an AUC value, and if the AUC value is smaller than a predetermined threshold, the model tuning process is performed on the user prediction model.

Further, the model tuning process includes one or more of the following steps:

(1) adding batch normalization to address the internal covariate shift problem of the data;

(2) adding to the network a mechanism that puts part of the neurons into a dormant state during training;

(3) adjusting the learning rate, generally through strategies such as exponential decay during training;

(4) training with multiple random seeds and averaging the results, to mitigate insufficient generalization caused by large data variance;

(5) adding L1 or L2 regularization, which penalizes the loss function to reduce the risk of overfitting;

(6) optimizing the hyperparameters.

Further, the optimization method for the hyperparameters adopts a Bayesian optimization strategy.

Further, the continuous features are processed by using a RankGauss method.

Further, after step S11, the marketing prediction method combining the inner/outer product feature interaction and the bayesian neural network further includes the steps of performing anomaly detection and processing on the original information of the user.

Further, the marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network further comprises a model prediction step S4, wherein the user prediction model is used to screen users for the precision marketing task at the time point t+1 to be predicted.

Further, the layer-wise node distribution of the Bayesian neural network model takes one of the following forms: increasing, constant, diamond-shaped, or decreasing.

According to the above technical solution, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network makes effective use of Bayesian inference and introduces prediction uncertainty into the neural network, so that the Bayesian neural network model is more robust. With the combined inner/outer product method, features are crossed by inner product and outer product operations to extract high-dimensional latent features. The combination of inner/outer product feature interaction and the Bayesian neural network model effectively extends the application of deep learning to computational advertising and recommendation system problems and significantly improves the accuracy of user click behavior prediction.

Drawings

FIG. 1 is a diagram illustrating an overall network structure according to an embodiment of the present invention

FIG. 2 is a schematic diagram of four types of node distributions of a Bayesian neural network hierarchy in an embodiment of the present invention

FIG. 3 is a flow chart illustrating a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network according to an embodiment of the present invention

FIG. 4 is a schematic diagram illustrating the operation of the inner product (A) and the outer product (B) in an embodiment of the present invention

FIG. 5 is a diagram illustrating factorization of weight matrices according to an embodiment of the present invention

FIG. 6 is a diagram comparing the weights of a conventional deep learning network (left) with the weights of a Bayesian neural network (right)

Detailed Description

The following description of the embodiments of the present invention will be made in detail with reference to the accompanying drawings 1 to 6.

In the following detailed description of the embodiments of the present invention, in order to illustrate the structure of the present invention clearly and to facilitate explanation, the structures shown in the drawings are not drawn to scale and are partially enlarged, deformed and simplified, and therefore should not be understood as limiting the present invention.

In the following embodiments of the present invention, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network is built on the overall structure of the Bayesian neural network model. Referring to FIG. 1, FIG. 1 is a schematic diagram illustrating the overall network structure according to an embodiment of the present invention. As shown in FIG. 1, the Bayesian neural network includes an Input layer, an Embedding layer, a Product layer, a Factorization layer, a Fully-connected layer, and an Output layer.

Specifically, the input layer receives the feature data after preprocessing and data set division; the feature data is then processed by the embedding layer, the product layer and the factorization layer, and finally passed to the fully connected layer and the output layer.

First, after the category features in the input original feature information are one-hot encoded, they are divided into different fields according to their nature (for example, the age and gender associated with a user ID, the DPIs, the user access time and/or the user access duration, and so on); then, after embedding by the embedding layer, feature information interacts through inner products or outer products between features; next, a Gaussian prior is placed on the network parameters, and the posterior distribution is approximated by variational inference, minimizing the Kullback-Leibler divergence, to obtain updated network weights; finally, the neural network model is obtained.

Compared with conventional techniques for data marketing based on operator data, the method effectively extracts feature interaction information and reduces the feature dimension through its design of feature engineering and algorithms. At the same time, an uncertainty measure is added to the network architecture, so that algorithmic decisions are more reliable and overfitting is effectively prevented, yielding a deep learning model (a Bayesian neural network model) capable of characterizing different data patterns.

Referring to FIG. 2, FIG. 2 is a schematic diagram of four forms of layer-wise node distribution of the Bayesian neural network in an embodiment of the present invention. As shown in FIG. 2, the layer-wise node distribution of the Bayesian neural network model may be increasing, constant, diamond-shaped, or decreasing. The form can be chosen according to service needs, and details are not described herein.

Referring to fig. 3, fig. 3 is a flow chart illustrating a marketing prediction method combining inner/outer product feature interaction and a bayesian neural network according to an embodiment of the present invention. As shown in fig. 3, the marketing prediction method combining the inner/outer product feature interaction and the bayesian neural network includes a data preprocessing step S1, a data set partitioning step S2, a model building step S3, and a model prediction step S4.

In an embodiment of the present invention, the data preprocessing step S1 includes the following steps:

step S11: acquiring original information of a user, and extracting original feature information from it; the original feature information comprises a user ID, the home location (attribution) of the user's mobile phone number, a task batch number, the user's DPI access frequency, the user access time and/or the user access duration; the task batch number denotes the original information of a user within a date period; the DPI access frequency, the user access time and/or the user access duration are measured per task batch number; and the DPIs accessed by the user and the home location of the user's mobile phone number are category features.

Table 1 below describes the raw data before preprocessing, taking the data of a single batch as an example:

Table 1:

Preferably, in an embodiment of the present invention, after step S11 the method may further include performing anomaly detection and processing, category feature processing, continuous feature processing, and dimension reduction on the user's original information data.

Anomaly detection and processing: depending on service requirements, missing values, excessively large values and the like in the raw data need to be deleted, filled or otherwise handled. Because the number of users is generally in the millions, some data may go missing during collection; if only a small amount is missing, the affected records can generally be removed directly; if it cannot be determined whether the missing data will affect the final model training, the missing values can be filled with the mean, mode, median, or the like.

In addition, excessively large values may be encountered during data collection, for example a user accessing a DPI ten thousand times in one day; such values generally do not help the model generalize in actual modeling, so they can be handled by removal or by replacement with a fill value.
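The handling of missing and excessively large values described above can be sketched as follows. This is only an illustration under assumptions: the function name `clean_counts` and the cap `max_value` are hypothetical, and median filling plus capping is just one of the options the text lists.

```python
import numpy as np

def clean_counts(counts, max_value):
    """Fill missing entries with the median, then cap excessively large values."""
    x = np.asarray(counts, dtype=float)
    median = np.nanmedian(x)                    # median of the observed values only
    filled = np.where(np.isnan(x), median, x)   # replace NaN by the median
    return np.minimum(filled, max_value)        # cap outliers at max_value (assumed threshold)

# Daily DPI access counts for ten users; NaN marks a missing record.
visits = [3, 5, np.nan, 4, 10000, 6, 2, np.nan, 7, 5]
cleaned = clean_counts(visits, max_value=100)
```

Removing the affected records instead of filling or capping them would be equally consistent with the description; which option is preferable depends on the service requirements.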

Step S12: processing the category features; performing one-hot encoding on the home-location feature of the user's mobile phone number and on the DPIs accessed by the user; the one-hot encoding comprises expanding, for each task batch number in turn, every distinct DPI accessed by users into an independent feature, and expanding the DPI access frequencies within the task batch number into features relating each distinct DPI to the user's access frequency for that DPI.

Specifically, one-hot encoding is first applied to the DPIs accessed by users and to the home-location feature of the user's mobile phone number, expanding each into separate columns. Taking the accessed DPIs as an example: if a user accesses a certain DPI, that DPI is recorded as 1 and the remaining DPIs as 0; thus 10 different DPIs yield 10 feature columns, and in each column only the users who accessed the corresponding DPI are 1 while the rest are 0.
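A minimal sketch of this expansion is given below. The DPI names and the helper `one_hot` are hypothetical and serve only to illustrate the encoding of one categorical column.

```python
import numpy as np

def one_hot(values):
    """Return (categories, matrix) with matrix[i, j] = 1 iff values[i] == categories[j]."""
    categories = sorted(set(values))
    index = {c: j for j, c in enumerate(categories)}
    matrix = np.zeros((len(values), len(categories)), dtype=np.int8)
    for i, v in enumerate(values):
        matrix[i, index[v]] = 1
    return categories, matrix

# One accessed DPI per user (hypothetical identifiers).
dpi_per_user = ["dpi_a", "dpi_c", "dpi_a", "dpi_b"]
categories, encoded = one_hot(dpi_per_user)
```

Each distinct DPI becomes one column, so the width of the matrix grows with the number of distinct DPIs, which is the source of the high-dimensional sparse matrix discussed later.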

Step S13: processing the continuous features; that is, the access time and access duration data of different dimensions are mapped to a uniform interval, and the data distribution is adjusted to approximate to Gaussian distribution.
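The mapping of a continuous feature to an approximately Gaussian distribution can be illustrated with a rank-based transform in the spirit of RankGauss. The sketch below shows one common variant, assumed here for illustration: values are replaced by their ranks, the ranks are mapped into (0, 1), and the inverse normal CDF is applied. The function name `rank_gauss` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def rank_gauss(x):
    """Map values to standard-normal quantiles according to their rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x))       # 0..n-1, ties broken by position
    quantiles = (ranks + 0.5) / len(x)      # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(float(q)) for q in quantiles])

# Access durations with very different magnitudes (hypothetical values).
durations = [2.0, 350.0, 15.0, 7.0, 1200.0]
transformed = rank_gauss(durations)
```

The transform preserves the ordering of the values while discarding their scale, so features measured in different units end up on a common interval.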

Step S14: performing dimensionality reduction on the high-dimensional features by Principal Component Analysis (PCA).

Specifically, as the processing of the category features above shows, one-hot encoding generally produces a high-dimensional sparse matrix. For training the Bayesian neural network, this means that at many positions no useful gradient is available when the error is backpropagated, which clearly hinders training.

High-dimensional features also increase computational overhead, so the high-dimensional features must first be reduced in dimension. As is clear to a person skilled in the art, Principal Component Analysis (PCA) reduces dimension by finding the projection directions along which the variance of the original data is largest; it reduces the feature dimension while losing as little as possible of the information contained in the original features, so that the collected data can be analyzed comprehensively.
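The PCA step can be sketched with a singular value decomposition of the centered data. This is a minimal illustration on synthetic data; the function name `pca_reduce` and the choice of three components are assumptions, not values from the embodiment.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal directions; also return their variances."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (len(X) - 1)
    return centered @ vt[:n_components].T, explained_variance[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 1] = 3.0 * X[:, 0] + 0.01 * X[:, 1]   # make two columns nearly collinear
reduced, variances = pca_reduce(X, n_components=3)
```

The variance of each projected column equals the corresponding entry of `variances`, which is what "largest variance in a projection direction" means in practice.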

After the above preprocessing steps, the data format is as shown in Table 2 below:

Next, the data set dividing step S2 may be performed. In an embodiment of the present invention, the home-location feature and the feature indicating whether the user accessed each DPI may be regarded as sparse features, and the user's DPI access frequency may be defined as a continuous feature. Click-Through-Rate (CTR) problems typically involve a pronounced chronological order, i.e., what needs to be predicted is the user's behavior at the next time point. Therefore, the historical data earlier in the time series is generally used as training data, and the data at the most recent time point is used for local validation (validation data).

The data set dividing step S2 specifically includes the following steps:

step S22: where the time point to be predicted is t+1, forming the training set from the historical data at time points 1, 2, …, t-1, and taking the data corresponding to time point t as a local validation set.
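The time-ordered split of step S22 can be sketched as below. The record layout `(time_point, features, label)` and the function name `split_by_time` are hypothetical.

```python
def split_by_time(records, t):
    """Split records into training (time points 1..t-1) and validation (time point t)."""
    train = [r for r in records if 1 <= r[0] <= t - 1]
    validation = [r for r in records if r[0] == t]
    return train, validation

# Each record is (time_point, features, label); the contents are placeholders.
records = [(1, "x1", 0), (2, "x2", 1), (3, "x3", 0), (4, "x4", 1), (5, "x5", 0)]
train, validation = split_by_time(records, t=4)
```

Splitting by time rather than at random keeps information from the validation period out of the training set, which matches the chronological nature of the CTR problem.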

Training and validation of the user prediction model may then be performed by performing model building step S3. In an embodiment of the present invention, the user prediction model is a bayesian neural network model.

Specifically, step S3 may specifically include:

step S31: providing an initial Bayesian neural network model, taking the category features in the training set data as M1-dimensional feature information at the input layer of the Bayesian neural network, and feeding the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, so that it is reduced to M2-dimensional feature information; wherein M2 is less than M1.

As shown in FIG. 1, the Bayesian neural network comprises an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer, and an output layer. Assume that preprocessing of the original features yields N different fields (numbered Field 1, Field 2, Field 3, …, Field N).

Because of the one-hot encoding step, the features of the N fields (Feature 1, Feature 2, Feature 3, …, Feature N) form a high-dimensional sparse matrix. An embedding layer is therefore added to the network structure to embed the features, extracting information from the sparse features and reducing their dimension to obtain a layer of low-dimensional vectors.
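How an embedding layer turns a one-hot vector into a low-dimensional dense vector can be illustrated as follows. The field size, embedding dimension and random table are assumptions for illustration; in a trained model the table would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)
field_size, embed_dim = 6, 3
embedding = rng.normal(size=(field_size, embed_dim))  # hypothetical embedding table

category_index = 4
one_hot = np.zeros(field_size)
one_hot[category_index] = 1.0

dense_via_matmul = one_hot @ embedding        # one-hot vector times the table
dense_via_lookup = embedding[category_index]  # equivalent row lookup
```

Multiplying by a one-hot vector simply selects one row of the table, so the embedding can be computed as a lookup without ever materializing the sparse vector.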

Step S32: appending M3-dimensional continuous features to the reduced M2-dimensional features to form M-dimensional features, and performing inner product and outer product operations on the M-dimensional features in the product layer so that the feature information of the M-dimensional features interacts.

In the feature dimension reduction of steps S31 and S32, the output layer of the decoder portion uses a sigmoid activation function, so its output values lie between 0 and 1; all other layers use the ReLU activation function.

Referring to FIG. 4, FIG. 4 is a schematic diagram illustrating the inner product (A) and outer product (B) operations according to an embodiment of the present invention. Unlike prior-art deep learning models, which feed the embedded features directly into fully connected layers, the method additionally performs inner product and outer product operations on the embedded features so that feature information interacts.

As can be seen from FIG. 4, the outer product operation yields a matrix; if only the diagonal values of this matrix are kept (and summed), the result of the inner product operation is obtained, so the inner product can be regarded as a special case of the outer product. In this way, the relationship between two different fields can be measured.
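The relationship between the two operations can be checked numerically. The two embedded feature vectors below are arbitrary illustrative values.

```python
import numpy as np

# Embedded vectors of two different fields (illustrative values).
f_i = np.array([1.0, 2.0, 3.0])
f_j = np.array([0.5, -1.0, 2.0])

inner = f_i @ f_j            # scalar interaction
outer = np.outer(f_i, f_j)   # 3 x 3 matrix of pairwise element products
diag_sum = np.trace(outer)   # sum of the diagonal of the outer product
```

Here `diag_sum` equals `inner`: the inner product keeps only the diagonal of the outer product, while the outer product also retains the cross terms between different embedding dimensions.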

As is clear to those skilled in the art, the number of model parameters generally increases after inner or outer product operations on the features. To reduce computation, a factorization method may be used to convert a large weight matrix into the product of a small weight matrix and its transpose. That is, step S33 is executed: in the factorization layer, the weight matrix of the M-dimensional features is factorized.
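The saving from factorizing the weight matrix can be illustrated as follows. The sketch assumes the large weight matrix W has the form A Aᵀ for a small factor A; the dimensions, the rank and the names `A`, `W`, `full` and `cheap` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank = 8, 2
A = rng.normal(size=(dim, rank))   # small factor: dim * rank parameters
W = A @ A.T                        # implied large weight matrix: dim * dim entries

f_i = rng.normal(size=dim)
f_j = rng.normal(size=dim)

# Weighted outer-product interaction computed directly, O(dim^2) work.
full = np.sum(W * np.outer(f_i, f_j))
# Same value from the factor alone, O(dim * rank) work.
cheap = (A.T @ f_i) @ (A.T @ f_j)
```

Because the weighted sum equals f_iᵀ A Aᵀ f_j, neither the outer product nor the full matrix W needs to be formed, and only the small factor has to be stored and trained.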

Referring to FIG. 5, FIG. 5 is a diagram illustrating the factorization of the weight matrix according to an embodiment of the present invention. After the above steps are completed, step S34 may be executed: the information of the M-dimensional features is input into the fully connected layer for training to obtain a trained Bayesian neural network model, which is a user prediction model with two output-layer neurons; the user prediction model is then validated with the data in the local validation set.

Referring to FIG. 6, FIG. 6 compares the weights of a conventional deep learning network (left) with the weights of a Bayesian neural network (right). The Bayesian neural network model differs from a conventional deep learning network model in that each connection weight in the network is not a constant but a distribution, and the distribution is obtained by Bayesian inference.

In an embodiment of the present invention, the algorithm of the Bayesian neural network may be described as follows:

(1) sampling from $N(\mu, \log(1+e^{\rho}))$ to obtain the initial network weights $\omega$;

(2) calculating $\log q(\omega|\theta)$, $\log p(\omega)$ and $\log p(y|\omega, x)$ respectively;

(3) calculating the loss function $L = \log q(\omega|\theta) - \log p(\omega) - \log p(y|\omega, x)$;

(4) updating the network parameters: $\theta \leftarrow \theta - \alpha \nabla_{\theta} L$.
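The first three steps can be sketched for a single logistic layer as below. This is an illustration under several assumptions: log(1+e^ρ) is treated as the standard deviation of the weight distribution, the prior is a zero-mean Gaussian with standard deviation `prior_std`, the loss is taken as log q(ω|θ) − log p(ω) − log p(y|ω, x), and all function names are hypothetical. The gradient update of step (4) is not shown; in practice it would come from an automatic differentiation framework.

```python
import numpy as np

def softplus(rho):
    return np.log1p(np.exp(rho))

def log_normal(x, mean, std):
    """Sum of element-wise Gaussian log densities."""
    return np.sum(-0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2)

def sample_and_loss(mu, rho, x, y, rng, prior_std=1.0):
    sigma = softplus(rho)                       # assumed standard deviation
    eps = rng.standard_normal(mu.shape)
    w = mu + sigma * eps                        # step 1: sample the weights
    log_q = log_normal(w, mu, sigma)            # step 2: log q(w | theta)
    log_prior = log_normal(w, 0.0, prior_std)   #         log p(w), assumed Gaussian prior
    p = 1.0 / (1.0 + np.exp(-(x @ w)))          # predicted click probability
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # log p(y | w, x)
    return w, log_q - log_prior - log_lik       # step 3: loss

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4))                    # synthetic features
y = (x[:, 0] > 0).astype(float)                 # synthetic click labels
mu = np.zeros(4)
rho = np.full(4, -3.0)
w, loss = sample_and_loss(mu, rho, x, y, rng)
```

Writing the sample as `mu + sigma * eps` keeps the weights a differentiable function of μ and ρ, which is what allows θ to be updated by gradient descent in step (4).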

Further, after the model training is completed, the marketing prediction method combining the inner/outer product feature interaction and the bayesian neural network further includes step S35, where the model evaluation index processing and the tuning processing are performed on the user prediction model.

The model evaluation indices may generally include the logarithmic loss (Log loss), Relative Information Gain (RIG), and AUC (Area Under the ROC Curve). Generally, the closer the AUC value is to 1, the better the classification performance of the user prediction model.
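The three indices can be computed as in the following sketch. RIG is taken here in one common form, namely the relative reduction in log loss compared with always predicting the base click rate; this definition, the function names and the sample data are assumptions for illustration.

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def relative_information_gain(y, p):
    """1 - log_loss(model) / log_loss(constant base-rate predictor)."""
    base_rate = np.mean(y)
    baseline = log_loss(y, np.full_like(p, base_rate))
    return 1.0 - log_loss(y, p) / baseline

def auc(y, p):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = p[y == 1]
    neg = p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

y = np.array([0, 0, 1, 1, 0, 1])                 # observed clicks
p = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])    # predicted click probabilities
scores = (log_loss(y, p), relative_information_gain(y, p), auc(y, p))
```

The pairwise AUC above is quadratic in the number of samples and is meant only to make the definition explicit; a rank-based computation would be preferable on large validation sets.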

For example, after the data has been processed according to the above steps and the model has been trained, the training effect can be judged by the AUC value on the local validation set; if the AUC value is smaller than a predetermined threshold, i.e., the effect is poor, the user prediction model needs to be tuned. For a deep learning algorithm, tuning can generally proceed along the following lines:

(1) Adding Batch Normalization to address the Internal Covariate Shift problem of the data.

(2) Adding Dropout to the network, i.e., putting part of the neurons into a dormant state during training.

(3) Adjusting the learning rate, generally through strategies such as exponential decay during training.

(4) Training with multiple random seeds and averaging the results, which reduces the risk of overfitting during training.

(5) Adding L1 or L2 regularization, which penalizes the loss function to reduce the risk of overfitting.

(6) Optimizing the hyperparameters.

For hyperparameter optimization, Grid Search or Random Search can generally be used; however, both methods consume considerable computing resources and are inefficient. In an embodiment of the present invention, a Bayesian Optimization strategy is employed. Bayesian optimization uses Gaussian process regression to compute the posterior probability distribution from the previous n data points, which gives the mean and variance at each candidate value of each hyperparameter; by balancing mean and variance, and according to the joint probability distribution among the hyperparameters, it finally selects a better set of hyperparameters.

After all the processing steps are completed, the features are fed into the user prediction model, so that users with high willingness can be screened out before the advertisement is delivered, and marketing advertisements can be delivered precisely to those users.

That is, the method can further comprise a model prediction step S4, wherein the user prediction model is used to screen users for the precision marketing task at the time point t+1 to be predicted.

The results show that the click rate of the high-intention users selected by the user prediction model is about 10 times that of the low-intention users. With the user prediction model, a large number of low-intention users can be removed directly from the delivery targets, which saves a large amount of marketing cost and increases the profit margin.

The above description is only for the preferred embodiment of the present invention, and the embodiment is not intended to limit the scope of the present invention, so that all the equivalent structural changes made by using the contents of the description and the drawings of the present invention should be included in the scope of the present invention.

Claims

1. A marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network is characterized by comprising a data preprocessing step S1, a data set dividing step S2 and a model establishing step S3;

the data preprocessing step S1 includes the steps of:

the data set dividing step S2 includes the steps of:

step S22: forming the training set data from the historical data at time points 1, 2, …, t-1, which precede the time point t+1 to be predicted; taking the data corresponding to the time point t as the validation set;
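
A minimal sketch of the time-ordered split of step S22 is given below for illustration. It assumes each record carries an integer field named `time`; the field name and the toy records are hypothetical.

```python
def split_by_time(records, t):
    """Time points 1..t-1 form the training set; time point t is the
    validation set. Later time points (such as t+1) are left untouched."""
    train = [r for r in records if 1 <= r["time"] <= t - 1]
    valid = [r for r in records if r["time"] == t]
    return train, valid

# Hypothetical records: one per time point, with a click label.
records = [{"time": k, "clicked": k % 2} for k in range(1, 6)]
train_set, valid_set = split_by_time(records, t=4)
print(len(train_set), len(valid_set))
```

Splitting by time rather than at random keeps information from later time points out of the training data.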

the model building step S3 includes the steps of:

step S31: providing an initial Bayesian neural network model, taking the categorical features in the training set data as M1-dimensional feature information of the input layer of the Bayesian neural network, inputting the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, and reducing the M1-dimensional feature information to M2-dimensional feature information; wherein M2 is less than M1, and the Bayesian neural network comprises an input layer, an embedding layer, a multiplication layer, a factorization layer, a fully connected layer and an output layer;
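
The dimension reduction performed by the embedding layer can be sketched as follows, under the assumption that a categorical feature is one-hot encoded into M1 dimensions and multiplied by an M1 x M2 table. The sketch uses fixed point-valued weights for simplicity; in a Bayesian neural network each weight would instead be described by a distribution. The sizes `M1 = 8` and `M2 = 3` and all function names are illustrative assumptions.

```python
import random

def make_embedding(m1, m2, seed=0):
    """Random M1 x M2 table standing in for learned embedding weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(m2)] for _ in range(m1)]

def one_hot(index, m1):
    """M1-dimensional one-hot encoding of a category index."""
    return [1.0 if i == index else 0.0 for i in range(m1)]

def embed_dense(x, table):
    """Vector-matrix product: M1-dimensional input to M2-dimensional output."""
    m2 = len(table[0])
    return [sum(x[i] * table[i][j] for i in range(len(x))) for j in range(m2)]

def embed_lookup(index, table):
    """Equivalent shortcut: a one-hot product simply selects one row."""
    return list(table[index])

M1, M2 = 8, 3
table = make_embedding(M1, M2)
dense = embed_dense(one_hot(5, M1), table)
print(len(dense), dense)
```

Because the one-hot product selects a single row, practical implementations use the row lookup rather than the full multiplication.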

2. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural network as claimed in claim 1, further comprising step S35 of performing model evaluation index processing and tuning processing on the user prediction model.

3. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks as claimed in claim 2, wherein the model evaluation indexes comprise the log-loss function, the relative information gain (RIG) and the AUC value.
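
For illustration, the three evaluation indexes can be computed as in the sketch below. The RIG formula used here, (H - log-loss) / H with H the entropy of the empirical click rate, is a commonly used definition and is an assumption insofar as this section does not spell it out; the labels and probabilities are toy values.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def relative_information_gain(y_true, y_prob):
    """Improvement of log-loss over always predicting the average click rate.
    Requires both classes to be present in y_true."""
    ctr = sum(y_true) / len(y_true)
    entropy = -(ctr * math.log(ctr) + (1 - ctr) * math.log(1 - ctr))
    return (entropy - log_loss(y_true, y_prob)) / entropy

def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count one half). Quadratic in sample size; fine for a sketch."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0]
probs = [0.9, 0.8, 0.4, 0.3]
print(log_loss(labels, probs), relative_information_gain(labels, probs),
      auc(labels, probs))
```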

4. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural network of claim 3, wherein the model evaluation index is an AUC value, and if the AUC value is smaller than a predetermined threshold, model tuning processing is performed on the user prediction model.

5. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks as claimed in claim 2, wherein said model tuning process comprises one or more of:

adding batch normalization to address the internal covariate shift of the data;

adding dropout to the network, so that some of the neurons are in an inactive state during training;

adjusting the learning rate, generally during training through strategies such as exponential decay;

averaging over multiple training runs to better address the insufficient generalization capability caused by large data variance;

adding L1 or L2 regularization, applying a penalty to the loss function to reduce the risk of overfitting;

optimizing the hyper-parameters.
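
Three of the tuning options listed above can be sketched in a few lines each, purely for illustration: an exponentially decaying learning rate, a mask that deactivates part of the neurons during training, and an L1/L2 penalty added to the loss. The function names, the inverted-scaling convention in `dropout` and all numeric values are assumptions and are not taken from the embodiment.

```python
import random

def exponential_decay(initial_lr, decay_rate, step, decay_steps):
    """Learning rate after `step` steps: lr0 * decay_rate ** (step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` during training and
    rescale the survivors by 1 / (1 - rate) so the expected value is kept."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def penalized_loss(loss, weights, l1=0.0, l2=0.0):
    """Add L1 (sum of absolute values) and L2 (sum of squares) penalties."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return loss + l1_term + l2_term

rng = random.Random(0)
print(exponential_decay(0.1, 0.5, step=10, decay_steps=10))
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=rng))
print(penalized_loss(1.0, [1.0, -2.0], l1=0.1, l2=0.01))
```

At prediction time the dropout mask would normally be switched off, which the inverted scaling makes possible without further correction.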

6. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks as claimed in claim 5, wherein said optimization method for hyperparameters employs a Bayesian optimization strategy.

7. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks according to claim 1, wherein the continuous features are processed using the RankGauss method.
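
A minimal sketch of a RankGauss-style transform is shown below for illustration. It ranks the values, maps each rank to a quantile strictly inside (0, 1), and applies the inverse standard normal CDF. The (rank + 0.5) / n quantile convention and the simple treatment of ties (by sort order) are assumptions; other variants use the inverse error function on ranks rescaled to (-1, 1).

```python
from statistics import NormalDist

def rank_gauss(values):
    """Map a continuous feature to an approximately standard normal shape
    while preserving the ordering of the values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices in rank order
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        quantile = (rank + 0.5) / n        # strictly between 0 and 1
        out[idx] = std_normal.inv_cdf(quantile)
    return out

# Hypothetical skewed feature, e.g. a spend amount with one large outlier.
raw = [10.0, 1000.0, 3.0, 50.0, 7.0]
print(rank_gauss(raw))
```

Because only ranks are used, the outlier is pulled in to a moderate value instead of dominating the scale of the feature.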

8. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural network as claimed in claim 1, further comprising an anomaly detection and processing step for the user's raw information after step S11.

9. The marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network as recited in claim 1, further comprising a model prediction step S4, wherein, according to the user prediction model, the precision-marketing task is carried out for the selected users at the time point t+1 to be predicted.

10. The marketing prediction method combining inner/outer product feature interaction and the Bayesian neural network as recited in claim 1, wherein the distribution of node counts across the layers of the Bayesian neural network model comprises: increasing, constant, diamond, or decreasing.
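
The four layer-width patterns can be illustrated with the following sketch, which generates the number of nodes per hidden layer for each shape. The base width of 64, the growth factor of 2 and the function name `layer_widths` are assumptions made for illustration only.

```python
def layer_widths(shape, n_layers, base=64, factor=2):
    """Return the node count of each hidden layer for a given shape."""
    if shape == "constant":
        return [base] * n_layers
    if shape == "increasing":
        return [base * factor ** i for i in range(n_layers)]
    if shape == "decreasing":
        return [base * factor ** (n_layers - 1 - i) for i in range(n_layers)]
    if shape == "diamond":
        # Widest in the middle, narrowing symmetrically toward both ends.
        return [base * factor ** min(i, n_layers - 1 - i)
                for i in range(n_layers)]
    raise ValueError(f"unknown shape: {shape}")

for shape in ("increasing", "constant", "diamond", "decreasing"):
    print(shape, layer_widths(shape, n_layers=3))
```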