CN112819523A - Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network - Google Patents

Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Info

Publication number
CN112819523A
CN112819523A CN202110125002.3A CN202110125002A CN112819523A CN 112819523 A CN112819523 A CN 112819523A CN 202110125002 A CN202110125002 A CN 202110125002A CN 112819523 A CN112819523 A CN 112819523A
Authority
CN
China
Prior art keywords
user
neural network
model
bayesian neural
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110125002.3A
Other languages
Chinese (zh)
Other versions
CN112819523B (en)
Inventor
项亮
方同星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shuming Artificial Intelligence Technology Co ltd
Original Assignee
Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shuming Artificial Intelligence Technology Co ltd
Priority to CN202110125002.3A
Publication of CN112819523A
Application granted
Publication of CN112819523B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G06Q30/0244 Optimization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Computational Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)

Abstract

A marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network comprises a data preprocessing step, a data set dividing step, a model building step, and a step of predicting clicks on marketing campaigns. In building the prediction model, Bayesian inference is used to introduce prediction uncertainty into the neural network, which makes the Bayesian neural network model more robust. Features are crossed by combining inner and outer products so as to extract high-dimensional implicit features. The method thereby extends the application of deep learning to algorithmic problems in computational advertising and recommender systems, and significantly improves the accuracy of user click behavior prediction.

Description

Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network
Technical Field
The invention relates to the technical field of artificial intelligence marketing in the Internet, in particular to a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network.
Background
Online advertising uses network marketing to reach the widest possible audience and relies on network platforms to deliver advertisements to targeted customers. In computational advertising and recommender system algorithms, commonly used algorithms include linear models such as Logistic Regression (LR), the Factorization Machine (FM), and the like.
These algorithms are easy to interpret and simple to implement; however, because they are simple, their expressive capacity is limited. They therefore struggle to extract high-order interaction information among features, which limits their overall performance.
In addition, following the success of deep learning in many fields, such as Natural Language Processing (NLP) and Computer Vision (CV), deep learning models are increasingly being applied in mainstream advertising and recommender systems.
Although deep learning models offer automatic feature extraction and end-to-end learning, which many traditional algorithms lack, they also have the following clear disadvantages when applied to computational advertising and recommender systems:
Firstly, most recommender system data sets form a high-dimensional sparse matrix, namely a matrix consisting of 0s and 1s, which is difficult for deep learning models trained by gradient descent; a large sparse matrix also consumes considerable computing power and computation time. Reducing the feature dimension while effectively extracting feature interaction information therefore places higher demands on the design of the feature engineering and the algorithm.
Secondly, preventing overfitting is a very important problem in deep learning. In general, the risk of overfitting can be reduced by early stopping, weight decay, L1/L2 regularization, Dropout and the like. However, for precise targeting and placement in advertising, the uncertainty of the model must also be measured, because overconfident algorithmic decisions do not yield good returns in actual ad placement. How to add an uncertainty measure to the network architecture, so that algorithmic decisions are more reliable and overfitting is effectively prevented, is therefore one of the key technical problems to be solved in applying deep learning to computational advertising and recommender systems.
Thirdly, traditional deep learning models cross and combine features directly through several fully connected layers, but this approach is not well targeted:
first, the fully connected layers do not perform crossing between different feature domains;
second, the operations of a fully connected layer are not designed specifically for feature crossing.
Therefore, a deep learning model that can represent the different data patterns of a specific service needs to be developed.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network, which comprises a data preprocessing step S1, a data set dividing step S2 and a model building step S3;
the data preprocessing step S1 includes the steps of:
step S11: acquiring the raw information of a user and extracting raw feature information from it; the raw feature information comprises a user ID, the home location (attribution) of the user's mobile phone number, a task batch number, the DPIs (deep packet inspection) accessed by the user on the same day, the frequency with which the user accessed each DPI, the user's access time and/or the user's access duration; the task batch number identifies the user's raw information within one date period; the DPI access frequency, access time and/or access duration are measured per task batch number; and the DPIs accessed by the user on the same day and the home location of the user's mobile phone number are categorical features;
step S12: processing the categorical features; performing One-hot encoding on the home location of the user's mobile phone number and on the DPIs accessed by the user; wherein the One-hot encoding comprises:
for each task batch number in turn, expanding every distinct accessed DPI into a separate feature, and expanding the DPI access frequencies within the task batch number into features relating each distinct DPI to the frequency with which the user accessed it;
step S13: processing the continuous features; mapping the access time and access duration data, which are on different scales, to a uniform interval, and adjusting the data distribution to approximate a Gaussian distribution;
step S14: performing dimensionality reduction on the high-dimensional features by principal component analysis;
the data set dividing step S2 includes the steps of:
step S21: after preprocessing, treating the home location feature and the features indicating whether the user accessed each DPI on the same day as sparse features, and defining the frequency with which the user accessed each DPI as a continuous feature;
step S22: where t+1 is the time point to be predicted, forming the training set from the historical data at the preceding time points 1, 2, …, t-1 of the time sequence, and taking the data corresponding to time point t as a local validation set;
the model building step S3 includes the steps of:
step S31: providing an initial Bayesian neural network model, taking the categorical features in the training set data as M1-dimensional feature information of the input layer of the Bayesian neural network, and inputting the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, which reduces it to M2-dimensional feature information; wherein M2 is less than M1, and the Bayesian neural network comprises an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer and an output layer;
step S32: appending the M3-dimensional continuous features to the M2-dimensional features obtained by dimension reduction to form M-dimensional features, and performing inner product and outer product operations on the M-dimensional features in the product layer so that the feature information of the M-dimensional features interacts;
step S33: in the factorization layer, factorizing the weight matrix of the M-dimensional features;
step S34: inputting the information of the M-dimensional features into the fully connected layer for training to obtain a trained Bayesian neural network model, wherein the Bayesian neural network model is a user prediction model whose output layer has two neurons; and validating the user prediction model with the data in the local validation set.
Further, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network further includes step S35, in which model evaluation index processing and tuning processing are performed on the user prediction model.
Further, the model evaluation indices include the logarithmic loss function, the relative information gain (RIG) and the AUC value.
Further, the model evaluation index is the AUC value, and if the AUC value is smaller than a predetermined threshold, model tuning processing is performed on the user prediction model.
Further, the model tuning process includes one or more of the following steps:
firstly, adding batch normalization to address the internal covariate shift problem of the data;
secondly, adding to the network a mechanism that puts some of the neurons in a dormant state during training (i.e., Dropout);
thirdly, adjusting the learning rate, which during training is generally adjusted through strategies such as exponential decay;
fourthly, training with multiple random seeds and averaging the results, to mitigate the insufficient generalization caused by large data variance;
fifthly, adding L1 or L2 regularization, which applies a penalty to the loss function to reduce the risk of overfitting;
sixthly, applying a hyperparameter optimization method.
Further, the hyperparameter optimization method adopts a Bayesian optimization strategy.
Further, the continuous features are processed using the RankGauss method.
Further, after step S11, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network further includes the step of performing anomaly detection and handling on the raw information of the user.
Further, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network further comprises a model prediction step S4, in which the users screened for the precision marketing task at the time point t+1 to be predicted are obtained according to the user prediction model.
Further, the hierarchical node distribution of the Bayesian neural network model takes one of the following forms: increasing, constant, diamond, or decreasing.
According to the above technical scheme, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network uses Bayesian inference to introduce prediction uncertainty into the neural network, which makes the Bayesian neural network model more robust. By combining inner and outer products, the features are crossed through inner product and outer product operations to extract high-dimensional implicit features. The combination of inner/outer product feature interaction with the Bayesian neural network model extends the application of deep learning to algorithmic problems in computational advertising and recommender systems, and significantly improves the accuracy of user click behavior prediction.
Drawings
FIG. 1 is a schematic diagram of the overall network structure according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the four forms of hierarchical node distribution of the Bayesian neural network in an embodiment of the present invention.
FIG. 3 is a flow chart of a marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of the inner product (A) and outer product (B) operations in an embodiment of the present invention.
FIG. 5 is a schematic diagram of the factorization of the weight matrix according to an embodiment of the present invention.
FIG. 6 is a schematic diagram comparing the weights of a conventional deep learning network (left) with the weights of the Bayesian network (right).
Detailed Description
The embodiments of the present invention are described in detail below with reference to FIGS. 1 to 6.
In the following detailed description of the embodiments of the present invention, in order to show the structure of the present invention clearly and to facilitate explanation, the structures shown in the drawings are not drawn to scale and are partially enlarged, deformed and simplified; the drawings should therefore not be understood as limiting the present invention.
In the following embodiments of the present invention, the marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network is implemented within the overall structure of the Bayesian neural network model. FIG. 1 is a schematic diagram of the overall network structure according to an embodiment of the present invention. As shown in FIG. 1, the Bayesian neural network includes an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer, and an output layer.
Specifically, the input layer receives the feature data that has been preprocessed and divided into data sets; the feature data is then processed by the embedding layer, the product layer and the factorization layer, and is finally passed to the fully connected layer and the output layer.
First, the categorical features in the input raw feature information are One-hot encoded and then divided into different domains (fields) according to their nature (such as the age and gender associated with a user ID, the DPI access time and/or access duration, and other information); after embedding by the embedding layer, feature information interacts through inner or outer products between features; then a Gaussian prior is placed on the network parameters and the posterior distribution is approximated by variational inference, which minimizes the Kullback-Leibler divergence, to obtain the updated network weights; this finally yields the neural network model.
Compared with conventional techniques for data marketing with operator data, the method extracts feature interaction information effectively and reduces the feature dimension through the design of its feature engineering and algorithm. At the same time, an uncertainty measure is added to the network architecture, which makes algorithmic decisions more reliable, effectively prevents overfitting, and yields a deep learning model (a Bayesian neural network model) capable of representing different data patterns.
FIG. 2 is a schematic diagram of the four forms of hierarchical node distribution of the Bayesian neural network according to an embodiment of the present invention. As shown in FIG. 2, the hierarchical node distribution of the Bayesian neural network model takes one of the following forms: increasing, constant, diamond, or decreasing. Which of the four forms is chosen depends on the needs of the service and is not described in detail here.
Referring to fig. 3, fig. 3 is a flow chart illustrating a marketing prediction method combining inner/outer product feature interaction and a bayesian neural network according to an embodiment of the present invention. As shown in fig. 3, the marketing prediction method combining the inner/outer product feature interaction and the bayesian neural network includes a data preprocessing step S1, a data set partitioning step S2, a model building step S3, and a model prediction step S4.
In an embodiment of the present invention, the data preprocessing step S1 includes the following steps:
step S11: acquiring the raw information of a user and extracting raw feature information from it; the raw feature information comprises a user ID, the home location (attribution) of the user's mobile phone number, a task batch number, the frequency with which the user accessed each DPI, the user's access time and/or the user's access duration; the task batch number identifies the user's raw information within one date period; the DPI access frequency, access time and/or access duration are measured per task batch number; and the DPIs accessed by the user and the home location of the user's mobile phone number are categorical features.
Table 1 describes the raw data before preprocessing. Taking the data of a single batch as an example, the raw data before preprocessing is shown in Table 1 below:
Table 1:
[Table provided as an image in the original: Figure BDA0002923675420000061]
Preferably, in the embodiment of the present invention, step S11 may be followed by anomaly detection and handling, categorical feature processing, continuous feature processing, and dimension reduction of the user's raw information data.
Anomaly detection and handling: depending on the service requirements, missing values, excessively large values and the like in the raw data need to be deleted, filled or otherwise handled. Because the number of users is generally in the millions, values may go missing during data acquisition. If only a few values are missing, the affected records can generally be removed directly; if it cannot be determined whether the missing data will affect the final model training, the missing values can be filled with the mean, mode, median, or the like.
In addition, excessively large values may be encountered during data acquisition, for example a user who accesses a DPI ten thousand times within a day. Such values generally do not help the generalization capability of the model in actual modeling, so they can be removed or replaced with filled values.
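The handling of missing and excessively large values described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `fill_and_cap`, the choice of the median as the fill value, and the cap of 100 are all assumptions made for the example.

```python
from statistics import median


def fill_and_cap(values, cap):
    """Fill missing entries (None) with the median, then cap large values.

    The median is computed from the observed (non-missing) values only.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [min(fill if v is None else v, cap) for v in values]


# Hypothetical daily DPI access counts for five users; one value is
# missing and one is implausibly large.
daily_visits = [3, None, 5, 10000, 4]
cleaned = fill_and_cap(daily_visits, cap=100)
```

Capping is only one option; the text equally allows removing such records outright.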
Step S12: processing the categorical features; One-hot encoding is applied to the home location of the user's mobile phone number and to the DPIs accessed by the user. The One-hot encoding comprises, for each task batch number in turn, expanding every distinct accessed DPI into a separate feature, and expanding the DPI access frequencies within the task batch number into features relating each distinct DPI to the frequency with which the user accessed it.
Specifically, the DPIs accessed by the user and the home location of the user's mobile phone number are first One-hot encoded and expanded. Taking the accessed DPIs as an example, if a user accessed a given DPI, that DPI is recorded as 1 for the user, and the remaining DPIs are recorded as 0; thus, 10 different DPIs produce 10 feature columns, and in each column only the users who accessed the corresponding DPI have the value 1, while all other users have 0.
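The expansion of accessed DPIs into 0/1 columns can be sketched as follows. The user IDs, DPI names and the helper `one_hot_dpi` are hypothetical and serve only to illustrate the encoding.

```python
def one_hot_dpi(user_visits, all_dpis):
    """Return one row per user with a 0/1 column for every known DPI."""
    rows = {}
    for user_id, visited in user_visits.items():
        visited_set = set(visited)
        rows[user_id] = [1 if dpi in visited_set else 0 for dpi in all_dpis]
    return rows


# Hypothetical toy data: user ID -> DPIs accessed within one task batch.
visits = {"u1": ["dpi_a"], "u2": ["dpi_c", "dpi_a"], "u3": []}
dpis = ["dpi_a", "dpi_b", "dpi_c"]
encoded = one_hot_dpi(visits, dpis)
```

A user who accessed several DPIs gets several 1s in the same row, so the result is sparse whenever the number of distinct DPIs is large.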
Step S13: processing the continuous features; that is, the access time and access duration data, which are on different scales, are mapped to a uniform interval, and the data distribution is adjusted to approximate a Gaussian distribution.
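A rank-based Gaussian transform in the spirit of the RankGauss method named in this document can be sketched as follows. This is an assumption-laden illustration: it maps ranks to quantiles of the standard normal distribution using the standard library, and it breaks ties by input order, which may differ from the implementation the inventors used.

```python
from statistics import NormalDist


def rank_gauss(values):
    """Map values to standard-normal quantiles according to their rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n lies strictly inside (0, 1), so inv_cdf is finite.
        out[idx] = std_normal.inv_cdf((rank + 0.5) / n)
    return out


# Hypothetical access durations in seconds, on a heavily skewed scale.
durations = [3.0, 1200.0, 45.0, 7.0, 300.0]
transformed = rank_gauss(durations)
```

Because only ranks are used, the transform is insensitive to extreme values while preserving the ordering of the inputs.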
Step S14: performing dimensionality reduction on the high-dimensional features by Principal Component Analysis (PCA).
Specifically, as the processing of the categorical features above shows, One-hot encoding generally produces a high-dimensional sparse matrix. For the training of the Bayesian neural network, this means that gradients cannot be computed at many positions when the error is backpropagated, which clearly hampers training.
High-dimensional features also increase the computational overhead, so the dimension of the high-dimensional features must be reduced first. As is clear to a person skilled in the art, Principal Component Analysis (PCA) reduces the dimension by finding the projection directions along which the variance of the original data is largest; it reduces the feature dimension while losing as little as possible of the information contained in the original features, so that the collected data can be analyzed comprehensively.
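The idea of projecting onto the direction of largest variance can be sketched with power iteration on the sample covariance matrix. This is a minimal sketch that extracts only the first principal component; the function name and toy data are assumptions, and a production system would normally use a numerical library instead.

```python
def first_principal_component(rows, iterations=200):
    """Return the top principal direction and the projected scores."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data lying exactly on the line y = 2x, so one component suffices.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, projected = first_principal_component(data)
```

Here the two input columns collapse to a single score per row without losing information, because all of the variance lies along one direction.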
After the preprocessing of the above steps, the data format is as shown in Table 2 below:
[Table 2 provided as an image in the original: Figure BDA0002923675420000081]
Next, the data set dividing step S2 may be performed. In an embodiment of the present invention, the home location feature and the features indicating whether the user accessed each DPI may be regarded as sparse features, and the frequency with which the user accessed each DPI may be defined as a continuous feature. Click-Through-Rate (CTR) problems typically involve a pronounced chronological order, that is, what needs to be predicted is the user's behavior at the next time point. Therefore, the historical data preceding the time point t+1 to be predicted, i.e., the time points 1, 2, …, t-1 of the time sequence, is generally used as training data, and the data corresponding to time point t is used for local validation (validation data).
The data set dividing step S2 specifically includes the following steps:
step S21: after preprocessing, treating the home location feature and the features indicating whether the user accessed each DPI on the same day as sparse features, and defining the frequency with which the user accessed each DPI as a continuous feature;
step S22: where t+1 is the time point to be predicted, forming the training set from the historical data at the preceding time points 1, 2, …, t-1 of the time sequence, and taking the data corresponding to time point t as a local validation set.
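The chronological split of step S22 can be sketched as follows. The record layout `(time_point, features, label)` and the function name are hypothetical.

```python
def split_by_time(records, t):
    """Split records into a training set (time points 1..t-1) and a
    local validation set (time point t).

    records: list of (time_point, features, label) tuples.
    """
    train = [r for r in records if 1 <= r[0] <= t - 1]
    valid = [r for r in records if r[0] == t]
    return train, valid


# Hypothetical records for time points 1 to 4; time point 5 (= t + 1)
# is the one to be predicted and therefore has no data yet.
records = [(1, "x1", 0), (2, "x2", 1), (3, "x3", 0), (4, "x4", 1)]
train_set, valid_set = split_by_time(records, t=4)
```

Splitting by time rather than at random keeps information from the validation period out of the training set.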
The user prediction model may then be trained and validated by performing the model building step S3. In an embodiment of the present invention, the user prediction model is a Bayesian neural network model.
Specifically, step S3 may include:
step S31: providing an initial Bayesian neural network model, taking the categorical features in the training set data as M1-dimensional feature information of the input layer of the Bayesian neural network, and inputting the M1-dimensional feature information into the embedding layer of the Bayesian neural network for information extraction and dimension reduction, which reduces it to M2-dimensional feature information; wherein M2 is less than M1.
As shown in FIG. 1, the Bayesian neural network comprises an input layer, an embedding layer, a product layer, a factorization layer, a fully connected layer, and an output layer. It is assumed that preprocessing of the original features yields N different domains (numbered Field 1, Field 2, Field 3, …, Field N).
The features of the N domains (Feature 1, Feature 2, Feature 3, …, Feature N) form a high-dimensional sparse matrix because of the One-hot encoding step. An embedding layer is therefore added to the network structure: the features are embedded, and the sparse feature information is extracted and reduced in dimension in one pass to obtain a layer of low-dimensional vectors.
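The dimension reduction performed by an embedding layer can be sketched as the product of a One-hot vector with an embedding table. The table values and sizes below (M1 = 4 categories, M2 = 2 dimensions) are assumptions for illustration; in a trained network the table entries are learned.

```python
def embed(one_hot, table):
    """Multiply a one-hot (or multi-hot) row vector by an embedding table."""
    dim = len(table[0])
    return [sum(one_hot[i] * table[i][k] for i in range(len(one_hot)))
            for k in range(dim)]


# Hypothetical embedding table: M1 = 4 categories mapped to M2 = 2 dimensions.
table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
vector = embed([0, 0, 1, 0], table)
```

For a pure One-hot input the product simply selects one row of the table, which is why embedding layers are usually implemented as lookups rather than matrix multiplications.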
Step S32: appending the M3-dimensional continuous features to the M2-dimensional features obtained by dimension reduction to form M-dimensional features, and performing inner product and outer product operations on the M-dimensional features in the product layer so that the feature information of the M-dimensional features interacts.
In the feature dimension reduction of steps S31 and S32, the output layer of the decoder portion uses the sigmoid function as its activation function, so its output is a value between 0 and 1; all other layers use the ReLU activation function.
FIG. 4 is a schematic diagram of the inner product (A) and outer product (B) operations according to an embodiment of the present invention. Unlike prior-art deep learning models, which feed the embedded features directly into a fully connected layer, the method additionally applies inner product and outer product operations to the embedded features so that feature information interacts.
As can be seen from FIG. 4, the outer product operation yields a matrix; if only the diagonal entries of that matrix are kept (and summed), the result of the inner product operation is obtained, so the inner product can be regarded as a special case of the outer product. In this way, the relationship between two different domains can be measured.
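The relationship between the two operations can be checked on a small example. The vectors `f_i` and `f_j` below stand for the embeddings of two domains and are hypothetical values.

```python
def inner(u, v):
    """Inner product: a single scalar."""
    return sum(a * b for a, b in zip(u, v))


def outer(u, v):
    """Outer product: a len(u) x len(v) matrix of pairwise products."""
    return [[a * b for b in v] for a in u]


# Hypothetical embedding vectors of two feature domains.
f_i = [1.0, 2.0, 3.0]
f_j = [4.0, 5.0, 6.0]
p_inner = inner(f_i, f_j)
p_outer = outer(f_i, f_j)
# Summing the diagonal of the outer product recovers the inner product.
trace = sum(p_outer[k][k] for k in range(len(f_i)))
```

The outer product retains every pairwise product of components, which is more expressive but quadratic in the embedding size.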
It is clear to those skilled in the art that the number of model parameters generally increases after the inner or outer product operation is applied to the features. To reduce the computational cost, a factorization method may be used to convert a large weight matrix into the product of a small weight matrix and its transpose. That is, step S33 is executed: in the factorization layer, the weight matrix of the M-dimensional features is factorized.
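One way to see why such a factorization saves computation is the rank-one case, sketched below under the assumption that the pairwise weight matrix W is written as the product of a vector theta and its transpose. The weighted sum of all pairwise inner products then collapses to the squared norm of a single weighted sum of the embeddings. The names and values are hypothetical, and the patent's factorization layer may use a different rank or form.

```python
def pairwise_weighted_sum(theta, fields):
    """Naive O(N^2 * M): sum over i, j of theta_i * theta_j * <f_i, f_j>."""
    total = 0.0
    for i, f_i in enumerate(fields):
        for j, f_j in enumerate(fields):
            total += theta[i] * theta[j] * sum(a * b for a, b in zip(f_i, f_j))
    return total


def factorized_sum(theta, fields):
    """O(N * M): squared norm of sum over i of theta_i * f_i."""
    dim = len(fields[0])
    combined = [sum(theta[i] * fields[i][k] for i in range(len(fields)))
                for k in range(dim)]
    return sum(c * c for c in combined)


# Hypothetical weights for N = 3 domains with M = 2 dimensional embeddings.
theta = [0.5, -1.0, 2.0]
fields = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
naive = pairwise_weighted_sum(theta, fields)
fast = factorized_sum(theta, fields)
```

Both functions return the same value; the factorized form avoids the double loop over domain pairs.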
Referring to fig. 5, fig. 5 is a diagram illustrating a factorization operation of a weight matrix according to an embodiment of the present invention. After the above steps are completed, step S34 may be executed, that is, the information of the M-dimensional features is input into the full connection layer for training, so as to obtain a trained bayesian neural network model, where the bayesian neural network model is a user prediction model with two output layer neurons; and verifying the user prediction model by adopting the local verification centralized data.
Referring to fig. 6, fig. 6 is a schematic diagram comparing the weights of a conventional deep learning network (left) with the weights of a Bayesian network (right). The Bayesian neural network model differs from a conventional deep learning network model in that each connection weight of the network is not a constant but a distribution, and this distribution is obtained by Bayesian inference.
In an embodiment of the present invention, the algorithm of the Bayesian network may be described as follows:
(1) sampling from N(μ, log(1 + e^ρ)) to obtain an initial weight ω of the network;
(2) calculating log q(ω|θ), log p(ω) and log p(y|ω, x) respectively;
(3) calculating the loss function L [formula image BDA0002923675420000101];
(4) updating the network parameters: θ = θ − α∇_θ L.
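The four steps can be sketched for a single-layer logistic model. The sketch rests on several assumptions that the text does not confirm: the loss is taken in the usual Bayes-by-Backprop form L = log q(ω|θ) − log p(ω) − log p(y|ω, x), which is built from exactly the three quantities of step (2); log(1 + e^ρ) is treated as the standard deviation σ; the prior p(ω) is a standard normal; and gradients are obtained by finite differences instead of backpropagation to keep the example dependency-free. All function names are hypothetical.

```python
import numpy as np

def softplus(rho):
    return np.log1p(np.exp(rho))

def log_gaussian(x, mean, std):
    """Sum of log N(x | mean, std^2) over all components."""
    return float(np.sum(-0.5 * np.log(2 * np.pi) - np.log(std)
                        - 0.5 * ((x - mean) / std) ** 2))

def loss(theta, eps, x, y, prior_std=1.0):
    """Assumed loss: log q(w|theta) - log p(w) - log p(y|w, x) for one sample."""
    mu, rho = theta                       # theta has shape (2, D)
    sigma = softplus(rho)                 # assumed: sigma = log(1 + e^rho)
    w = mu + sigma * eps                  # step (1): w ~ N(mu, sigma^2)
    log_q = log_gaussian(w, mu, sigma)    # step (2)
    log_prior = log_gaussian(w, 0.0, prior_std)
    logits = x @ w
    log_lik = float(np.sum(y * logits - np.logaddexp(0.0, logits)))
    return log_q - log_prior - log_lik    # step (3)

def numerical_grad(f, theta, h=1e-5):
    """Central finite differences, used here in place of backpropagation."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = h
        grad[idx] = (f(theta + step) - f(theta - step)) / (2 * h)
    return grad

def train_step(theta, x, y, rng, alpha=0.01):
    """One update theta = theta - alpha * grad (step (4)); returns the noise used."""
    eps = rng.standard_normal(theta.shape[1])
    grad = numerical_grad(lambda t: loss(t, eps, x, y), theta)
    return theta - alpha * grad, eps
```

In practice a fresh noise sample would be drawn at every step and the gradient would come from an automatic differentiation framework; the finite-difference version only makes the update rule explicit.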
Further, after model training is completed, the marketing prediction method combining inner/outer product feature interaction and the Bayesian neural network further includes step S35, in which model evaluation index processing and tuning processing are performed on the user prediction model.
The model evaluation indices may generally include the logarithmic loss function (Log loss), Relative Information Gain (RIG), and the AUC (area under the ROC curve) value. Generally, the closer the AUC value is to 1, the better the classification performance of the user prediction model.
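The three indices can be computed as in the following sketch. The RIG definition used here (one minus the ratio of the model's log loss to the log loss of always predicting the base click rate) is one common convention and is an assumption, since the text does not define RIG; the function names are likewise hypothetical.

```python
import numpy as np

def log_loss(y_true, p, eps=1e-12):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def relative_information_gain(y_true, p):
    """Assumed definition: 1 - log_loss(model) / log_loss(base-rate predictor)."""
    base_rate = np.mean(y_true)
    baseline = log_loss(y_true, np.full(len(y_true), base_rate))
    return 1.0 - log_loss(y_true, p) / baseline

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))  # ties count as half
```

The pairwise AUC computation is quadratic in the number of samples, which is acceptable for a validation-set check but would be replaced by a rank-based formula on large data.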
For example, after the data have been processed according to the above steps and the model has been trained, the training effect of the model can be judged from the AUC value on the local validation set. If the AUC value is smaller than a preset threshold, model tuning processing is performed on the user prediction model. For deep learning algorithms, optimization can generally proceed along the following lines:
(1) Adding Batch Normalization to address the Internal Covariate Shift problem of the data.
(2) Adding Dropout to the network, that is, putting part of the neurons in a dormant state during training.
(3) Adjusting the learning rate, generally through strategies such as exponential decay during training.
(4) Training with multiple random seeds and averaging the results, to reduce the risk of overfitting during training.
(5) Adding L1 or L2 regularization, which applies a penalty to the loss function to reduce the risk of overfitting.
(6) Optimizing the hyperparameters.
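Three of the tuning measures listed above (exponential learning-rate decay, Dropout, and multi-seed averaging) can be sketched as follows. The function names, the inverted-dropout scaling and the parameter values are illustrative assumptions; the text does not prescribe a particular implementation.

```python
import numpy as np

def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates: initial_lr * decay_rate^(step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)

def dropout(activations, drop_prob, rng):
    """Inverted dropout: zero a random subset of neurons and rescale the rest."""
    keep = rng.random(activations.shape) >= drop_prob
    return activations * keep / (1.0 - drop_prob)

def average_over_seeds(predict_fn, seeds):
    """Average the predictions of models trained with different random seeds.

    `predict_fn(seed)` is a hypothetical callable that trains a model with the
    given seed and returns its predictions as an array.
    """
    return np.mean([predict_fn(seed) for seed in seeds], axis=0)
```

Dropout as written applies only during training; at prediction time the activations are used unchanged, which is why the kept values are rescaled by `1 / (1 - drop_prob)`.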
For hyperparameter optimization, Grid Search or Random Search can generally be used; however, both methods consume considerable computing resources and are inefficient. In an embodiment of the present invention, a Bayesian Optimization strategy is employed. Bayesian optimization uses Gaussian process regression to calculate the posterior probability distribution from the previous n data points, obtaining the mean and variance at each candidate value of each hyperparameter; it then selects a better group of hyperparameters by balancing mean and variance and according to the joint probability distribution among the hyperparameters.
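The idea of balancing mean and variance can be sketched for a single hyperparameter. The RBF kernel, its length scale, the upper-confidence-bound rule `mean + kappa * std` and all function names are assumptions made for illustration; the text does not specify the kernel or the selection rule.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of hyperparameter values."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Gaussian process posterior mean and variance at x_new given n observations."""
    y_mean = y_obs.mean()
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mean = y_mean + K_s.T @ np.linalg.solve(K, y_obs - y_mean)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.maximum(var, 0.0)

def propose_next(x_obs, y_obs, candidates, kappa=2.0):
    """Pick the candidate with the best trade-off between high mean and high variance."""
    mean, var = gp_posterior(x_obs, y_obs, candidates)
    ucb = mean + kappa * np.sqrt(var)
    return float(candidates[np.argmax(ucb)])
```

A typical loop would evaluate the validation AUC at the proposed value, append the result to `x_obs` and `y_obs`, and call `propose_next` again until the evaluation budget is used up.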
After all the processing steps are completed, the features are fed into the user prediction model, so that a subset of users with high intent can be screened out before the advertisement is placed and marketing advertisements can be delivered precisely to those users.
That is, the method may further comprise a model prediction step S4, in which, according to the user prediction model, users are screened at the time point t+1 to be predicted and the precision marketing task is carried out for the screened users.
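A minimal sketch of this screening step is given below. It assumes that the two output-layer neurons are turned into probabilities with a softmax, that the second neuron represents "will click", and that users are kept when this probability reaches a threshold; none of these details is stated in the text, and the function names are hypothetical.

```python
import numpy as np

def click_probability(logits):
    """Softmax over the two output neurons; column 1 is assumed to mean 'will click'."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return (exp / exp.sum(axis=1, keepdims=True))[:, 1]

def screen_users(user_ids, logits, threshold=0.5):
    """Return the IDs of users whose predicted click probability reaches the threshold."""
    probs = click_probability(logits)
    return [uid for uid, p in zip(user_ids, probs) if p >= threshold]
```

The threshold would normally be chosen on the validation set, for example to match the marketing budget available for batch t+1.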
The results show that the click rate of the high-intent users selected by the user prediction model is about 10 times that of the low-intent users. With the user prediction model, a large number of low-intent users can be removed directly from the delivery targets, which saves a large amount of marketing cost and increases the profit margin.
The above description is only for the preferred embodiment of the present invention, and the embodiment is not intended to limit the scope of the present invention, so that all the equivalent structural changes made by using the contents of the description and the drawings of the present invention should be included in the scope of the present invention.

Claims (10)

1. A marketing prediction method combining inner/outer product feature interaction and a Bayesian neural network is characterized by comprising a data preprocessing step S1, a data set dividing step S2 and a model establishing step S3;
the data preprocessing step S1 includes the steps of:
step S11: acquiring original information of a user, and extracting original feature information from the original information of the user; the original feature information comprises a user ID, a user mobile phone number attribution, a task batch number, a DPI visited by the user on the same day, a DPI access frequency of the user, a user access time and/or a user access duration; the task batch number represents the original information of users within a date time period; the DPI access frequency of the user, the user access time and/or the user access duration are measured per task batch number; and the DPI visited by the user on the same day and the user mobile phone number attribution are category features;
step S12: processing the category characteristics; performing One-hot coding processing on the attribution characteristics of the user mobile phone number and the DPI accessed by the user; wherein the One-hot encoding process comprises:
expanding, in turn and per task batch number, every distinct DPI visited by users into an independent feature, and expanding the DPI access frequencies within the task batch number, over all distinct DPIs visited by users, into features relating each DPI to the user's access frequency for that DPI;
step S13: processing the continuous features; mapping access time and access duration data of different dimensions to a uniform interval, and adjusting the data distribution to approximate Gaussian distribution;
step S14: performing dimensionality reduction on the high-dimensional feature by adopting principal component analysis;
the data set dividing step S2 includes the steps of:
step S21: after preprocessing, treating the attribution feature and the features indicating whether the user visited a DPI on the same day as sparse features, and defining the user's DPI access frequency as a continuous feature;
step S22: forming the training set data from the historical data at time points 1, 2, …, t-1 preceding the time point t+1 to be predicted; and taking the data corresponding to time point t as the validation set;
the model building step S3 includes the steps of:
step S31: providing a Bayesian neural network initial model, taking the category features in the training set data as M1-dimensional feature information of an input layer of the Bayesian neural network, inputting the M1-dimensional feature information into an embedding layer of the Bayesian neural network for information extraction and dimension reduction, and reducing the M1-dimensional feature information to M2-dimensional feature information; wherein M2 is less than M1, and the Bayesian neural network comprises an input layer, an embedding layer, a multiplication layer, a factorization layer, a fully connected layer and an output layer;
step S32: adding continuous M3-dimensional features to the M2-dimensional features after dimension reduction to form M-dimensional features, and performing inner product and outer product multiplication operations on the M-dimensional features in the multiplication layer so that the feature information of the M-dimensional features interacts;
step S33: in the factorization layer, factorizing the weight matrix of the M-dimensional features by using a factorization method;
step S34: inputting the information of the M-dimensional features into the fully connected layer for training to obtain a trained Bayesian neural network model, wherein the Bayesian neural network model is a user prediction model with two output-layer neurons; and validating the user prediction model using the data in the local validation set.
2. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural network as claimed in claim 1, further comprising step S35 of performing model evaluation index processing and tuning processing on the user prediction model.
3. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks as claimed in claim 2, wherein said model evaluation index comprises a log-loss function, a relative information gain (RIG) and an AUC value.
4. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural network of claim 3, wherein the model evaluation index is an AUC value, and if the AUC value is smaller than a predetermined threshold, model tuning processing is performed on the user prediction model.
5. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks as claimed in claim 2, wherein said model tuning process comprises one or more of:
adding batch normalization to address the internal covariate shift problem of the data;
adding Dropout to the network, so that part of the neurons are in a dormant state during training;
adjusting the learning rate, generally through strategies such as exponential decay during training;
training with multiple random seeds and averaging the results, to better address insufficient generalization capability caused by large data variance;
adding L1 or L2 regularization, and applying a penalty to the loss function to reduce the risk of overfitting;
and optimizing the hyperparameters.
6. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural networks as claimed in claim 5, wherein said optimization method for hyperparameters employs a Bayesian optimization strategy.
7. The marketing prediction method combining inner/outer product feature interaction and bayesian neural networks according to claim 1, wherein the processing of the continuous features is by using a RankGauss method.
8. The marketing prediction method combining inner/outer product feature interaction and Bayesian neural network as claimed in claim 1, further comprising an anomaly detection and processing step for the user's raw information after step S11.
9. The marketing prediction method combining the inner/outer product feature interaction and the Bayesian neural network as recited in claim 1, further comprising a model prediction step S4, wherein, according to the user prediction model, users are screened at the time point t+1 to be predicted and a precision marketing task is performed for the screened users.
10. The marketing prediction method combining inner/outer product feature interaction and the Bayesian neural network as recited in claim 1, wherein the distribution pattern of nodes across the layers of the Bayesian neural network model comprises: increasing, constant, diamond, or decreasing.
CN202110125002.3A 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network Active CN112819523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110125002.3A CN112819523B (en) 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110125002.3A CN112819523B (en) 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Publications (2)

Publication Number Publication Date
CN112819523A true CN112819523A (en) 2021-05-18
CN112819523B CN112819523B (en) 2024-03-26

Family

ID=75860166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110125002.3A Active CN112819523B (en) 2021-01-29 2021-01-29 Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network

Country Status (1)

Country Link
CN (1) CN112819523B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240025A (en) * 2021-05-19 2021-08-10 电子科技大学 Image classification method based on Bayesian neural network weight constraint
CN113344615A (en) * 2021-05-27 2021-09-03 上海数鸣人工智能科技有限公司 Marketing activity prediction method based on GBDT and DL fusion model
TWI773507B (en) * 2021-09-01 2022-08-01 國立陽明交通大學 Algorithm and device for predicting system reliability

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090138386A1 (en) * 2007-11-26 2009-05-28 Wachovia Corporation Interactive statement
US20120310737A1 (en) * 2011-06-03 2012-12-06 Korea Advanced Institute Of Science And Technology Method for providing advertisement, computer-readable medium including program for performing the method and advertisement providing system
WO2019018533A1 (en) * 2017-07-18 2019-01-24 Neubay Inc Neuro-bayesian architecture for implementing artificial general intelligence
CN109831801A (en) * 2019-01-04 2019-05-31 东南大学 The node B cache algorithm of user's behavior prediction based on deep learning neural network
CN110619540A (en) * 2019-08-13 2019-12-27 浙江工业大学 Click stream estimation method of neural network
CN110956497A (en) * 2019-11-27 2020-04-03 桂林电子科技大学 Method for predicting repeated purchasing behavior of user of electronic commerce platform
CN112149352A (en) * 2020-09-23 2020-12-29 上海数鸣人工智能科技有限公司 Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
CN112258223A (en) * 2020-10-13 2021-01-22 上海数鸣人工智能科技有限公司 Marketing advertisement click prediction method based on decision tree

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
何慧;: "基于Ranking的贝叶斯序列推荐算法", 小型微型计算机系统, no. 07 *
刘振鹏;尹文召;王文胜;孙静薇;: "HRS-DC:基于深度学习的混合推荐模型", 计算机工程与应用, no. 14 *
吴英: "基于贝叶斯方法的网络广告预测模型研究", 中国优秀硕士学位论文 *
夏国恩;金炜东;: "基于支持向量机的客户流失预测模型", 系统工程理论与实践, no. 01 *
李诗文;潘善亮;: "基于注意力机制的神经网络贝叶斯群组推荐算法", 计算机应用与软件, no. 05 *
陈巧红;董雯;孙麒;贾宇波;: "基于门控循环单元神经网络的广告点击率预估", 浙江理工大学学报(自然科学版), no. 05 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240025A (en) * 2021-05-19 2021-08-10 电子科技大学 Image classification method based on Bayesian neural network weight constraint
CN113240025B (en) * 2021-05-19 2022-08-12 电子科技大学 Image classification method based on Bayesian neural network weight constraint
CN113344615A (en) * 2021-05-27 2021-09-03 上海数鸣人工智能科技有限公司 Marketing activity prediction method based on GBDT and DL fusion model
CN113344615B (en) * 2021-05-27 2023-12-05 上海数鸣人工智能科技有限公司 Marketing campaign prediction method based on GBDT and DL fusion model
TWI773507B (en) * 2021-09-01 2022-08-01 國立陽明交通大學 Algorithm and device for predicting system reliability

Also Published As

Publication number Publication date
CN112819523B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
CN112819523B (en) Marketing prediction method combining inner/outer product feature interaction and Bayesian neural network
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN112967088A (en) Marketing activity prediction model structure and prediction method based on knowledge distillation
CN113344615B (en) Marketing campaign prediction method based on GBDT and DL fusion model
US10963802B1 (en) Distributed decision variable tuning system for machine learning
CN113255844B (en) Recommendation method and system based on graph convolution neural network interaction
CN112258223B (en) Marketing advertisement click prediction method based on decision tree
CN113591971B (en) User individual behavior prediction method based on DPI time sequence word embedded vector
CN110619540A (en) Click stream estimation method of neural network
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
CN111611488A (en) Information recommendation method and device based on artificial intelligence and electronic equipment
Grob et al. A recurrent neural network survival model: Predicting web user return time
CN111178986A (en) User-commodity preference prediction method and system
CN111984842B (en) Bank customer data processing method and device
CN111815066B (en) User click prediction method based on gradient lifting decision tree
CN111428181A (en) Bank financing product recommendation method based on generalized additive model and matrix decomposition
CN112581177B (en) Marketing prediction method combining automatic feature engineering and residual neural network
CN113256024B (en) User behavior prediction method fusing group behaviors
CN113360772B (en) Interpretable recommendation model training method and device
Santos et al. Microservices performance forecast using dynamic Multiple Predictor Systems
CN115293800A (en) Prediction method aiming at internet click rate prediction based on shadow feature screening
KR102343579B1 (en) Method for providing service using parents predicting model
CN113793187B (en) Click rate prediction method and system based on instance weight balance and dual attention
CN115271784A (en) Click prediction method for feature interaction and pseudo tag learning based on genetic algorithm
CN115935178A (en) Prediction integration modeling method based on label-free sample learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 200436 room 406, 1256 and 1258 Wanrong Road, Jing'an District, Shanghai

Applicant after: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

Address before: Room 1601-026, 238 JIANGCHANG Third Road, Jing'an District, Shanghai, 200436

Applicant before: Shanghai Shuming Artificial Intelligence Technology Co.,Ltd.

GR01 Patent grant