CN111352976A

CN111352976A - Search advertisement conversion rate prediction method and device for shopping nodes

Info

Publication number: CN111352976A
Application number: CN202010146512.4A
Authority: CN
Inventors: 赖粤; 钱毅霖; 余荣; 吴茂强
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2020-03-05
Filing date: 2020-03-05
Publication date: 2020-06-30
Anticipated expiration: 2040-03-05
Also published as: CN111352976B

Abstract

The application discloses a method and a device for predicting the search advertisement conversion rate for shopping festivals. The method comprises the following steps: acquiring shopping data sets for the day of a shopping festival and for the period before the shopping festival; taking the shopping data set from before the shopping festival as a first training set, training an advertisement conversion rate model, and predicting a first prediction result for the day of the shopping festival; taking the shopping data set from one part of the day of the shopping festival as a second training set, taking the shopping data set from another part of that day as a test set, taking part of the data in the second training set as a verification set, and adding the first prediction result as a new feature to the second training set, the verification set and the test set respectively; and training an advertisement conversion rate model to obtain the final prediction result for the day of the shopping festival. This solves the problem of inaccurate prediction results caused by using a training set constructed in the daily period to predict a test set constructed in the shopping festival period.

Description

Search advertisement conversion rate prediction method and device for shopping nodes

Technical Field

The application relates to the technical field of data mining, in particular to a method and a device for predicting search advertisement conversion rate aiming at shopping nodes.

Background

Search advertisements are advertisements related to the search terms, presented in the returned results page according to the user's search behavior. An e-commerce platform is a complex system that is affected by many factors. During shopping festivals such as henry and 618, the various promotional activities of merchants and platforms cause dramatic changes in traffic distribution; the advertisement conversion data of such a period is referred to here as special traffic. A model trained on the daily period is difficult to match effectively to this special traffic.

For these problems, the existing scheme closest to the invention treats conversion rate prediction as a traditional regression prediction problem, divides the data set with a sliding time window, uses several different machine learning algorithms as models, and fuses them with the Stacking method to obtain the final result.

The prior art usually transfers traditional regression prediction methods to the conversion rate prediction problem without modification, and rarely distinguishes between conversion rate prediction under the two data distributions of the daily period and the shopping festival period. It therefore has the following defects: (1) a training set constructed in the daily period is not adapted, but is used directly to predict a test set constructed in the shopping festival period; (2) high-dimensional features are encoded with one-hot vectors, or all features are encoded with the same method without distinction, so the encoding works poorly and leaves much room for improvement; (3) features are extracted with a fixed time window, but the shopping festival period and the days around it follow several different data distributions, so another way of dividing the data set is needed to avoid wasting data; (4) conversion rate features are processed with a smoothing window when they are calculated, and multiple models are combined through several different weighting methods, which lowers learning efficiency.

Disclosure of Invention

The application provides a method and a device for predicting the search advertisement conversion rate for shopping festivals, which solve the problem of inaccurate prediction results caused by using a training set constructed in the daily period to predict a test set constructed in the shopping festival period.

In view of the above, a first aspect of the present application provides a method for predicting a search advertisement conversion rate for a shopping node, the method comprising:

acquiring a shopping data set on the current day of a shopping festival and before the shopping festival;

taking a shopping data set before the shopping festival as a first training set, training an advertisement conversion rate model, and predicting to obtain a first prediction result of the current shopping festival;

taking the shopping data set of a part of time of the day of the shopping festival as a second training set, taking the shopping data set of another part of time of the day of the shopping festival as a test set, taking part of data in the second training set as a verification set, and respectively adding the first prediction result serving as a new feature into the second training set, the verification set and the test set;

and training the advertisement conversion rate model by adopting the second training set, the verification set and the test set added with new features to obtain the final result of the current shopping festival.

Optionally, the training the advertisement conversion rate model by using the shopping data set before the shopping festival as a first training set to predict a first prediction result of the current day of the shopping festival further includes:

and preprocessing the data in the shopping data set to obtain data in accordance with the input format of the advertisement conversion rate model.

Optionally, the preprocessing the data in the shopping data set specifically includes:

processing missing values and abnormal values in the shopping data set;

processing the original attribute characteristics of the search advertisement categories in the shopping data set to obtain characteristic data meeting format requirements;

and coding the characteristic data by adopting a hierarchical coding method based on an information entropy principle.

Optionally, the encoding the feature data by using a hierarchical encoding method based on an information entropy principle specifically includes:

inputting the characteristic data;

comparing the characteristic data with data in a characteristic library, and if the characteristic library does not contain the characteristic data, calculating a score of the characteristic data by using an information entropy formula;

if the score is greater than a first threshold, discarding the feature data;

if the score is larger than a second threshold and smaller than a first threshold, encoding the characteristic data by adopting a one-hot encoding method;

if the score is smaller than a third threshold value and the feature data are not id features, encoding the feature data by adopting a mean encoding method; if the score is smaller than a third threshold value and the feature data are id features, encoding the feature data by adopting an Embedding encoding method;

and outputting the coded characteristic data.

Optionally, the feature data includes a conversion rate feature, a user feature, a commodity feature, an id feature, a search advertisement feature, a time feature, and a ranking feature.

Optionally, in the preprocessing of the data in the shopping data set, the method further includes smoothing the conversion rate feature by using a conversion rate smoothing method based on a priori value, specifically:

When the conversion rate is calculated within groups of different features or feature combinations, B represents the number of purchases of the corresponding feature or feature combination, C represents the number of clicks of the corresponding feature or feature combination, and the two averaged terms are, respectively, the average purchase number and the average click number of the corresponding feature or feature combination over the same time range. The parameter λ is tuned between 0 and 1 and indicates the confidence of the statistical value.

Optionally, the evaluation criterion used for the shopping data sets of the day of the shopping festival and before the shopping festival is:

$$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\left(y_i \log p_i + (1 - y_i)\log(1 - p_i)\right)$$

where N represents the number of test set samples, y_i represents the true label of the i-th sample in the test set, and p_i represents the predicted conversion rate of the i-th sample.

A second aspect of the present application provides a search advertisement conversion rate prediction apparatus for shopping festivals, the apparatus comprising:

the data acquisition module is used for acquiring a shopping data set on the current shopping festival and before the shopping festival;

the first prediction module is used for taking a shopping data set before the shopping festival as a first training set, training an advertisement conversion rate model and predicting to obtain a first prediction result of the current shopping festival;

the data set processing module is used for taking the shopping data set of a part of time of the day of the shopping festival as a second training set, taking the shopping data set of another part of time of the day of the shopping festival as a test set, taking part of data in the second training set as a verification set, and respectively adding the first prediction result serving as a new feature into the second training set, the verification set and the test set;

and the merging prediction module is used for training the advertisement conversion rate model by the second training set, the verification set and the test set added with new features to obtain the final result of the current shopping festival.

Optionally, the apparatus further includes:

and the preprocessing module is used for preprocessing the data in the shopping data set to obtain data in accordance with the input format of the advertisement conversion rate model.

Optionally, the preprocessing module further includes:

the data restoration module is used for processing missing values and abnormal values in the shopping data set;

the characteristic extraction module is used for processing the original characteristics of the attributes of the search advertisement categories in the shopping data set to obtain characteristic data meeting the format requirement;

and the hierarchical coding module is used for hierarchically coding the characteristic data.

According to the technical scheme, the method has the following advantages:

the application provides a method and a device for predicting the conversion rate of search advertisements for shopping nodes, wherein the method comprises the following steps: acquiring a shopping data set on the current day of a shopping festival and before the shopping festival; taking a shopping data set before a shopping festival as a first training set, training an advertisement conversion rate model, and predicting to obtain a first prediction result of the current day of the shopping festival; taking a shopping data set of a part of time of the day of the shopping festival as a second training set, taking a shopping data set of another part of time of the day of the shopping festival as a test set, taking part of data in the second training set as a verification set, and respectively adding a first prediction result serving as a new feature into the second training set, the verification set and the test set; and training an advertisement conversion rate model to obtain a final prediction result of the current shopping festival.

According to the method, part of data of the current shopping festival is used as a training set, the other part of shopping data set is used as a test set, and part of data in the training set is used as a verification set and used for training a corresponding machine learning model, so that the accuracy of the machine learning model is guaranteed; in addition, the data before the shopping festival is used as a training set to train a corresponding machine learning model, and corresponding time windows are divided to obtain the characteristics of the influence of the data before the shopping festival on the current day of the shopping festival, so that the defect of time characterization characteristics caused by only using the data on the current day of the shopping festival as training data is overcome.

Drawings

FIG. 1 is a flow chart of a method of an embodiment of a search advertisement conversion rate prediction method for shopping nodes according to the present application;

FIG. 2 is a schematic diagram of an embodiment of an apparatus for predicting conversion rate of search advertisements for shopping nodes according to the present application;

FIG. 3 is a line graph of click rate on the day of the shopping festival and 7 days before the shopping festival in the embodiment of the present application;

FIG. 4 is a line graph of the total conversion on the day of the shopping festival and 7 days before the shopping festival in accordance with an embodiment of the present invention;

FIG. 5 is a graphical illustration of the local conversion on the day of the shopping festival in an embodiment of the present invention;

FIG. 6 is a flow chart of hierarchical encoding of feature data in an embodiment of the present invention;

FIG. 7 is a diagram illustrating feature data obtained after layered coding according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating the data of the current day of the shopping festival as a training set, a validation set, and a test set according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of an advertisement conversion rate model according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of the feature groups obtained in the embodiment of the present invention.

Detailed Description

In existing shopping festival data, a training set formed from conversion rate data collected before the shopping festival has little predictive power for a test set formed from conversion rate data on the day of the shopping festival. Specifically, FIG. 3 is a line graph of the click rate on the day of the shopping festival and the 7 days before it, and FIG. 4 is a line graph of the total conversion rate over the same period, where the conversion rate is the ratio of the number of purchases to the number of clicks. Three types of day can be seen: day1 to day6, with a medium conversion rate; day7, with a low conversion rate; and day8, the day of the shopping festival, with a high conversion rate. In practice, day1 to day6 are far from the shopping festival and little affected by it, and are called the daily period in the invention. Day7 is the day before the shopping festival: while waiting for the festival to arrive, users generate a large number of clicks, such as adding items to favorites or to the shopping cart, but few actual purchases. On day8, purchasing behavior far exceeds that of the daily period, producing special traffic with a high conversion rate. Therefore, if data from before the shopping festival is used as the training set to directly predict the conversion rate on the day of the shopping festival, the prediction is distorted.

Therefore, the application predicts the conversion rate on the day of the shopping festival with a composite model built from different data set division methods. One part of the festival-day data is used as a training set, another part as a test set, and part of the training data as a verification set, so that the machine learning model is trained on festival-day data and its accuracy is ensured. In addition, data from before the shopping festival is used as a training set to train a corresponding machine learning model, and time windows are divided accordingly to obtain features that capture the influence of pre-festival data on the day of the festival. This makes up for the missing time-related features that result from training only on festival-day data.

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to FIG. 1, FIG. 1 is a flowchart of a method for predicting the search advertisement conversion rate for shopping festivals according to an embodiment of the present invention. As shown in FIG. 1, the method includes:

101. Shopping data sets for the day of the shopping festival and for the period before it are obtained.

It should be noted that, when acquiring the shopping data sets for the day of the shopping festival and before it, data such as search advertisements and purchases around the shopping festival of a shopping platform may be selected. The data may cover multiple dimensions, such as user behavior, product attributes, store attributes and search-term data, and the target behavior is conversion into a purchase by the user. Models built on the obtained data are evaluated with the logloss function:

$$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\left(y_i \log p_i + (1 - y_i)\log(1 - p_i)\right)$$

where N represents the number of test set samples, y_i represents the true label of the i-th sample in the test set, and p_i represents the predicted conversion rate of the i-th sample. A smaller logloss value indicates a more accurate prediction.
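The logloss criterion above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function name `logloss` and the clipping constant `eps` are assumptions added here, the latter only to keep the logarithm finite when a prediction is exactly 0 or 1.

```python
import math


def logloss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy between true labels and predicted conversion rates."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clipping is an added numerical safeguard, not part of the source text.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Predictions closer to the true labels give a smaller value.
confident = logloss([1, 0], [0.9, 0.1])
hesitant = logloss([1, 0], [0.6, 0.4])
```

Here `confident` is smaller than `hesitant`, matching the statement that a smaller logloss indicates a more accurate prediction.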

In a specific embodiment, data for the shopping festival and the 7 days before it may be obtained, as shown in FIGS. 3-5. FIG. 3 is a line graph of the click rate on the day of the shopping festival and the 7 days before it; FIG. 4 is a line graph of the total conversion rate over the same period; FIG. 5 is a schematic diagram of the local conversion rate on the day of the shopping festival. It can be seen that the data distribution before the shopping festival differs greatly from the conversion rate distribution on the day of the shopping festival.

102. And taking a shopping data set before the shopping festival as a first training set, training an advertisement conversion rate model, and predicting to obtain a first prediction result of the current shopping festival.

It should be noted that, since the purchase amount on the day of the shopping festival far exceeds the average daily purchase amount before it, part of the festival-day data must be used as the training set for modeling user behavior, so that the training set and the test set share the same data distribution. The data from before the shopping festival should not simply be discarded, however, as that would waste a large amount of data; instead, it can be used to characterize users, commodities, shops and so on. Specifically, one part of the pre-festival data set can be used to extract features of users, commodities, shops and the like from the historical data, and another part can be used as a training set to predict the festival-day data, with the prediction result serving as a new feature.

In one specific embodiment, data from the day of the shopping festival and the 7 days before it are used as experimental data. As shown by Model2 in the structural diagram of the advertisement conversion rate model in FIG. 9, the advertisement conversion rate model is trained on the data of day4 to day6 to predict the conversion rate on the day of the shopping festival, and the data of day1 to day3 are used to extract features of users, commodities, stores and the like from the historical data.

103. And taking the shopping data set of the shopping festival in a part of the current day as a second training set, taking the shopping data set of the shopping festival in another part of the current day as a test set, taking part of the data in the second training set as a verification set, and respectively adding the first prediction result serving as a new feature into the second training set, the verification set and the test set.

It should be noted that, because the purchase amount on the day of the shopping festival far exceeds the average daily purchase amount before it, only the festival-day data is used as the main training set for modeling: the shopping data set from one part of the day is taken as the second training set, the shopping data set from another part of the day is taken as the test set, and part of the data in the second training set is taken as the verification set. However, considering only the festival-day data loses the time-related features, since data from before the shopping festival may influence the prediction for the day of the festival. It is therefore necessary to predict the festival-day conversion rate using the pre-festival data as a training set; because this prediction is itself a festival-day conversion rate, it is added as a new feature to the second training set, the verification set and the test set respectively.

In a specific embodiment, according to the local conversion rate diagram of the day of the shopping festival shown in FIG. 5, the festival-day data from 0:00 to 10:00 may be selected as the training set, the data from 10:00 to 12:00 as the verification set, and the data from 12:00 to 24:00 as the test set. As shown in the structural diagram of the advertisement conversion rate model in FIG. 9, Dataset2 adopts the conventional sliding time window method: the data before the shopping festival is used as a training set to predict the conversion rate on the day of the shopping festival, and the prediction result is added as a new feature column to the second training set, the verification set and the test set.
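The split by hour and the added feature column described in this embodiment can be sketched as follows. The record layout (dictionaries with `id` and `hour` keys), the column name `first_pred` and both function names are hypothetical choices made for this illustration; the source specifies only the hour boundaries and that the first prediction result becomes a new feature in all three sets.

```python
def split_festival_day(records, train_end=10, valid_end=12):
    """Split festival-day records by hour into training, verification and test sets."""
    train = [r for r in records if r["hour"] < train_end]
    valid = [r for r in records if train_end <= r["hour"] < valid_end]
    test = [r for r in records if r["hour"] >= valid_end]
    return train, valid, test


def add_first_prediction(records, first_predictions):
    """Return copies of the records with the first-stage prediction as a new feature."""
    return [dict(r, first_pred=first_predictions[r["id"]]) for r in records]


# Hypothetical festival-day samples; first_predictions stands in for the output
# of the model trained on pre-festival data.
records = [{"id": i, "hour": h} for i, h in enumerate([1, 9, 10, 11, 12, 23])]
first_predictions = {r["id"]: 0.01 * (r["id"] + 1) for r in records}

enriched = add_first_prediction(records, first_predictions)
train, valid, test = split_festival_day(enriched)
```

Attaching the feature before splitting guarantees that the training, verification and test sets all carry the same new column.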

104. And training an advertisement conversion rate model by adopting a second training set, a verification set and a test set added with new characteristics to obtain a final result of the current shopping festival.

It should be noted that, in order to make up for the lack of the time-related feature, the feature including the time-related feature obtained by training the advertisement conversion rate model with the data before the shopping festival is added to the training set, the verification set and the test set of the current shopping festival for training the corresponding advertisement conversion rate model to obtain the final result of the current shopping festival.

In a specific embodiment, Dataset2 uses the conventional sliding time window method: day1 to day7 serve as the training set to predict the conversion rate for the whole of day8, and the prediction result is added to Dataset1 as a new feature column. This column carries time information and makes up for the lack of time-related features in Dataset1.

The above describes one embodiment of the search advertisement conversion rate prediction method for shopping festivals according to the present application. The present application further includes another embodiment of the method, whose steps further include:

201. The data in the shopping data set is preprocessed to obtain data that conforms to the input format of the advertisement conversion rate model.

Wherein the pretreatment specifically comprises:

2011. Missing values and outliers in the shopping data set are processed.

The processing of missing values and outliers in the shopping data set includes filling missing values, deleting or replacing outliers, and denoising abnormal click data.

Filling missing values: discrete data is filled with the mode, and continuous data is filled with the median.

Deleting or replacing outliers: the data is analyzed with a box plot, and the outliers it reveals are replaced with a specific value or deleted.

Denoising abnormal click data: data exploration shows that some shops receive a large number of clicks in a certain time period without a single click being converted into a purchase. From a business perspective these clicks are considered fake-traffic (click-farming) behavior, and the data needs to be denoised. The specific method is to remove, for each suspicious merchant, the users who have the largest number of clicks and no purchases.
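The filling and outlier steps above can be illustrated with the Python standard library. This is a sketch under stated assumptions: missing values are represented as `None`, the box plot fences use the conventional 1.5 × IQR rule (the source does not give a multiplier), and all function names are hypothetical.

```python
import statistics


def fill_missing(values, discrete):
    """Fill None entries with the mode (discrete data) or the median (continuous data)."""
    present = [v for v in values if v is not None]
    fill = statistics.mode(present) if discrete else statistics.median(present)
    return [fill if v is None else v for v in values]


def boxplot_fences(values, k=1.5):
    """Lower and upper box plot fences; k=1.5 is an assumed, conventional multiplier."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers(values, replacement):
    """Replace values outside the box plot fences with a chosen specific value."""
    low, high = boxplot_fences(values)
    return [replacement if (v < low or v > high) else v for v in values]


filled_discrete = fill_missing([1, None, 1, 2], discrete=True)
filled_continuous = fill_missing([1.0, None, 3.0], discrete=False)
cleaned = replace_outliers([1, 2, 3, 4, 5, 100], replacement=3)
```

Deleting outliers instead of replacing them would only require filtering on the same fences.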

2012. And processing the original characteristics of the attributes of the search advertisement categories in the shopping data set to obtain characteristic data meeting the format requirement.

It should be noted that the category and attribute features of a commodity belong to the raw data. In addition, most e-commerce platforms have a special data column, the predicted commodity category and attribute list: commodity features that the platform predicts the user may be interested in, derived from the search terms entered by the user through methods such as collaborative filtering. This is special data of the search advertisement scenario. The data formats of the original category attributes of the commodities and the predicted category attributes of the search terms are shown in the following table; these features need to be split into multiple feature columns.

For item_properties, the number of occurrences of each commodity attribute is counted and the ten most frequent attributes are kept as new features; for prediction_category_property, the ten most frequent items are likewise kept as new features; for item_categories, only the leaf category is taken as a new feature, since the root categories are all the same.
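The counting and slicing described above might look like the sketch below. The raw data format is not visible in this text, so the `;` separator, the assumption that the leaf category is the last entry, and the function names `top_properties` and `leaf_category` are all hypothetical.

```python
from collections import Counter


def top_properties(property_lists, n=10, sep=";"):
    """Return the n most frequent attribute tokens over all rows (assumed ';'-separated)."""
    counts = Counter()
    for row in property_lists:
        counts.update(token for token in row.split(sep) if token)
    return [token for token, _ in counts.most_common(n)]


def leaf_category(category_list, sep=";"):
    """Keep only the leaf category, assumed here to be the last entry of the list."""
    return category_list.split(sep)[-1]


# Made-up attribute ids, purely for illustration.
rows = ["p1;p2;p3", "p1;p2", "p1"]
kept = top_properties(rows, n=2)
leaf = leaf_category("root;leaf")
```

Each kept attribute can then become its own indicator column after the slicing step.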

2013. And coding the characteristic data by adopting a hierarchical coding method based on an information entropy principle.

It should be noted that, the present application adopts a hierarchical coding method based on the information entropy principle to solve the feature coding problem, wherein the available high-dimensional features are processed by adopting a scheme of combining mean coding with Embedding.

The specific process of hierarchical coding is shown in fig. 6, and includes:

inputting characteristic data; comparing the characteristic data with data in a characteristic library, and if the characteristic library does not contain the characteristic data, calculating a characteristic data score; if the score is greater than a first threshold, discarding the feature data; if the score is larger than a second threshold and smaller than a first threshold, encoding the characteristic data by adopting a one-hot encoding method; if the score is smaller than a third threshold value and the feature data are not the id-type features, encoding the feature data by adopting a mean encoding method; if the score is smaller than a third threshold value and the feature data are id-type features, encoding the feature data by adopting an Embedding encoding method; and outputting the coded characteristic data.

It should be noted that features have two common attributes: dimensionality and sparsity. In the traditional method, dimensionality can be distinguished by setting a threshold, whereas sparsity is usually judged manually, from experience and for a specific scenario; the present application distinguishes sparsity by a quantitative method.

Information entropy is the expectation of the amount of information; it represents the uncertainty of information and describes the degree of disorder of the data. When the probabilities in the entropy are estimated from data, the resulting entropy is called the empirical entropy. The method uses the idea of information entropy to measure the sparsity of a feature, that is, the empirical entropy is used to compute the degree of dispersion of a column of feature data. Let the data set be D with |D| samples, let the values of a feature column be indexed by k (k = 0, 1, 2, …, K), and let |C_k| be the number of samples taking the k-th value. The empirical entropy of the feature is:

$$H(D) = -\sum_{k}\frac{|C_k|}{|D|}\log\frac{|C_k|}{|D|}$$

The smaller the entropy value H, the denser the distribution of the feature column; the larger the entropy value H, the sparser the distribution. The feature dimension is denoted by K. The product of the dimension and the entropy value is expressed as Score:

$$\mathrm{Score} = K \times H(D)$$

Analysis shows that the larger the feature dimension and the sparser the feature distribution, the larger the Score value. Moreover, provided that all feature columns belong to the same training set and the training set is large enough, a low-dimensional feature cannot also be sparsely distributed, so the Score values fall into three cases: high-dimensional and overly sparse, high-dimensional and relatively dense, and low-dimensional.
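The Score described above (dimension multiplied by empirical entropy) can be sketched as follows. Two points are assumptions made for this illustration: the logarithm is taken to base 2, which the text does not specify, and the dimension K is taken as the number of distinct values in the column. The function names are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(column):
    """Empirical entropy of a feature column (log base 2 is an assumption)."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sparsity_score(column):
    """Score = K * H, with K taken as the number of distinct values in the column."""
    return len(set(column)) * empirical_entropy(column)


dense_low_dim = sparsity_score(["a", "a", "a", "a"])   # one value, entropy 0
balanced_two = sparsity_score(["a", "a", "b", "b"])    # K = 2, H = 1
all_distinct = sparsity_score(["a", "b", "c", "d"])    # K = 4, H = 2
```

The three example columns give increasing scores, in line with the statement that higher dimension and sparser distribution both raise the Score.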

The method selects thresholds a and b to distinguish the three cases. Initial values of a and b are first set from the actual business, based on the proportion of each of the three feature types among all features; during parameter tuning the thresholds are varied within a certain range around the initial values, and the values giving the best result are selected.

The specific coding method adopted is shown in the following table:

Usable high-dimensional features are further divided according to whether they are id features. For id features, each id generally represents an entity, so the feature values can be arranged into document form and encoded with Embedding. The resulting vectors place entities with similar attributes closer together in space, which substantially improves the predictive value of the user and item profile features. Mean encoding is applicable to all kinds of high-dimensional features, but because it encodes using statistics alone, it performs worse than Embedding on id features. Mean encoding is therefore used as a complement to Embedding, to process the other high-dimensional features.
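As an illustration of the statistical nature of mean encoding, the sketch below replaces each category value with the mean label observed for that value. It assumes the common target-mean variant with a global-mean fallback for unseen values; the function names are hypothetical and the application may use a different variant.

```python
from collections import defaultdict


def fit_mean_encoding(categories, labels):
    """Learn the mean label per category value, plus the global mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(categories, labels):
        sums[value] += label
        counts[value] += 1
    table = {value: sums[value] / counts[value] for value in counts}
    global_mean = sum(labels) / len(labels)
    return table, global_mean


def apply_mean_encoding(categories, table, global_mean):
    """Encode values; values unseen during fitting fall back to the global mean."""
    return [table.get(value, global_mean) for value in categories]
```

The table should be fitted on training data only, so that labels from the verification and test sets do not leak into the encoded feature.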

In addition, to save encoding time, a feature library can be built from the daily-period prediction, storing the name, structure and encoding method of each existing feature. When a feature is to be encoded, the feature library is first searched for it. If the feature is found, the encoding method stored for it is used directly; if not, the encoding method of the invention is applied, and the name, structure and encoding method of the new feature are stored in the feature library after encoding. The library thus becomes a new feature library suited to large shopping festivals and can be reused when the next shopping festival arrives.
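The lookup-then-store behavior of the feature library can be sketched as a small cache. The dictionary layout, the key `(name, structure)` and the callback `choose_encoder` are assumptions made for illustration only.

```python
def encoding_method_for(name, structure, library, choose_encoder):
    """Return (encoding_method, was_cached) for a feature.

    library: dict mapping (name, structure) -> encoding method name.
    choose_encoder: hypothetical callback that selects an encoding method
    for a feature not yet in the library.
    """
    key = (name, structure)
    if key in library:
        # Known feature: reuse the stored encoding method directly.
        return library[key], True
    # New feature: select a method, then store it for the next festival.
    method = choose_encoder(name, structure)
    library[key] = method
    return method, False
```

A second call with the same name and structure returns the stored method without invoking `choose_encoder` again, which is where the time saving comes from.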

In the present application, the feature data may further include a conversion rate feature, a user feature, a commodity feature, an id feature, a search advertisement feature, a time feature, and a ranking feature.

It should be noted that the conversion rate characteristic is defined as the ratio of the number of purchases of the user to the number of clicks, but this simple calculation method has the following two problems:

a. When the number of advertisement clicks is small, directly calculating the conversion rate may give a result that is too high. For example, if an advertisement is clicked only once and that click produces one purchase, then CVR is 1.0, which is an overestimate.

b. When the number of advertisement clicks is large but the number of purchases is small, directly calculating the conversion rate gives a result that is too low, even close to 0, which is an underestimate.

Therefore, the conversion rate needs to be smoothed. The traditional approach is Bayesian smoothing, which is widely used in click-through rate and conversion rate estimation. However, its parameters are complicated to compute, and with a large data volume, frequent Bayesian smoothing consumes a large amount of computing resources and makes the whole process very inefficient. The application provides a conversion rate smoothing method based on a prior value. Although its accuracy theoretically differs somewhat from that of Bayesian smoothing, the effect on the final result is almost the same in practice, and the running speed is greatly improved. The specific method is as follows:

and

Specifically, when the conversion rate is calculated within the item id group, B represents the number of purchases of a given item, C represents the number of clicks on that item, and the two remaining symbols represent, respectively, the average number of purchases and the average number of clicks over all items in the same time range.
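A minimal sketch of prior-based smoothing follows. It assumes an additive form in which the two averages act as pseudo-counts weighted by a parameter λ between 0 and 1; this form and the names `smooth_cvr`, `avg_b`, `avg_c` and `lam` are assumptions for illustration, not the formula of the application.

```python
def smooth_cvr(b, c, avg_b, avg_c, lam):
    """Smoothed conversion rate (assumed additive form).

    b, c: purchases and clicks of one item.
    avg_b, avg_c: average purchases and clicks over all items in the
    same time range (the prior).
    lam: weight of the prior, assumed to lie in [0, 1].
    """
    denominator = c + lam * avg_c
    if denominator == 0:
        return 0.0
    return (b + lam * avg_b) / denominator
```

With one click and one purchase, averages of 2 purchases and 100 clicks, and λ = 0.5, the raw rate of 1.0 is pulled down to 2/51, roughly 0.039. With λ = 0 the function returns the raw ratio.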

It should be noted that the user features describe the click behavior records of different users. However, the data set contains users who click often but buy rarely, as well as users who click rarely but buy often. For a low-frequency user, historical behavior is difficult to characterize, and only statistics about the user, such as the category attributes of search terms and clicked items, can be computed; for a high-frequency user, more specific preferences can be characterized. The corresponding user features are as follows:

a. User frequency: whether the user appears in the daily period, whether the user appears on the previous day, whether the user clicked the corresponding item or shop on the previous day, and the like.

b. User behavior: the number of clicks, whether a click is the first or the last click, the time interval since the previous click on the shop, the proportion of the item's clicks, and the like.

It should be noted that the item features include the price rank and promotion strength of the item within its category, and the like.

It should be noted that the search advertisement features include the original search category, the search term the user entered to express intent, and the list of category attributes predicted from the search term; that is, the search term, the original item category, and the predicted category attribute features.

It should be noted that the time features include time window features, features at different time granularities, time difference features, and the like. The time window features are relatively important. They comprise a series of click, conversion and cross features extracted for users, items and the like over two periods: the daily period (day 1 to day 6) and the day before the shopping festival (day 7). For the shopping festival day (day 8), the behavior of items, stores and the like on that day is characterized from statistics of that morning.

It should be noted that the ordering feature includes a global ordering and a local ordering, and is specifically shown in fig. 10:

Global ordering: the rank of each user by number of item clicks, the rank of each user by number of store clicks, the rank of each item by the number of clicks it receives from different users, the rank of each store by the number of clicks it receives from different users, and the like.

Local ordering: the rank of the number of times a user clicks a category/item/store among all categories/items/stores clicked by that user, the rank of the number of times an item is purchased by different users, and the like.
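The difference between the two kinds of ordering can be sketched on a list of (user, item) click records. The record layout, the function names and the use of dense ranking (ties share a rank, rank 1 is the most clicked) are assumptions for illustration.

```python
from collections import Counter, defaultdict


def _dense_ranks(counter):
    """Map each key to its dense rank by count, rank 1 = highest count."""
    distinct = sorted(set(counter.values()), reverse=True)
    rank_of = {count: i + 1 for i, count in enumerate(distinct)}
    return {key: rank_of[count] for key, count in counter.items()}


def global_user_click_ranks(clicks):
    """Global ordering: rank every user by total number of item clicks."""
    totals = Counter(user for user, _item in clicks)
    return _dense_ranks(totals)


def local_item_click_ranks(clicks):
    """Local ordering: within each user, rank items by that user's clicks."""
    per_user = defaultdict(Counter)
    for user, item in clicks:
        per_user[user][item] += 1
    ranks = {}
    for user, counter in per_user.items():
        for item, rank in _dense_ranks(counter).items():
            ranks[(user, item)] = rank
    return ranks
```

The same pattern applies to stores and categories by swapping the second field of each record.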

In addition, it should be noted that the application uses the XGBoost algorithm to fit the advertisement conversion rate model. In a conversion prediction scenario, click behavior is necessarily far more common than conversion behavior, so positive and negative samples are imbalanced. The conventional approach is to resample the data before prediction, generally by down-sampling the majority class or up-sampling the minority class. When the ranking of positive and negative samples in the prediction result is the main concern, this keeps predictions from being biased toward the majority class; in that case the scale_pos_weight parameter of the XGBoost algorithm can be set, which works on the same principle as up-sampling the minority class.

However, the prediction target of the application is a specific probability value, and the evaluation metric is the logloss function, which places higher demands on the accuracy of the probability value. Changing the ratio of positive to negative samples in the training set by sampling changes the distribution of the original data and therefore degrades the accuracy of the predicted probabilities. The data set is therefore not processed with a traditional sampling method, that is, the scale_pos_weight parameter is not specially set. Instead, the max_delta_step parameter of the XGBoost algorithm is set to a finite number, which serves to prevent overfitting and help convergence.
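The sketch below shows the logloss metric in plain Python together with a parameter dictionary of the kind that could be passed to XGBoost. The parameter names are real XGBoost parameters, but the value chosen for `max_delta_step` is an illustrative assumption, since the application only says it is set to a finite number.

```python
import math


def logloss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical configuration; only the choice of parameters follows the text.
xgb_params = {
    "objective": "binary:logistic",  # output is a probability
    "eval_metric": "logloss",
    "max_delta_step": 1,  # assumed finite value
    # scale_pos_weight is deliberately left unset
}
```

With one positive among four samples, predicting the true base rate 0.25 for every sample gives a lower logloss than predicting 0.5, which is why shifting the class ratio by resampling hurts this metric even when the ranking is unchanged.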

It should be further noted that feature importance is measured with the feature scoring function of the XGBoost algorithm. The calculation shows that the following features are of greater importance: the search-term-related features are the feature group with the largest impact, with a measured effect on the result of 6 thousandths; the conversion rate features are the feature group with the second largest impact, with a measured effect of 3 thousandths. Both findings fit the search advertisement conversion rate prediction scenario well. The ranking features, the time features, and the user and item profile feature groups also affect the result to some degree, on the order of one ten-thousandth to one thousandth.

Finally, the reproduced result of the scheme most similar to the present application is compared with the result of the present application. As the following table shows, the loss value of the present scheme is smaller, that is, its result is closer to the true value. This shows that the method is an efficient and accurate search advertisement conversion rate prediction method that can adapt to special traffic.

	Most similar scheme	This scheme
logloss value	0.14183	0.13990

The above are embodiments of the method of the present application, and the present application further includes an embodiment of a device for predicting a conversion rate of a search advertisement for a shopping node, specifically as shown in fig. 2, including:

The data acquisition module 301 is configured to acquire shopping data sets for the day of the shopping festival and for the period before the shopping festival.

The first prediction module 302 is configured to train an advertisement conversion rate model with the shopping data set from before the shopping festival as a first training set, and to predict a first prediction result for the day of the shopping festival.

The data set processing module 303 is configured to use the shopping data set from one part of the day of the shopping festival as a second training set, use the shopping data set from another part of that day as a test set, use part of the data in the second training set as a verification set, and add the first prediction result as a new feature to the second training set, the verification set and the test set, respectively.

The merging prediction module 304 is configured to train the advertisement conversion rate model with the second training set, the verification set and the test set to which the new feature has been added, to obtain the final prediction result for the day of the shopping festival.
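The data flow between modules 302, 303 and 304 can be sketched as follows. The row layout (dicts with an `hour` field), the feature name `stage1_pred`, the split by hour, and taking the last rows of the festival-day training part as the verification set are all assumptions for illustration.

```python
def add_first_stage_feature(rows, predict_first_stage):
    """Attach the first model's prediction to each festival-day row.

    predict_first_stage: hypothetical callable wrapping the model trained
    on the pre-festival data. Rows are copied, not modified in place.
    """
    return [dict(row, stage1_pred=predict_first_stage(row)) for row in rows]


def split_festival_day(rows, split_hour, val_fraction):
    """Split festival-day rows into second training, verification and test sets.

    Rows before split_hour form the training part, the rest form the test
    set; the last val_fraction of the training part is held out.
    """
    train_part = [r for r in rows if r["hour"] < split_hour]
    test = [r for r in rows if r["hour"] >= split_hour]
    n_val = int(len(train_part) * val_fraction)
    cut = len(train_part) - n_val
    return train_part[:cut], train_part[cut:], test
```

The second model is then trained on the returned training set, tuned on the verification set, and evaluated on the test set, with `stage1_pred` available as an ordinary input feature in all three.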

The embodiment of the application further comprises:

Wherein, the preprocessing module further comprises:

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative. For example, the division into modules is only one logical division, and other divisions are possible in practice: a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. A method for predicting the conversion rate of search advertisements for shopping nodes is characterized by comprising the following steps:

2. The method of claim 1, wherein the training of the advertisement conversion rate model to predict the first prediction result of the current day of the shopping festival is performed by using the shopping data set before the shopping festival as a first training set, and the method further comprises:

3. The method for predicting search advertisement conversion rate for shopping nodes according to claim 2, wherein the preprocessing of the data in the shopping data set is specifically:

processing missing values and abnormal values in the shopping data set;

4. The method for predicting search advertisement conversion rate for shopping node according to claim 3, wherein said encoding the feature data by using a hierarchical encoding method based on information entropy principle specifically comprises:

inputting the characteristic data;

if the score is greater than a first threshold, discarding the feature data;

and outputting the coded characteristic data.

5. The method of claim 3, wherein the feature data includes a conversion feature, a user feature, a commodity feature, an id feature, a search advertisement feature, a time feature, and a ranking feature.

6. The method of claim 5, wherein the preprocessing the data in the shopping dataset further comprises smoothing the conversion characteristics by a conversion smoothing method based on a priori values, specifically:

and

the average purchase number and the average click number of the corresponding feature or feature combination in the same time range are respectively represented; the tuning range of λ is between 0 and 1, and λ represents the confidence of the statistical value.

7. The method of claim 1, wherein the evaluation criteria for obtaining the shopping data sets on the day of the shopping festival and before the shopping festival are:

8. A search advertisement conversion rate prediction apparatus for a shopping node, comprising:

9. The apparatus for predicting search advertisement conversion rate for shopping nodes according to claim 8, further comprising:

10. The apparatus of claim 9, wherein the preprocessing module further comprises: