CN107220845B - User re-purchase probability prediction/user quality determination method and device and electronic equipment - Google Patents

Publication number
CN107220845B
Authority
CN
China
Prior art keywords
user
users
repurchase
prediction
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710321753.6A
Other languages
Chinese (zh)
Other versions
CN107220845A (en)
Inventor
刘梦宇
魏尧
王永会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xingxuan Technology Co Ltd
Original Assignee
Beijing Xingxuan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xingxuan Technology Co Ltd
Priority to CN201710321753.6A
Publication of CN107220845A
Application granted
Publication of CN107220845B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides a user repurchase probability prediction method and device and electronic equipment, and relates to the technical field of internet. The user repurchase probability prediction method comprises the following steps: learning from the training sample set to obtain a prediction model of the user repurchase probability; acquiring a characteristic data set of a user to be predicted; and taking the characteristic data set of the user to be predicted as the input of the prediction model, and obtaining the repurchase probability prediction value of the user to be predicted through the prediction model. According to the technical scheme provided by the embodiment of the invention, the repurchase probability predicted value of the user to be predicted can be automatically obtained through the prediction model according to the data of more characteristic dimensions of the user, so that the effects of improving the prediction accuracy and the prediction efficiency are achieved.

Description

User re-purchase probability prediction/user quality determination method and device and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of internet, in particular to a method and a device for predicting user repurchase probability and electronic equipment.
Background
In the O2O (Online to Offline) field, the repurchase rate of new users (buyers) is an important index for measuring merchant quality, and merchant quality is an important basis for determining how to subsidize new users for merchants. For example, if a merchant gains a large number of new users on a certain day but the repurchase rate of those new users is low, the quality of the merchant's new users on that day is poor, and they cannot bring effective benefit to the O2O e-commerce platform; in this case, the platform may limit the new-user subsidies for that merchant to reduce platform losses.
The repurchase probability refers to the likelihood that a consumer makes a second purchase on the platform. Repurchase behavior can be measured by the hour, by the day, or even by the month. In practice, day-level repurchase is the most common. When repurchase is determined by day, all purchases the user makes on the day of the first purchase count as a single first purchase, however many there are; in this case, the user repurchase probability may be a 3-day repurchase probability, a 7-day repurchase probability, a 15-day repurchase probability, or the like.
In the prior art, the repurchase rate of new users is generally determined in one of two ways: 1) after the actual repurchase behavior occurs, the rate is determined from the users' actual repurchase behavior; 2) after a user's first purchase, a platform operator predicts the user's repurchase probability from the user's characteristics by relying on manual experience, and the repurchase rate of new users is then determined from the predicted repurchase probability of each new user. Both approaches have disadvantages. The first can be computed only after the actual repurchase behavior has occurred and cannot be obtained in advance, so the data is not timely. For example, on any given day only the 7-day repurchase rate of the users who were new 7 days earlier is available, whereas the platform generally has to decide whether to limit a merchant's new-user subsidies within less than 7 days (e.g., 3 days), so the data arrives too late to be useful to the platform. The second approach can predict the user's repurchase probability in advance and compute the repurchase rate of new users from it, but the prediction depends on manual experience. It therefore places high demands on the operator, and different operators may produce different predictions, so neither the accuracy and efficiency of the prediction nor the accuracy and efficiency of the resulting new-user repurchase rate can be guaranteed.
The above analysis shows that the prior art suffers from low prediction accuracy and low prediction efficiency.
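As an illustration of day-level repurchase, the following minimal Python sketch derives an N-day repurchase label from a list of purchase dates. The function name and the window convention (a later purchase counts if it falls on any of the N days after the first purchase day) are assumptions made for this example; the text does not fix the exact boundary.

```python
from datetime import date, timedelta


def repurchased_within(purchase_dates, window_days):
    """Return True if the user bought again within `window_days` days after
    the day of the first purchase.

    All purchases made on the first purchase day collapse into one first
    purchase, as described for day-level repurchase.
    """
    days = sorted(set(purchase_dates))  # day granularity, duplicates removed
    if len(days) < 2:
        return False
    first_day = days[0]
    deadline = first_day + timedelta(days=window_days)  # assumed inclusive bound
    return any(first_day < d <= deadline for d in days[1:])


# Hypothetical history: two orders on the first day, one order five days later.
history = [date(2017, 5, 1), date(2017, 5, 1), date(2017, 5, 6)]
label_3d = repurchased_within(history, 3)  # no purchase on May 2-4
label_7d = repurchased_within(history, 7)  # May 6 falls inside May 2-8
```

With this history the 3-day label is negative and the 7-day label is positive, which shows why the chosen window has to be stated alongside any repurchase figure.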
Disclosure of Invention
The embodiment of the invention provides a user repurchase probability prediction method, a user repurchase probability prediction device and electronic equipment, and a user quality determination method, a user quality determination device and electronic equipment, which are used for solving the problems that in the prior art, the prediction accuracy and the prediction efficiency of the user repurchase probability are low.
In a first aspect, an embodiment of the present invention provides a method for predicting a user repurchase probability, where the method includes: learning a prediction model of the user repurchase probability from a training sample set, where each training sample comprises a record of the correspondence between the feature data set of a historical user and a repurchase mark; acquiring a feature data set of a user to be predicted; and taking the feature data set of the user to be predicted as the input of the prediction model, and obtaining the repurchase probability prediction value of the user to be predicted through the prediction model.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the prediction model is obtained by learning from the training sample set in a manner of combining an iterative decision tree GBDT model and a logistic regression model.
With reference to the first implementation manner of the first aspect, in a second possible implementation manner of the first aspect, if the prediction accuracy of the prediction model is lower than a preset threshold, the model parameters of the GBDT model and/or the logistic regression model are reset, and the prediction model is obtained by relearning according to the reset model parameters.
With reference to the first aspect, the first implementation manner of the first aspect, or the second implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the feature data includes: order characteristic data, user behavior characteristic data; the acquiring of the feature data set of the user to be predicted includes: extracting the order characteristic data from the order data of the user to be predicted; and extracting the user behavior feature data from the user behavior data of the user to be predicted.
In a second aspect, an embodiment of the present invention further provides a device for predicting a user repurchase probability, where the device includes: a prediction model generation unit, used for learning a prediction model of the user repurchase probability from a training sample set, where each training sample comprises a record of the correspondence between the feature data set of a historical user and a repurchase mark; a feature data acquisition unit, used for acquiring a feature data set of a user to be predicted; and a repurchase probability prediction unit, used for taking the feature data set of the user to be predicted as the input of the prediction model, and obtaining the repurchase probability prediction value of the user to be predicted through the prediction model.
The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the user repurchase probability prediction device includes a processor and a memory, the memory is used for storing a program for supporting the user repurchase probability prediction device to execute the user repurchase probability prediction method in the first aspect, and the processor is configured to execute the program stored in the memory. The user repurchase probability prediction device can further comprise a communication interface, and the communication interface is used for communicating the user repurchase probability prediction device with other equipment or a communication network.
In a third aspect, an embodiment of the present invention provides a computer storage medium configured to store computer software instructions used by the user repurchase probability prediction device, where the instructions include a program that causes the device to execute the user repurchase probability prediction method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a method for determining user quality, where the method includes: acquiring a plurality of second users of a first user to be processed; according to the feature data set of the second user, obtaining the repurchase probability predicted values of the plurality of second users through a pre-generated prediction model; determining the user quality of the plurality of second users according to the repurchase probability predicted value; and determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
In a fifth aspect, an embodiment of the present invention further provides an apparatus for determining user quality, where the apparatus includes: a second user acquisition unit configured to acquire a plurality of second users of the first user to be processed; the repurchase probability prediction unit is used for obtaining the repurchase probability prediction values of the plurality of second users through a pre-generated prediction model according to the feature data set of the second users; the first user quality determining unit is used for determining the user quality of the plurality of second users according to the repurchase probability predicted value; and the second user quality determining unit is used for determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
The functions can be realized by hardware, and the functions can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the structure of the user quality determination apparatus includes a processor and a memory, the memory is used for storing a program for supporting the user quality determination apparatus to execute the user quality determination method in the fourth aspect, and the processor is configured to execute the program stored in the memory. The user quality determination means may further comprise a communication interface for the user quality determination means to communicate with other devices or a communication network.
In a sixth aspect, an embodiment of the present invention provides a computer storage medium for storing computer software instructions used by the user quality determination apparatus, where the instructions include a program that causes the apparatus to execute the user quality determination method of the fourth aspect.
According to the user repurchase probability prediction method provided by the embodiment of the invention, a prediction model of the user repurchase probability is obtained through learning from a training sample set, a characteristic data set of a user to be predicted is used as the input of the prediction model, and a repurchase probability prediction value of the user to be predicted is obtained through the prediction model; by the processing mode, the repurchase probability predicted value of the user to be predicted can be automatically obtained through the prediction model according to the data of more characteristic dimensions of the user; therefore, the prediction accuracy and the prediction efficiency can be effectively improved.
According to the user quality determination method provided by the embodiment of the invention, a plurality of second users of a first user to be processed are obtained, and according to a characteristic data set of the second users, the repurchase probability predicted values of the second users are obtained through a pre-generated prediction model; determining the user quality of a plurality of second users according to the re-purchase probability predicted value; determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users; this way of processing makes it possible to automatically determine the second user quality of the first user; therefore, the processing accuracy and the processing efficiency can be effectively improved.
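The user quality determination flow can be sketched as follows. This is a minimal Python illustration and not the claimed implementation: it assumes that the first user is a merchant and the second users are that merchant's new users (as the Background suggests), and the grade names, the probability cut-offs `high` and `low`, and the `min_high_share` rule are all hypothetical values chosen for the example.

```python
from collections import Counter


def grade_user(probability, high=0.6, low=0.3):
    """Map a predicted repurchase probability to a user quality grade.
    The cut-offs are illustrative assumptions."""
    if probability >= high:
        return "high"
    if probability >= low:
        return "medium"
    return "low"


def second_user_quality(probabilities, min_high_share=0.5):
    """Return the share of second users per grade and an overall verdict
    for the first user, based on those shares."""
    if not probabilities:
        raise ValueError("at least one second user is required")
    counts = Counter(grade_user(p) for p in probabilities)
    total = len(probabilities)
    shares = {g: counts.get(g, 0) / total for g in ("high", "medium", "low")}
    verdict = "good" if shares["high"] >= min_high_share else "poor"
    return shares, verdict


# Hypothetical predicted repurchase probabilities of one merchant's new users.
shares, verdict = second_user_quality([0.9, 0.7, 0.2, 0.4])
```

Here two of the four second users are graded high, so the high-quality share is 0.5 and the assumed rule rates the first user's second user quality as good.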
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a user repurchase probability prediction method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a prediction model of a user repurchase probability prediction method according to an embodiment of the invention;
FIG. 3 is a schematic flow chart illustrating a specific flow of a user repurchase probability prediction method according to an embodiment of the present invention;
FIG. 4 shows a block diagram of a user repurchase probability prediction device according to one embodiment of the invention;
FIG. 5 shows a flow diagram of a user quality determination method according to one embodiment of the invention;
FIG. 6 shows a block diagram of a user quality determination apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
Some of the flows described in this specification, the claims, and the above figures include a number of operations that appear in a particular order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein, or in parallel; operation numbers such as 101 and 102 merely distinguish the operations from one another and do not themselves represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" in this document are used to distinguish different messages, devices, modules, and the like; they do not represent a sequential order, nor do they require that the "first" and "second" items be of different types.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of an embodiment of a user repurchase probability prediction method according to the present invention is shown, and the method includes the following steps:
Step 101, the user repurchase probability prediction device learns a prediction model of the user repurchase probability from a training sample set.
The training sample set comprises a plurality of training samples, and the number of training samples can be determined according to specific business requirements. Each training sample is a record of the correspondence between the feature data set of a historical user and that user's repurchase mark.
The historical users include historical new users of the O2O platform, for example, new users in a past period of time (e.g., three months) are historical new users. The plurality of historical users included in the training sample set may be historical new users of different merchants.
The feature data set includes data for a plurality of feature dimensions of a user. It may comprise the order features of the user, the behavior features of the user, or both. Order features include, but are not limited to: the order amount, the discount amount, the delivery time, whether the order is a logistics order, the gender of the user, the city to which the order belongs, the channel through which the order was placed, whether the merchant is a key account (KA) merchant, whether the merchant is within the delivery range, and the like. Behavior features of the user include, but are not limited to: the source from which the user entered the O2O platform, the time the user placed the order, the time the user spent on the entire ordering path, and the like.
The repurchase mark indicates whether the historical user actually repurchased after the first purchase. If the historical user actually repurchased after the first purchase, the repurchase mark is set to yes; if the historical user did not repurchase after the first purchase, the repurchase mark is set to no.
The correspondence between a historical user's feature data set and the user's repurchase mark forms a training sample. Please refer to Table 1, which is an example training sample set consisting of a plurality of training samples.
TABLE 1 training sample example set (the table is an image in the original publication and is not reproduced here)
To learn the prediction model, the user repurchase probability prediction device first obtains the training sample set and then, using the selected machine learning model, learns the prediction model from that set.
First, a method of generating a training sample set will be described with an example of a 7-day repurchase probability. When generating the training sample set, the following process can be adopted:
1) First, sample data of new users (i.e., historical users) over a period of time (e.g., three months or half a year) is collected, together with the repurchase status of these new users. Each historical user sample is then labeled according to that user's repurchase status (i.e., given a repurchase mark) to indicate which users repurchased within 7 days: if a new user has repurchased once, the repurchase mark is set to yes (e.g., 1 indicates yes); if there was no repurchase, it is set to no (e.g., 0 indicates no). This forms a data set in which each record comprises a user identification and a repurchase mark (value 0 or 1).
2) After the data set is formed, the sample data of the historical users is analyzed to determine the required feature dimensions. For example, the feature dimensions include order features and behavior features of the user, where the order features include: the order amount, the discount amount, the delivery time, whether the order is a logistics order, the gender of the user, the city to which the order belongs, the channel through which the order was placed, whether the merchant is a KA merchant, whether the merchant is within the delivery range, and the like; and the behavior features of the user include: the source from which the user entered the O2O platform, the time the user placed the order, the time the user spent on the entire ordering path, and the like.
3) According to the feature dimensions determined by the analysis, data for the required feature dimensions is extracted from the samples of the historical users to form feature vectors. The feature vector can be formed as follows: a) read the data in the samples of the historical users one by one; b) check the read data against each feature dimension in turn: if a feature dimension is hit, the corresponding position is set to 1, and if it is not hit, the position is set to 0; for example, if the ordering merchant is a KA merchant the position is set to 1, and if the merchant is not a KA merchant the position is set to 0; c) after every feature dimension has been checked, each historical user sample forms a feature vector such as (1,0,1,0,1,1,1,1,0,0,0 ……); d) save the feature vectors of the historical users into a file.
Through the process, a training sample set can be generated, wherein the samples actually purchased again by the user are positive samples, the samples not actually purchased again are negative samples, and the training sample set comprises a plurality of positive samples and a plurality of negative samples.
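The sample generation steps above can be sketched in Python as follows. The feature dimensions, field names, and the order-amount cut-off are hypothetical and only illustrate the hit/miss encoding; a real system would use the dimensions found in the analysis step and would write the serialized result to a file.

```python
import json

# Hypothetical feature dimensions: a name plus a test that reports a "hit".
FEATURE_DIMENSIONS = [
    ("merchant_is_ka", lambda s: s["merchant_is_ka"]),
    ("is_logistics_order", lambda s: s["is_logistics_order"]),
    ("order_amount_ge_30", lambda s: s["order_amount"] >= 30),
    ("channel_is_app", lambda s: s["channel"] == "app"),
]


def to_feature_vector(sample):
    """Set position i to 1 if feature dimension i is hit, otherwise to 0."""
    return [1 if hit(sample) else 0 for _, hit in FEATURE_DIMENSIONS]


def build_training_set(samples):
    """Pair each historical user's feature vector with the repurchase mark."""
    return [
        {
            "user_id": s["user_id"],
            "x": to_feature_vector(s),
            "y": 1 if s["repurchased_within_7d"] else 0,
        }
        for s in samples
    ]


# Two made-up historical users: one positive sample and one negative sample.
samples = [
    {"user_id": "u1", "merchant_is_ka": True, "is_logistics_order": False,
     "order_amount": 45.0, "channel": "app", "repurchased_within_7d": True},
    {"user_id": "u2", "merchant_is_ka": False, "is_logistics_order": True,
     "order_amount": 18.5, "channel": "web", "repurchased_within_7d": False},
]
training_set = build_training_set(samples)
serialized = json.dumps(training_set)  # this string is what would be saved to a file
```

Keeping the feature dimensions in one ordered list means the same list can be reused when encoding a user to be predicted, so both vectors share the same positions.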
After the training sample set is obtained, a suitable machine learning model needs to be selected. The machine learning model may be a single machine learning model, such as a decision tree model, a logistic regression model, or a neural network model; it may also be a combination of several machine learning models, for example, a Gradient Boosting Decision Tree (GBDT) model combined with a logistic regression model.
It should be noted that, when there are relatively few feature dimensions and many training samples, a single machine learning model (e.g., a decision tree model or a logistic regression model) may yield a prediction model with low accuracy; this can happen, for example, when the total number of feature dimensions is more than 30 and the number of new users per day is about 50,000 or more (i.e., more than 4.5 million training samples over three months).
To improve the prediction accuracy of the prediction model, feature engineering can be applied to expand the feature set with as many combined features as possible. FIG. 2 shows the machine learning model adopted in the present embodiment, in which the model input x is a feature vector. In this embodiment, the selected machine learning model is a combination of the GBDT model and the logistic regression model. GBDT is a commonly used non-linear model based on the boosting idea in ensemble learning: each iteration aims to reduce the residual left by the previous iteration, and to do so a new model is built in the gradient direction along which the residual decreases. In gradient boosting, therefore, each new model is built so that the residual of the previous models decreases along the gradient direction. This allows GBDT to discover a variety of discriminative features and feature combinations, and the paths of the decision trees can be used directly as input features of the logistic regression model, which removes the step of searching for features and feature combinations manually.
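One common way to realize such a GBDT plus logistic regression combination is sketched below with scikit-learn. This is an illustration under stated assumptions, not the patented implementation: the data is synthetic, the hyperparameters are arbitrary, and the choice to fit the trees and the logistic regression on separate halves of the data is a common practice against overfitting that the text does not mention. Each sample is passed through the fitted trees, the index of the leaf it reaches in every tree is one-hot encoded, and the logistic regression is trained on those encoded leaf indices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Synthetic 0/1 feature vectors; the label depends on a feature combination.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 8)).astype(float)
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(int)

# Assumption: trees and logistic regression are fitted on different halves.
X_gbdt, y_gbdt = X[:200], y[:200]
X_lr, y_lr = X[200:], y[200:]

gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X_gbdt, y_gbdt)


def leaf_indices(model, data):
    """Index of the leaf each sample reaches in every tree.
    apply() returns shape (n_samples, n_estimators, 1) for binary labels."""
    return model.apply(data)[:, :, 0]


# One-hot encode the leaf indices so each tree path becomes a binary feature.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(leaf_indices(gbdt, X_gbdt))

lr = LogisticRegression(max_iter=1000)
lr.fit(encoder.transform(leaf_indices(gbdt, X_lr)), y_lr)


def predict_repurchase_probability(features):
    """Predicted probability of the positive (repurchase) class."""
    leaves = leaf_indices(gbdt, np.asarray(features, dtype=float))
    return lr.predict_proba(encoder.transform(leaves))[:, 1]


probs = predict_repurchase_probability(X[:5])
```

The number and depth of the trees control how many feature combinations are exposed to the logistic regression, which is why these are natural parameters to adjust when the model is not accurate enough.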
In specific implementation, the training process executed by the user repurchase probability prediction device may include the following steps: 1) reading a feature vector formed in the previous stage; 2) training the prediction model to obtain the weight of each feature dimension; 3) and saving the weight of each feature dimension obtained by training.
In practical applications, the prediction model of the user repurchase probability may be obtained by periodically learning from the training sample set according to a preset time interval, for example, training every 10 days or once a day to generate a new prediction model. The weights of the characteristic dimensions of the prediction model are stored in a file, and can be read from the file when the model is used for predicting the probability of the repurchase of a new user.
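Saving the trained weights to a file and reading them back at prediction time might look like the sketch below. It assumes the simplest case, a logistic regression applied directly to the 0/1 feature vector, and a JSON file layout; the file name, the weights, and the bias are made-up values, not trained results.

```python
import json
import math
import os
import tempfile


def save_model(path, weights, bias):
    """Persist the per-dimension weights and the bias as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"weights": weights, "bias": bias}, f)


def load_model(path):
    """Read the weights and the bias back from the file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["weights"], data["bias"]


def predict_probability(feature_vector, weights, bias):
    """Logistic function applied to the weighted sum of the features."""
    if len(feature_vector) != len(weights):
        raise ValueError("feature vector and weights differ in length")
    z = bias + sum(w * x for w, x in zip(weights, feature_vector))
    return 1.0 / (1.0 + math.exp(-z))


# Round trip through a temporary file with made-up weights.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "repurchase_model.json")
    save_model(model_path, [0.8, -0.4, 1.2], -1.0)
    weights, bias = load_model(model_path)

probability = predict_probability([1, 0, 1], weights, bias)
```

For the vector (1, 0, 1) the weighted sum is -1.0 + 0.8 + 1.2 = 1.0, so the predicted probability is the logistic function evaluated at 1.0. A periodic retraining job would simply overwrite the file with the new weights.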
Step 102, obtaining a characteristic data set of a user to be predicted.
The user to be predicted is a user for whom the likelihood of a future repurchase needs to be predicted. The feature data set of the user to be predicted has the same feature dimensions as the feature data sets of the training samples, but the feature values of different users may differ.
The method for acquiring the feature data set of the user to be predicted comprises the following specific steps: extracting the order characteristic data from the order data of the user to be predicted; and extracting the user behavior feature data from the user behavior data of the user to be predicted.
When the order feature data and the user behavior feature data are extracted, the read order data and user behavior data can be checked against each feature dimension in turn: if a feature dimension is hit, the corresponding position is set to 1, and if it is not hit, the position is set to 0. For example, if the ordering merchant is a KA merchant, the position is set to 1, and if the ordering merchant is not a KA merchant, the position is set to 0.
Step 103, taking the feature data set of the user to be predicted as the input of the prediction model, and obtaining a repurchase probability prediction value of the user to be predicted through the prediction model.
After the user repurchase probability prediction device obtains the characteristic data set of the user to be predicted, the characteristic data set of the user to be predicted can be used as the input of the prediction model, and the prediction value of the repurchase probability of the user to be predicted is obtained through the prediction model.
Fig. 3 is a detailed flowchart of the user repurchase probability prediction method of this embodiment, and gives an intuitive picture of the method provided by the embodiment of the invention. As can be seen from fig. 3, the method includes two processing stages: the prediction model generation stage and the new user prediction stage. In the prediction model generation stage, sample data of new users from the past three months is collected first; the feature dimensions are then determined by analyzing the features; the feature vectors are then extracted from the samples; and finally the prediction model of the user repurchase probability is obtained by learning from the training sample set. In the new user prediction stage, the feature data set of a new user is acquired first, and the prediction value of the repurchase probability of the new user is then obtained through the pre-generated prediction model.
It should be noted that, as shown in fig. 3, in order to ensure that the prediction model reaches a certain accuracy, in this embodiment the prediction accuracy of a newly trained prediction model is calculated on the verification set. If the accuracy reaches a preset threshold (e.g., 90%), the prediction model is determined to be usable in practice and is saved. If the accuracy does not reach the preset threshold, the model parameters (e.g., the depth of the decision trees, the number of decision trees, and the like) are adjusted manually or automatically by machine, and the prediction model of the user repurchase probability is relearned from the training sample set with the adjusted parameters; this is repeated until the accuracy reaches the preset threshold, at which point the model is saved. The preset threshold may be determined empirically.
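The train, verify, and adjust loop described above might look like the following sketch. It is written under several assumptions not stated in the source: the training routine is passed in as a hypothetical `train_fn`, the parameter adjustment is modeled as iterating over a list of candidate settings, and a prediction counts as correct when the predicted probability, cut at an assumed 0.5, matches the repurchase mark.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Sequence, Tuple

Sample = Tuple[Sequence[int], int]        # (feature vector, repurchase mark 0/1)
Model = Callable[[Sequence[int]], float]  # returns a repurchase probability


def accuracy(model: Model, validation_set: Sequence[Sample],
             cutoff: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction equals the mark."""
    if not validation_set:
        raise ValueError("validation set must not be empty")
    correct = sum(1 for x, y in validation_set if int(model(x) >= cutoff) == y)
    return correct / len(validation_set)


def train_until_accurate(
    train_fn: Callable[[Sequence[Sample], Dict[str, Any]], Model],
    param_candidates: Iterable[Dict[str, Any]],
    training_set: Sequence[Sample],
    validation_set: Sequence[Sample],
    threshold: float = 0.9,  # the 90% example threshold from the text
) -> Optional[Tuple[Model, Dict[str, Any]]]:
    """Retrain with each candidate setting until the accuracy threshold is met.

    Returns the accepted model and its parameters, or None if no candidate
    reaches the threshold.
    """
    for params in param_candidates:
        model = train_fn(training_set, params)
        if accuracy(model, validation_set) >= threshold:
            return model, params
    return None
```

A caller would supply its own training routine and a list of settings such as tree depth and tree count; returning `None` makes the case where no setting is good enough explicit instead of looping forever.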
As can be seen from the foregoing embodiments, in the user repurchase probability prediction method provided by the embodiments of the present invention, a prediction model of the user repurchase probability is obtained through learning from a training sample set, a feature data set of a user to be predicted is used as an input of the prediction model, and a repurchase probability prediction value of the user to be predicted is obtained through the prediction model; by the processing mode, the repurchase probability predicted value of the user to be predicted can be automatically obtained through the prediction model according to the data of more characteristic dimensions of the user; therefore, the prediction accuracy and the prediction efficiency can be effectively improved.
Fig. 4 is a schematic structural diagram of an embodiment of the user repurchase probability prediction device of the present invention. As shown in fig. 4, the user repurchase probability prediction device includes: a prediction model generation unit 401, configured to learn a prediction model for obtaining a user repurchase probability from a training sample set; a feature data obtaining unit 402, configured to obtain a feature data set of a user to be predicted; and a repurchase probability prediction unit 403, configured to use the feature data set of the user to be predicted as an input of the prediction model, and obtain a repurchase probability prediction value of the user to be predicted through the prediction model.
The prediction model generation unit 401 may be specifically configured to learn and obtain the prediction model from the training sample set by combining an iterative decision tree GBDT model and a logistic regression model.
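This passage does not spell out how the two models are combined. One common arrangement, given here purely as an assumption, uses the GBDT as a feature transformer: each of the $T$ trees maps the input $x$ to a leaf, the leaf indices are one-hot encoded and concatenated, and the logistic regression is fitted on the resulting vector. All symbols below are illustrative and do not come from the source.

```latex
\phi(x) = \bigl[\, e^{(1)}_{\ell_1(x)};\; e^{(2)}_{\ell_2(x)};\; \dots;\; e^{(T)}_{\ell_T(x)} \,\bigr],
\qquad
\hat{p}(x) = \sigma\bigl(w^{\top}\phi(x) + b\bigr),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Here $\ell_t(x)$ is the index of the leaf that tree $t$ assigns to $x$, $e^{(t)}_{j}$ is the one-hot vector of length $L_t$ (the number of leaves of tree $t$) with a 1 in position $j$, and $w$, $b$ are the logistic regression weights and bias. Under this reading, $\hat{p}(x)$ is the repurchase probability prediction value.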
As can be seen from the foregoing embodiments, the user repurchase probability prediction device provided in the embodiments of the present invention obtains the prediction model of the user repurchase probability by learning from the training sample set, and obtains the repurchase probability prediction value of the user to be predicted through the prediction model by using the feature data set of the user to be predicted as the input of the prediction model; by the processing mode, the repurchase probability predicted value of the user to be predicted can be automatically obtained through the prediction model according to the data of more characteristic dimensions of the user; therefore, the prediction accuracy and the prediction efficiency can be effectively improved.
In one possible design, the user repurchase probability prediction device includes a processor and a memory, the memory is used for storing a program for supporting the user repurchase probability prediction device to execute the user repurchase probability prediction method in the first aspect, and the processor is configured to execute the program stored in the memory.
The program includes one or more computer instructions, which are invoked and executed by the processor.
The processor is configured to: learn a prediction model of the user repurchase probability from the training sample set, where the training sample set comprises correspondence records between the feature data sets of historical users and repurchase marks; acquire a feature data set of a user to be predicted; and take the feature data set of the user to be predicted as the input of the prediction model, obtaining the repurchase probability prediction value of the user to be predicted through the prediction model.
An embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the user repurchase probability prediction device, the instructions including a program for causing the device to execute the user repurchase probability prediction method of the first aspect.
Corresponding to the method for predicting the user repurchase probability provided by the first embodiment, the embodiment of the invention further provides a method for determining the user quality.
Referring to fig. 5, a flowchart of an embodiment of the method for determining user quality of the present invention is shown, and the method includes the following steps:
Step 501, obtaining a plurality of second users of a first user to be processed.
The first user includes a merchant of an O2O platform. The second users include new users of the first user; for example, if the first user is merchant A, the second users are all the new users (buyers) of merchant A on a given day.
Step 502, obtaining the repurchase probability predicted values of the plurality of second users through a pre-generated prediction model according to the feature data sets of the second users.
In a specific implementation, this step may comprise the following sub-steps: 1) acquiring the feature data set of each second user; 2) for each second user, obtaining the repurchase probability predicted value of that second user through the pre-generated prediction model according to the feature data set of that second user.
The step of obtaining the repurchase probability prediction value of the second user through the pre-generated prediction model corresponds to step 103 in the first embodiment; for the related description, refer to the corresponding description in the first embodiment, which is not repeated here.
Step 503, determining the user quality of the plurality of second users according to the repurchase probability predicted value.
The different repurchase probability prediction values correspond to different user qualities, for example, the quality of new users with the repurchase probability prediction value smaller than 60% is poor, the quality of new users with the repurchase probability prediction value larger than or equal to 80% is good, and the quality of new users with the repurchase probability prediction value between 60% and 80% is general.
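The example thresholds above translate directly into a small mapping function. The sketch below uses the 60% and 80% cut points from the text; the function name and the string labels are assumptions made for illustration.

```python
def user_quality(repurchase_probability: float) -> str:
    """Map a repurchase probability prediction value to a quality label.

    Uses the example cut points from the text: below 60% is poor,
    80% or above is good, and anything in between is general.
    """
    if not 0.0 <= repurchase_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if repurchase_probability < 0.60:
        return "poor"
    if repurchase_probability >= 0.80:
        return "good"
    return "general"
```

In practice the cut points would be configuration values rather than constants, since the text presents them only as an example.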
Step 504, determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
In a specific implementation, this step may comprise the following sub-steps: 1) calculating, for each user quality, the ratio of the number of second users of that quality to the total number of second users; 2) determining the second user quality of the first user according to the calculated ratios and a preset user quality determination rule.
In practical application, the processing mode for the first user can be further determined according to the second user quality of the first user. The processing modes include but are not limited to: restricting the second-user subsidy granted to the first user, increasing the second-user subsidy granted to the first user, and the like.
The user quality determination rule may be set according to specific service requirements. For example, the rule may be set as follows: if the ratio of new users with poor quality exceeds a first preset threshold, the second user quality of the first user is determined to be poor.
In this embodiment, assume that among all new users of merchant A on a certain day, the ratio of new users with poor quality is 40%, the ratio of new users with good quality is 35%, and the ratio of new users with general quality is 25%. According to the preset user quality determination rule, the second user quality of merchant A is determined to be poor; further, according to a preset user processing rule, the processing mode is determined to be restricting the subsidies that the O2O platform grants to new users of merchant A.
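The merchant A example can be reproduced with a short sketch. The value of the first preset threshold is not given in the text, so the 30% used here is a hypothetical choice, as are the function names, the "acceptable" label for merchants that do not trip the rule, and the processing-mode strings.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical value for the "first preset threshold"; the source gives none.
POOR_RATIO_THRESHOLD = 0.30


def quality_ratios(qualities: Sequence[str]) -> Dict[str, float]:
    """Ratio of second users of each quality to the total number of second users."""
    if not qualities:
        raise ValueError("at least one second user is required")
    counts = Counter(qualities)
    total = len(qualities)
    return {label: counts[label] / total for label in ("poor", "general", "good")}


def merchant_new_user_quality(qualities: Sequence[str],
                              poor_threshold: float = POOR_RATIO_THRESHOLD) -> str:
    """Apply the example rule: too large a share of poor new users means 'poor'."""
    ratios = quality_ratios(qualities)
    # "acceptable" is an assumed label for every case the example rule does not cover.
    return "poor" if ratios["poor"] > poor_threshold else "acceptable"


def processing_mode(merchant_quality: str) -> str:
    """Assumed processing rule: restrict new-user subsidies for 'poor' merchants."""
    return "restrict_new_user_subsidy" if merchant_quality == "poor" else "no_change"


# Merchant A's new users on one day: 40% poor, 35% good, 25% general.
qualities = ["poor"] * 40 + ["good"] * 35 + ["general"] * 25
mode = processing_mode(merchant_new_user_quality(qualities))
```

With the assumed 30% threshold, the 40% share of poor-quality new users marks merchant A as poor, which leads to the subsidy restriction described in the text.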
As can be seen from the foregoing embodiments, in the user quality determination method provided by the embodiments of the present invention, the plurality of second users of the first user to be processed are obtained, and according to the feature data set of the second users, the repurchase probability prediction values of the plurality of second users are obtained through the pre-generated prediction model; determining the user quality of a plurality of second users according to the re-purchase probability predicted value; determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users; this way of processing makes it possible to automatically determine the second user quality of the first user; therefore, the processing accuracy and the processing efficiency can be effectively improved.
Fig. 6 is a schematic structural diagram of an embodiment of the user quality determination apparatus according to the present invention. As shown in fig. 6, the user quality determination apparatus includes: a second user acquiring unit 601 configured to acquire a plurality of second users of the first user to be processed; a repurchase probability prediction unit 602, configured to obtain repurchase probability prediction values of the multiple second users through a prediction model generated in advance according to the feature data set of the second users; a first user quality determining unit 603, configured to determine user qualities of the multiple second users according to the repurchase probability prediction value; a second user quality determining unit 604, configured to determine the second user quality of the first user according to ratios of the numbers of the second users with different user qualities to the total number of the second users, respectively.
In one possible design, the user quality determination apparatus includes a processor and a memory, the memory is used for storing a program for supporting the user quality determination apparatus to execute the user quality determination method in the fourth aspect, and the processor is configured to execute the program stored in the memory.
The program includes one or more computer instructions, which are invoked and executed by the processor.
The processor is configured to: acquiring a plurality of second users of a first user to be processed; according to the feature data set of the second user, obtaining the repurchase probability predicted values of the plurality of second users through a pre-generated prediction model; determining the user quality of the plurality of second users according to the repurchase probability predicted value; and determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
An embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the user quality determination apparatus, the instructions including a program for causing the apparatus to execute the user quality determination method of the fourth aspect.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
The invention discloses A1, a user repurchase probability prediction method, comprising the following steps:
learning from the training sample set to obtain a prediction model of the user repurchase probability;
acquiring a characteristic data set of a user to be predicted;
and taking the characteristic data set of the user to be predicted as the input of the prediction model, and obtaining the repurchase probability prediction value of the user to be predicted through the prediction model.
A2, the method according to A1, wherein the predictive model of the user repurchase probability is obtained by learning from the training sample set, in the following way:
and learning from the training sample set to obtain the prediction model by combining an iterative decision tree GBDT model and a logistic regression model.
A3, the method of A2, further comprising:
if the prediction accuracy of the prediction model is lower than a preset threshold value, resetting the model parameters of the GBDT model and/or the logistic regression model, and re-learning according to the reset model parameters to obtain the prediction model.
A4, the method according to any one of A1 to A3, the feature data comprising: order feature data and user behavior feature data;
the acquiring of the feature data set of the user to be predicted includes:
extracting the order characteristic data from the order data of the user to be predicted;
and extracting the user behavior feature data from the user behavior data of the user to be predicted.
A5, the method of A4, the order characteristic data comprising: source channel, payment method, item characteristics, city level, time characteristics.
A6, the method of A4, the user behavior feature data comprising: entry source, click feature, path feature.
The invention also discloses B7, a user repurchase probability prediction device, comprising:
the prediction model generation unit is used for learning a prediction model for obtaining the user repurchase probability from the training sample set;
the characteristic data acquisition unit is used for acquiring a characteristic data set of a user to be predicted;
and the repurchase probability prediction unit is used for taking the characteristic data set of the user to be predicted as the input of the prediction model, and obtaining the repurchase probability prediction value of the user to be predicted through the prediction model.
B8, the apparatus according to B7, wherein the prediction model generation unit is specifically configured to learn and obtain the prediction model from the training sample set by combining an iterative decision tree GBDT model with a logistic regression model.
The invention also discloses C9, an electronic device, comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are for the processor to invoke for execution;
the processor is configured to:
learning from the training sample set to obtain a prediction model of the user repurchase probability; acquiring a characteristic data set of a user to be predicted; and taking the characteristic data set of the user to be predicted as the input of the prediction model, and obtaining the repurchase probability prediction value of the user to be predicted through the prediction model.
Also disclosed is D10, a computer storage medium storing one or more computer instructions that, when executed, implement the method of any one of A1-A6.
The invention also discloses E11, a user quality determination method, comprising the following steps:
acquiring a plurality of second users of a first user to be processed;
according to the feature data set of the second user, obtaining the repurchase probability predicted values of the plurality of second users through a pre-generated prediction model;
determining the user quality of the plurality of second users according to the repurchase probability predicted value;
and determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
E12, the method of E11, the predictive model being generated as follows:
the predictive model is learned from a set of training samples.
E13, the method according to E12, wherein the learning from the training sample set obtains the predictive model by:
and learning from the training sample set to obtain the prediction model by combining an iterative decision tree GBDT model and a logistic regression model.
The invention also discloses F14, a user quality determination device, comprising:
a second user acquisition unit configured to acquire a plurality of second users of the first user to be processed;
the repurchase probability prediction unit is used for obtaining the repurchase probability prediction values of the plurality of second users through a pre-generated prediction model according to the feature data set of the second users;
the first user quality determining unit is used for determining the user quality of the plurality of second users according to the repurchase probability predicted value;
and the second user quality determining unit is used for determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
F15, the apparatus of F14, the apparatus further comprising:
and the prediction model generation unit is used for learning and obtaining the prediction model from the training sample set.
F16, the apparatus according to F15, wherein the prediction model generation unit is specifically configured to learn and obtain the prediction model from the training sample set by combining an iterative decision tree GBDT model with a logistic regression model.
The invention also discloses G17, an electronic device, comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are for the processor to invoke for execution;
the processor is configured to:
acquiring a plurality of second users of a first user to be processed; according to the feature data set of the second user, obtaining the repurchase probability predicted values of the plurality of second users through a pre-generated prediction model; determining the user quality of the plurality of second users according to the repurchase probability predicted value; and determining the second user quality of the first user according to the ratio of the number of the second users with different user qualities to the total number of the second users.
H18, a computer storage medium storing one or more computer instructions which, when executed, implement the method of any one of E11-E13.

Claims (6)

1. A method for determining user quality, the method comprising:
acquiring a plurality of second users of a first user to be processed, wherein the first user is a merchant of an O2O platform, and the second users are new users of the merchant;
according to the feature data set of the second users, obtaining a repurchase probability predicted value of each second user through a pre-generated prediction model; the prediction model is obtained by training through a training sample set; the training sample set comprises a feature data set of historical users and a corresponding record between a repurchase mark, the historical users comprise historical new users, and the repurchase mark is used for indicating whether the historical new users repurchase after first purchasing behaviors;
determining the user quality of each second user according to the re-purchasing probability predicted value;
respectively calculating the ratio of the number of the second users with different user qualities to the total number of the second users;
determining second user quality of the first user according to the ratio obtained by calculation and a preset user quality determination rule;
determining a processing mode for the first user according to the second user quality of the first user; the processing mode comprises a subsidy processing mode of the first user.
2. The method of claim 1, wherein the predictive model is learned from the set of training samples by combining an iterative decision tree (GBDT) model with a logistic regression model.
3. An apparatus for user quality determination, the apparatus comprising:
a second user obtaining unit, configured to obtain multiple second users of a first user to be processed, where the first user is a merchant of an O2O platform, and the second user is a new user of the merchant;
the repurchase probability prediction unit is used for obtaining a repurchase probability prediction value of each second user through a pre-generated prediction model according to the feature data set of the second user; the prediction model is obtained by training through a training sample set; the training sample set comprises a feature data set of historical users and a corresponding record between a repurchase mark, the historical users comprise historical new users, and the repurchase mark is used for indicating whether the historical new users repurchase after first purchasing behaviors;
the first user quality determining unit is used for determining the user quality of each second user according to the repurchase probability predicted value;
a second user quality determining unit, configured to calculate a ratio of the number of the second users with different user qualities to the total number of the second users;
determining second user quality of the first user according to the ratio obtained by calculation and a preset user quality determination rule;
determining a processing mode for the first user according to the second user quality of the first user; the processing mode comprises a subsidy processing mode of the first user.
4. The apparatus of claim 3, wherein the predictive model is learned from the set of training samples by combining an iterative decision tree (GBDT) model with a logistic regression model.
5. An electronic device comprising a memory and a processor; wherein,
the memory is to store one or more computer instructions, wherein the one or more computer instructions are for the processor to invoke for execution;
the processor is configured to:
acquiring a plurality of second users of a first user to be processed, wherein the first user is a merchant of an O2O platform, and the second users are new users of the merchant;
according to the feature data set of the second user, obtaining the repurchase probability predicted values of the plurality of second users through a pre-generated prediction model; the prediction model is obtained by training through a training sample set; the training sample set comprises a feature data set of historical users and a corresponding record between a repurchase mark, the historical users comprise historical new users, and the repurchase mark is used for indicating whether the historical new users repurchase after first purchasing behaviors;
determining the user quality of the plurality of second users according to the repurchase probability predicted value;
respectively calculating the ratio of the number of the second users with different user qualities to the total number of the second users;
determining second user quality of the first user according to the ratio obtained by calculation and a preset user quality determination rule;
determining a processing mode for the first user according to the second user quality of the first user; the processing mode comprises a subsidy processing mode of the first user.
6. A computer-readable storage medium having one or more computer instructions stored thereon which, when executed by a processor, implement the method of claim 1 or 2.
CN201710321753.6A 2017-05-09 2017-05-09 User re-purchase probability prediction/user quality determination method and device and electronic equipment Expired - Fee Related CN107220845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710321753.6A CN107220845B (en) 2017-05-09 2017-05-09 User re-purchase probability prediction/user quality determination method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN107220845A CN107220845A (en) 2017-09-29
CN107220845B true CN107220845B (en) 2021-06-29

Family

ID=59944733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710321753.6A Expired - Fee Related CN107220845B (en) 2017-05-09 2017-05-09 User re-purchase probability prediction/user quality determination method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107220845B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934369A (en) * 2017-12-15 2019-06-25 北京京东尚科信息技术有限公司 Method and device for information push
CN108170673B (en) * 2017-12-26 2021-08-24 北京百度网讯科技有限公司 Information tone identification method and device based on artificial intelligence
CN108346107B (en) * 2017-12-28 2020-11-10 创新先进技术有限公司 Social content risk identification method, device and equipment
CN108564392A (en) * 2018-01-18 2018-09-21 百度在线网络技术(北京)有限公司 Information processing method and device
CN109657832A (en) * 2018-05-04 2019-04-19 美味不用等(上海)信息科技股份有限公司 A kind of prediction technique and device of frequent customer
CN109461023B (en) * 2018-10-12 2023-10-24 中国平安人寿保险股份有限公司 Loss user retrieval method and device, electronic equipment and storage medium
CN110364162B (en) * 2018-11-15 2022-03-15 腾讯科技(深圳)有限公司 Artificial intelligence resetting method and device and storage medium
CN111292106A (en) * 2018-12-06 2020-06-16 北京嘀嘀无限科技发展有限公司 Method and device for determining business demand influence factors
CN109685583B (en) * 2019-01-10 2020-12-25 博拉网络股份有限公司 Supply chain demand prediction method based on big data
CN109829651A (en) * 2019-01-31 2019-05-31 拉扎斯网络科技(上海)有限公司 Determine method, apparatus, electronic equipment and the storage medium of the strategy for trade company
CN109903095A (en) * 2019-03-01 2019-06-18 上海拉扎斯信息科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN111667290B (en) * 2019-03-08 2024-06-18 北京京东尚科信息技术有限公司 Business display method and device and computer readable storage medium
CN110134722A (en) * 2019-05-22 2019-08-16 北京小度信息科技有限公司 Target user determines method, apparatus, equipment and storage medium
CN111047048B (en) * 2019-11-22 2023-04-07 支付宝(杭州)信息技术有限公司 Energized model training and merchant energizing method and device, and electronic equipment
CN111008871A (en) * 2019-12-10 2020-04-14 重庆锐云科技有限公司 Real estate repurchase customer follow-up quantity calculation method, device and storage medium
CN111292170A (en) * 2020-02-18 2020-06-16 重庆锐云科技有限公司 Method, device and storage medium for recommending intention customers for appointed building
CN113706198B (en) * 2021-08-27 2022-08-26 青木数字技术股份有限公司 Method for estimating recent repurchase probability of E-commerce repurchase hidden customers

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620692A (en) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 Method for analyzing customer churn of mobile communication service
CN104504460A (en) * 2014-12-09 2015-04-08 北京嘀嘀无限科技发展有限公司 Method and device for predicating user loss of car calling platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8712828B2 (en) * 2005-12-30 2014-04-29 Accenture Global Services Limited Churn prediction and management system
US20150310336A1 (en) * 2014-04-29 2015-10-29 Wise Athena Inc. Predicting customer churn in a telecommunications network environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"基于数据挖掘的网购用户流失预测研究";郭成蹊;《中国优秀硕士学位论文全文数据库 经济与管理科学辑》;20161231;参见正文第4-5章 *
"淘宝网卖家信誉影响因素研究";吴培红;《http://d.wanfangdata.com.cn/thesis/ChJUaGVzaXNOZXdTMjAyMDEwMjgSB0QyODAyNTAaCG9jaGdqMTV6》;20130424;参见正文第28-35页 *
郭成蹊."基于数据挖掘的网购用户流失预测研究".《中国优秀硕士学位论文全文数据库 经济与管理科学辑》.2016,第J157-23页. *

Also Published As

Publication number Publication date
CN107220845A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN107220845B (en) User re-purchase probability prediction/user quality determination method and device and electronic equipment
CN106651542B (en) Article recommendation method and device
CN108875776B (en) Model training method and device, service recommendation method and device, and electronic device
CN111078880B (en) Sub-application risk identification method and device
CN109685537B (en) User behavior analysis method, device, medium and electronic equipment
CN111181757B (en) Information security risk prediction method and device, computing equipment and storage medium
CN108133390A (en) Method and apparatus for predicting user behavior, and computing device
CN112070545B (en) Method, apparatus, medium, and electronic device for optimizing information reach
CN111428217A (en) Method and device for identifying cheat group, electronic equipment and computer readable storage medium
CN109885834A (en) Method and device for predicting user age and gender
CN111178537A (en) Feature extraction model training method and device
CN111598338A (en) Method, apparatus, medium, and electronic device for updating prediction model
CN111210332A (en) Method and device for generating post-loan management strategy and electronic equipment
CN107862599B (en) Bank risk data processing method and device, computer equipment and storage medium
CN107644042B (en) Software program click rate pre-estimation sorting method and server
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN111143533A (en) Customer service method and system based on user behavior data
CN111612624A (en) Method and system for analyzing importance of data features
CN110796379A (en) Risk assessment method, device and equipment of business channel and storage medium
CN113568739B (en) User resource quota allocation method and device and electronic equipment
CN111325372A (en) Method for establishing prediction model, prediction method, device, medium and equipment
CN113298642B (en) Order detection method and device, electronic equipment and storage medium
CN114693428A (en) Data determination method and device, computer readable storage medium and electronic equipment
CN112767117A (en) Method and device for evaluating enterprise status in group
CN113313615A (en) Method and device for quantitatively grading and grading enterprise judicial risks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Building N3, building 12, No. 27, Jiancai Chengzhong Road, Haidian District, Beijing 100096

Applicant after: Beijing Xingxuan Technology Co.,Ltd.

Address before: 100085 Beijing, Haidian District on the road to the information on the ground floor of the 1 to the 3 floor of the 2 floor, room 11, 202

Applicant before: Beijing Xiaodu Information Technology Co.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210629