CN109903100A - User churn prediction method, apparatus, and readable storage medium - Google Patents
- Publication number
- CN109903100A CN201910225076.7A
- Authority
- CN
- China
- Prior art keywords
- sample
- feature samples
- prediction model
- feature
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of the present invention disclose a user churn prediction method, an apparatus, and a readable storage medium. The method comprises: a device trains on a sample vector to obtain a first prediction model; generates, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector; obtains cross features of the top k feature samples in the importance ranking; and updates the first prediction model according to the cross features and the sample vector to obtain a final prediction model. The device then inputs second training features of a user to be predicted into the updated first prediction model to predict the time between that user's current login to a target application and the user's next login to it. The embodiments of the present application can improve the accuracy of the prediction model and enable prediction of user churn.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a user churn prediction method, an apparatus, and a readable storage medium.
Background art
Many network services and online games lose a large number of users within the first few minutes or hours of use. To reduce user churn, churn can be predicted so that different strategies can be formulated for different users, improving their game experience.
Existing churn prediction methods mostly rely on fluctuations in core indicators or on methods such as logistic regression and decision trees. Core indicators mainly include game duration, level failure rate, and the like; a user is considered about to churn when these indicators change sharply. Logistic regression and decision tree methods predict whether a user will churn from the user's historical behavior. However, both approaches have narrow coverage and low prediction accuracy. How to predict user churn more accurately is therefore a problem that those skilled in the art are studying.
Summary of the invention
The embodiments of the present invention disclose a user churn prediction method, an apparatus, and a readable storage medium, which enable prediction of user churn and improve the accuracy of the prediction model.
In a first aspect, an embodiment of the present invention provides a user churn prediction method, the method comprising:
Training on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, and each of the multiple feature samples includes first training features and a user label. The first training features are features extracted from the raw data of a preset user, and the raw data includes profile data and behavior data generated while operating a target application. The user label describes the time between the preset user's current login to the target application and the preset user's next login to it. The first prediction model is used to rank the importance of the multiple feature samples;
Generating, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector, and obtaining cross features of the top k feature samples in the importance ranking, wherein a cross feature is a feature obtained by performing a mathematical operation on the top k feature samples;
Updating the first prediction model according to the cross features and the sample vector;
Extracting second training features from the raw data of a user to be predicted within a preset period after that user logs in to the target application, and inputting the second training features into the updated first prediction model to predict the time between that user's current login to the target application and the user's next login to it.
In the above method, the device trains on the sample vector to obtain the first prediction model, generates an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, obtains cross features of the top k feature samples in the importance ranking, and updates the first prediction model according to the cross features and the sample vector to obtain a final prediction model, which predicts the time between the current login of the user to be predicted and that user's next login to the target application. Training the model with cross features of the top k feature samples in the importance ranking broadens the coverage of important features, which improves the accuracy of the prediction model and enables prediction of user churn.
Based on the first aspect, in an optional implementation, training on the sample vector to obtain the first prediction model comprises:
Obtaining the sample vector;
Generating a training set according to the sample vector, and training on the training set to obtain the first prediction model, wherein the training set includes multiple feature samples, each of which is a feature sample in the sample vector.
This implementation screens the obtained sample vector again, which improves the quality of the feature samples and thereby the accuracy of the model.
Based on the first aspect, in an optional implementation, generating the training set according to the sample vector comprises:
The sample vector includes positive samples and negative samples. A positive sample is a feature sample that contains a preset field, and a negative sample is a feature sample that does not contain the preset field. If the ratio of positive samples to negative samples falls outside a preset range, the negative samples are down-sampled so that the ratio of positive samples to negative samples in the training set is within the preset range.
This implementation sets the ratio of positive to negative samples in the training set; a reasonable ratio during model training can improve the accuracy of the model.
Based on the first aspect, in an optional implementation, generating the importance ranking of the multiple feature samples according to the first prediction model comprises:
Calculating the precision and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein the precision of every feature sample in the importance ranking is greater than a preset threshold, and the greater the recall of a feature sample, the higher its position in the importance ranking.
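The text does not spell out how precision and recall are computed. As a hedged reading, assuming the standard binary-classification definitions (with TP, FP, and FN denoting true positives, false positives, and false negatives, and with the symbols \tau, p_i, and r_i introduced here purely for illustration), the ranking rule can be written as:

```latex
\[
p_i = \frac{TP_i}{TP_i + FP_i}, \qquad r_i = \frac{TP_i}{TP_i + FN_i}
\]
\[
\text{feature sample } i \text{ enters the ranking iff } p_i > \tau, \qquad
i \text{ ranks above } j \text{ iff } r_i > r_j
\]
```

Here \tau stands for the preset precision threshold; precision acts only as a filter, and recall alone determines the order.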
Based on the first aspect, in an optional implementation, the preset period does not exceed two hours.
This implementation provides hour-level prediction: it can predict whether a user will churn using only the data from the two hours after the user logs in, or from an even shorter period, delivering prediction results more efficiently and enabling the device to provide personalized services suited to the user to be predicted more quickly.
In a second aspect, an embodiment of the present invention provides a user churn prediction apparatus, the apparatus comprising:
A training unit, configured to train on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, and each of the multiple feature samples includes first training features and a user label. The first training features are features extracted from the raw data of a preset user, and the raw data includes profile data and behavior data generated while operating a target application. The user label describes the time between the preset user's current login to the target application and the preset user's next login to it. The first prediction model is used to rank the importance of the multiple feature samples;
An acquiring unit, configured to generate, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector, and to obtain cross features of the top k feature samples in the importance ranking, wherein a cross feature is a feature obtained by performing a mathematical operation on the top k feature samples;
An updating unit, configured to update the first prediction model according to the cross features and the sample vector;
A predicting unit, configured to extract second training features from the raw data of a user to be predicted within a preset period after that user logs in to the target application, and to input the second training features into the updated first prediction model to predict the time between that user's current login to the target application and the user's next login to it.
Based on the second aspect, in one implementation, the training unit includes:
An obtaining subunit, configured to obtain the sample vector;
A training subunit, configured to generate a training set according to the sample vector and to train on the training set to obtain the first prediction model, wherein the training set includes multiple feature samples, each of which is a feature sample in the sample vector.
Based on the second aspect, in one implementation, the sample vector includes positive samples and negative samples, where a positive sample is a feature sample that contains a preset field and a negative sample is a feature sample that does not contain the preset field. The obtaining subunit further includes:
A sampling unit, configured to down-sample the negative samples if the ratio of positive samples to negative samples falls outside a preset range, so that the ratio of positive samples to negative samples in the training set is within the preset range.
Based on the second aspect, in one implementation, the acquiring unit further includes:
A computing unit, configured to calculate the precision and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein the precision of every feature sample in the importance ranking is greater than a preset threshold, and the greater the recall of a feature sample, the higher its position in the importance ranking.
Based on the second aspect, in one implementation, the preset period does not exceed two hours.
It should be noted that, for the implementations of the second aspect and their beneficial effects, reference may be made to the description of the first aspect and its corresponding implementations; details are not repeated here.
In a third aspect, an embodiment of the present invention discloses a computer-readable storage medium. The computer storage medium stores program instructions which, when executed by a processor, cause the processor to execute the method described in the first aspect or in any possible implementation of the first aspect.
It should be noted that, for the implementations of the third aspect and their beneficial effects, reference may be made to the description of the first aspect and its corresponding implementations; details are not repeated here.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the background art more clearly, the drawings used in the embodiments or the background art are briefly described below.
Fig. 1 is a schematic structural diagram of a user churn prediction device provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a user churn prediction method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a user churn prediction apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
It should be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the application. Reference to an "embodiment" in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments. The terms "device", "unit", "system", and the like used in this specification denote a computer-related entity: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a device may be, but is not limited to, a processor, a data processing platform, a computing device, a computer, two or more computers, and the like.
It should also be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
To better understand the user churn prediction method, apparatus, and computer-readable storage medium provided by the embodiments of the present application, the device to which the user churn prediction method applies is described first:
Referring to Fig. 1, Fig. 1 is a schematic diagram of a device for the user churn prediction method provided by this embodiment. The device 10 may include a processor 101, a memory 104, and a communication module 105, which may be interconnected by a bus 106. The memory 104 may be a high-speed random access memory (Random Access Memory, RAM) or a non-volatile memory, for example at least one disk memory. Optionally, the memory 104 may also be at least one storage system located remotely from the processor 101. The memory 104 stores application program code, which may include an operating system, a network communication module, a user interface module, and a data processing program. The communication module 105 exchanges information with external devices and may include units for wireless, wired, or other communication modes. Optionally, the component in part 103 that implements the receiving function may be regarded as a receiving unit, and the component that implements the sending function may be regarded as a sending unit; that is, part 103 includes a receiving unit and a sending unit. The processor 101 may also be called a processing unit, a processing board, a processing module, a processing device, and so on. The processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. When the processor 101 calls the user churn prediction program stored in the memory 104, the method shown in Fig. 2 is executed.
In a specific implementation, the user churn prediction device 10 may include a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA), a mobile internet device (Mobile Internet Device, MID), a smart wearable device (such as a smart watch or smart bracelet), and other devices that users can use; the embodiments of the present application impose no specific limitation.
Optionally, the device may be one or more servers (multiple servers may form a server cluster), on which the corresponding server programs run to provide user churn prediction services such as database services, data computation, and decision execution.
The user churn prediction method of the present invention is described below with reference to Fig. 2. Fig. 2 is a schematic flowchart of a user churn prediction method provided by an embodiment of the present invention. The method can be implemented on the device shown in Fig. 1 and may include, but is not limited to, the following steps:
Step S201: The device trains on a sample vector to obtain a first prediction model.
Specifically, after obtaining the sample vector, the device trains a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) on it to obtain the first prediction model. The sample vector includes multiple feature samples, each of which includes first training features and a user label. The first training features are features extracted from the raw data of a preset user; the raw data includes profile data and behavior data generated while operating the target application. Profile data includes the user's gender, age, region, terminal information, and so on; behavior data includes the number of logins, online duration, number of game levels played, the time of the most recent login, and so on. The user label describes the time between the preset user's current login to the target application and the preset user's next login to it. The first prediction model is used to rank the importance of the multiple feature samples.
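The structure of a feature sample described above can be sketched as follows. This is a minimal illustration, not the patent's data format: the class name `FeatureSample`, the field names, the feature keys, and the choice of hours as the label unit are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class FeatureSample:
    """One feature sample: first training features plus the user label (hypothetical layout)."""
    features: Dict[str, float]  # features extracted from profile and behavior data
    label_hours: float          # time from the current login to the next login, in hours


def hours_until_next_login(this_login: datetime, next_login: datetime) -> float:
    """User label: elapsed time between the current login and the next login."""
    return (next_login - this_login).total_seconds() / 3600.0


# Illustrative values only: a user who logs in at 20:00 and returns at 08:00 the next day.
sample = FeatureSample(
    features={"login_count": 3.0, "online_minutes": 42.0, "levels_played": 5.0},
    label_hours=hours_until_next_login(
        datetime(2019, 3, 1, 20, 0), datetime(2019, 3, 2, 8, 0)
    ),
)
```

A sample vector is then simply a list of such samples, and a long label value corresponds to a user who is slow to return.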
In one embodiment, after obtaining the sample vector, the device generates a training set from it and trains on the training set to obtain the first prediction model. The training set includes multiple feature samples, each of which is a feature sample in the sample vector. In other words, the device screens the feature samples in the sample vector to obtain the training set. The screening may be based on the numbers of positive and negative samples: the sample vector includes positive samples and negative samples, where a positive sample is a feature sample that contains a preset field and a negative sample is one that does not. If the ratio of positive samples to negative samples in the sample vector is below the preset range, the device may down-sample the negative samples, that is, take one sample out of every several negative samples, so that the ratio in the training set falls within the preset range. If the ratio is above the preset range, the device may reduce the number of positive samples or increase the number of negative samples so that the ratio in the training set falls within the preset range. The preset range is generally set between 0.2 and 0.5.
For example, suppose the sample vector includes 20 feature samples M1, M2, M3, M4, ..., M20, of which M1, M2, and M3 are positive samples and the remaining 17 are negative samples. The ratio of positive to negative samples is then 3/17 ≈ 0.176, which is outside the preset range, so the negative samples are down-sampled. Sampling the negative samples at an interval of 2 yields M4, M6, M8, M10, M12, M14, M16, M18, and M20; the ratio is now 3/9 ≈ 0.33, which is within the preset range, and screening is complete. The feature samples in the training set are M1, M2, M3, M4, M6, M8, M10, M12, M14, M16, M18, and M20. By screening the obtained sample vector again and controlling the ratio of positive to negative samples in the training set, this embodiment improves the quality of the feature samples; a reasonable ratio during model training can improve the accuracy of the model.
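The worked example above can be reproduced with a short sketch. The function name `downsample_negatives` and its parameters are hypothetical; only the 0.2 to 0.5 range, the interval of 2, and the sample names M1 to M20 come from the text.

```python
LOW, HIGH = 0.2, 0.5  # preset range for the positive/negative ratio, from the text


def downsample_negatives(positives, negatives, low=LOW, step=2):
    """Keep every `step`-th negative sample when positives/negatives is below `low`."""
    ratio = len(positives) / len(negatives)
    if ratio < low:
        negatives = negatives[::step]  # interval sampling: M4, M6, M8, ...
    return positives, negatives


positives = ["M1", "M2", "M3"]
negatives = [f"M{i}" for i in range(4, 21)]  # M4 .. M20, 17 negative samples

old_ratio = len(positives) / len(negatives)
kept_positives, kept_negatives = downsample_negatives(positives, negatives)
new_ratio = len(kept_positives) / len(kept_negatives)
training_set = kept_positives + kept_negatives
```

The sketch handles only the case in which the ratio is too low; the opposite case (reducing positive samples or adding negative samples) is left out, since the text gives no example of it.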
Step S202: The device generates, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector, and obtains cross features of the top k feature samples in the importance ranking.
Specifically, after training the first prediction model on the sample vector, the device compares the model's output with the user labels in the sample vector and calculates the precision and recall of the multiple feature samples. The precision of every feature sample in the importance ranking is greater than a preset threshold, which is generally set between 0.8 and 0.9, and the greater the recall, the higher the position in the ranking. In other words, once the precision of a feature sample exceeds the preset threshold, the feature samples are ordered by recall. The device then performs mathematical operations on the top k feature samples in the importance ranking to obtain cross features. In a specific implementation, the operations include at least one of addition, subtraction, multiplication, and division, so any two feature samples yield at most four different cross features.
For example, suppose the feature samples in the training set are M1, M2, M3, M4, M6, M8, M10, M12, M14, M16, M18, and M20, and the precision of the seven feature samples M1, M2, M3, M4, M6, M8, and M10 is greater than the preset threshold. These seven feature samples are then ranked by recall, with greater recall ranking higher; the ranking may be M6, M8, M2, M3, M10, M1, M4. With k = 3 by default, the top three feature samples M6, M8, and M2 are combined pairwise by mathematical operations to obtain new cross features. The cross features of M6 and M8 may be c1 = M6 + M8, c2 = M6 - M8, c3 = M6 * M8, and c4 = M6 / M8. The cross features of M6 and M2 and of M2 and M8 are obtained in the same way.
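The filtering, ranking, and pairwise crossing in this example can be sketched as below. The precision and recall numbers and the feature values are invented for illustration, chosen so that the ranking matches the one in the text; the function names and the 0.8 threshold value are assumptions.

```python
from itertools import combinations


def rank_features(metrics, precision_threshold):
    """Keep names whose precision exceeds the threshold, then sort by recall, descending."""
    kept = [(name, r) for name, (p, r) in metrics.items() if p > precision_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


def cross_features(values, top_k):
    """Pairwise +, -, *, / combinations of the top-k features (division skipped on zero)."""
    crossed = {}
    for a, b in combinations(top_k, 2):
        crossed[f"{a}+{b}"] = values[a] + values[b]
        crossed[f"{a}-{b}"] = values[a] - values[b]
        crossed[f"{a}*{b}"] = values[a] * values[b]
        if values[b] != 0:
            crossed[f"{a}/{b}"] = values[a] / values[b]
    return crossed


# Hypothetical (precision, recall) pairs; M12 and M14 fail the precision filter.
metrics = {
    "M1": (0.92, 0.70), "M2": (0.85, 0.85), "M3": (0.90, 0.80), "M4": (0.83, 0.65),
    "M6": (0.90, 0.95), "M8": (0.88, 0.90), "M10": (0.86, 0.75),
    "M12": (0.70, 0.99), "M14": (0.60, 0.50),
}
ranking = rank_features(metrics, precision_threshold=0.8)
top_k = ranking[:3]  # k = 3, as in the example

values = {"M6": 4.0, "M8": 2.0, "M2": 1.0}  # made-up feature values
crossed = cross_features(values, top_k)
```

Note that M12 has the highest recall in this made-up data but is excluded, because precision is applied as a hard filter before recall is considered.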
Step S203: The device updates the first prediction model according to the cross features and the sample vector.
Specifically, after obtaining the cross features, the device performs feature selection on them to obtain the optimal cross features. The optimal cross features may include multiple cross features, and the number of cross features required can be chosen according to the actual situation. The device updates the first prediction model according to the optimal cross features and the sample vector to obtain the final prediction model.
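The text does not say which feature selection criterion is used. One simple possibility, shown here purely as an assumption, is to score each candidate cross feature by the absolute Pearson correlation between its values and the user labels and keep the best `n_keep`; the function names and the data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_cross_features(columns, labels, n_keep):
    """Return the names of the n_keep cross features most correlated with the labels."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], labels)), reverse=True)
    return ranked[:n_keep]


# Made-up cross-feature columns (one value per feature sample) and user labels.
columns = {
    "c1": [1.0, 2.0, 3.0, 4.0],
    "c2": [2.0, 1.0, 2.0, 1.0],
    "c3": [4.0, 3.0, 2.5, 1.0],
}
labels = [10.0, 20.0, 30.0, 40.0]
best = select_cross_features(columns, labels, n_keep=2)
```

Under this reading, updating the model amounts to appending the selected cross-feature columns to the original features and retraining; other criteria, such as the importance scores of a retrained tree model, would fit the description equally well.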
Step S204: The device extracts second training features from the raw data of the user to be predicted within a preset period after that user logs in to the target application, and inputs the second training features into the updated first prediction model.
Specifically, after obtaining the final prediction model, the device inputs the second training features of the user to be predicted, taken from the preset period after that user logs in to the target application, to predict the time between that user's current login and the user's next login to the target application. The preset period generally does not exceed two hours. In other words, within two hours of the user to be predicted logging in to the target application, the device obtains that user's behavior data and profile data in the target application, where profile data includes the user's gender, age, region, terminal information, and so on, and behavior data includes the number of logins, online duration, number of game levels played, the time of the most recent login, and so on. The device then extracts the second training features from the behavior data and profile data and inputs them into the final prediction model to predict the time between the user's current login and next login to the target application. This embodiment provides hour-level prediction: it can predict whether a user will churn using only the data from the two hours after login, or from an even shorter period, delivering prediction results more efficiently and enabling the device to provide personalized services suited to the user to be predicted more quickly.
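Restricting feature extraction to the preset period might look like the following sketch. The event layout (dictionaries with `time`, `type`, and `passed` keys), the event type `level_end`, and the three output features are assumptions made for illustration; only the two-hour window comes from the text.

```python
from datetime import datetime, timedelta

PRESET_PERIOD = timedelta(hours=2)  # upper bound stated in the text


def extract_second_training_features(login_time, events):
    """Aggregate behavior events that fall within the preset period after login."""
    window_end = login_time + PRESET_PERIOD
    in_window = [e for e in events if login_time <= e["time"] <= window_end]
    finished = [e for e in in_window if e["type"] == "level_end"]
    return {
        "levels_played": len(finished),
        "levels_failed": sum(1 for e in finished if not e["passed"]),
        "last_event_minutes": max(
            ((e["time"] - login_time).total_seconds() / 60 for e in in_window),
            default=0.0,
        ),
    }


login = datetime(2019, 3, 1, 20, 0)
events = [
    {"time": datetime(2019, 3, 1, 20, 10), "type": "level_end", "passed": True},
    {"time": datetime(2019, 3, 1, 20, 50), "type": "level_end", "passed": False},
    {"time": datetime(2019, 3, 1, 22, 30), "type": "level_end", "passed": True},  # outside window
]
features = extract_second_training_features(login, events)
```

The third event occurs two and a half hours after login, so it is ignored; this is what lets a prediction be made without waiting for later data.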
In the method described in Fig. 2, the device trains on the sample vector to obtain the first prediction model, generates an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, obtains cross features of the top k feature samples in the importance ranking, and updates the first prediction model according to the cross features and the sample vector to obtain the final prediction model, which predicts the time between the current login of the user to be predicted and that user's next login to the target application. Training the model with cross features of the top k feature samples in the importance ranking broadens the coverage of important features, which improves the accuracy of the prediction model and enables prediction of user churn.
To facilitate implementing the above solutions of the embodiments of the present invention, the present invention correspondingly provides a user churn prediction apparatus, described in detail below with reference to the accompanying drawings:
As shown in Fig. 3, an embodiment of the present invention provides a schematic structural diagram of a user churn prediction apparatus 30. The apparatus 30 may be the device described above or a component (for example, a chip) in that device. The apparatus 30 may include a training unit 301, an acquiring unit 302, an updating unit 303, and a predicting unit 304, wherein:
The training unit 301 is configured to train on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, and each of the multiple feature samples includes training features and a user label. The training features are features extracted from the raw data of a preset user; the raw data includes profile data and behavior data generated while operating the target application. The user label describes the time between the preset user's current login to the target application and the preset user's next login to it. The first prediction model is used to rank the importance of the multiple feature samples;
The acquiring unit 302 is configured to generate, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector, and to obtain cross features of the top k feature samples in the importance ranking, wherein a cross feature is a feature obtained by performing a mathematical operation on the top k feature samples;
The updating unit 303 is configured to update the first prediction model according to the cross features and the sample vector;
The predicting unit 304 is configured to extract second training features from the raw data of the user to be predicted within a preset period after that user logs in to the target application, and to input the second training features into the updated first prediction model to predict the time between that user's current login and the user's next login to the target application.
The preset period does not exceed two hours.
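How the four units hand results to one another can be sketched as a small class. This is only an illustration of the call order: the class name, method names, and the stub callables are hypothetical, and the stubs stand in for real training, ranking, and updating logic.

```python
class ChurnPredictionApparatus:
    """Wires four callables together in the order of units 301 to 304 (illustrative only)."""

    def __init__(self, training_unit, acquiring_unit, updating_unit, feature_extractor):
        self.training_unit = training_unit
        self.acquiring_unit = acquiring_unit
        self.updating_unit = updating_unit
        self.feature_extractor = feature_extractor
        self.model = None

    def fit(self, sample_vector):
        first_model = self.training_unit(sample_vector)          # unit 301
        cross = self.acquiring_unit(first_model, sample_vector)  # unit 302
        self.model = self.updating_unit(first_model, cross, sample_vector)  # unit 303

    def predict(self, raw_data):
        # Unit 304: extract second training features, then query the updated model.
        return self.model(self.feature_extractor(raw_data))


calls = []


def train(sample_vector):
    calls.append("train")
    return lambda features: 0.0


def acquire(model, sample_vector):
    calls.append("acquire")
    return ["c1"]


def update(model, cross, sample_vector):
    calls.append("update")
    mean_label = sum(label for _, label in sample_vector) / len(sample_vector)
    return lambda features: mean_label  # stub model: always predicts the mean label


def extract(raw_data):
    calls.append("extract")
    return raw_data


apparatus = ChurnPredictionApparatus(train, acquire, update, extract)
apparatus.fit([({"x": 1.0}, 10.0), ({"x": 2.0}, 14.0)])
predicted = apparatus.predict({"x": 3.0})
```

Passing the units in as callables mirrors the point made later in the text that the units may be realized in hardware or software and need not be physically separate.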
In one embodiment, the training unit 301 includes an obtaining subunit 305 and a training subunit 306, wherein:
The obtaining subunit 305 is configured to obtain the sample vector;
The training subunit 306 is configured to generate a training set according to the sample vector and to train on the training set to obtain the first prediction model, wherein the training set includes multiple feature samples, each of which is a feature sample in the sample vector.
In one embodiment, the sample vector includes positive samples and negative samples, where a positive sample is a feature sample that contains a preset field and a negative sample is a feature sample that does not contain the preset field. The obtaining subunit 306 further includes:
A sampling unit 307, configured to down-sample the negative samples if the ratio of positive samples to negative samples falls outside the preset range, so that the ratio of positive samples to negative samples in the training set is within the preset range.
In one embodiment, the acquiring unit 302 further includes:
A computing unit 308, configured to calculate the precision and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein the precision of every feature sample in the importance ranking is greater than a preset threshold, and the greater the recall of a feature sample, the higher its position in the importance ranking.
It should be noted that, in the embodiments of the present application, for the functions of the functional units of the apparatus described in Fig. 3, reference may be made to the description of steps S201 to S204 in the method embodiment described in Fig. 2; details are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
In this application, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and these modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It should be understood that, in the embodiments of this application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention. Although this application is described herein with reference to various embodiments, in the course of implementing the claimed application, those skilled in the art can understand and realize other variations of the disclosed embodiments.
Claims (10)
1. A customer churn prediction method, characterized by comprising:
training on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each feature sample of the multiple feature samples includes a first training feature and a user label, the first training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavioral data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and the preset user's next login to the target application, and the first prediction model is used to rank the importance of the multiple feature samples;
generating an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and obtaining a cross feature of the top k feature samples in the importance ranking, the cross feature being a feature obtained by performing a mathematical operation on the top k feature samples;
updating the first prediction model according to the cross feature and the sample vector;
extracting a second training feature from the raw data within a preset period of a to-be-predicted user's login to the target application, and inputting the second training feature into the updated first prediction model to predict the time between the to-be-predicted user's current login to the target application and the to-be-predicted user's next login to the target application.
2. The method according to claim 1, characterized in that training on the sample vector to obtain the first prediction model comprises:
obtaining the sample vector;
generating a training set according to the sample vector, and training on the training set to obtain the first prediction model; wherein the training set includes multiple feature samples, and each feature sample of the multiple feature samples is a feature sample in the sample vector.
3. The method according to claim 2, characterized in that generating the training set according to the sample vector comprises:
the sample vector includes positive samples and negative samples, a positive sample being a sample among the multiple feature samples that contains a preset field, and a negative sample being a sample among the multiple feature samples that does not contain the preset field; if the ratio of the positive samples to the negative samples exceeds a preset range, down-sampling the negative samples so that the ratio of the positive samples to the negative samples in the training set is within the preset range.
4. The method according to any one of claims 1 to 3, characterized in that generating the importance ranking of the multiple feature samples according to the first prediction model comprises:
calculating the accuracy and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein each feature sample in the importance ranking has an accuracy greater than a preset threshold, and the greater its recall, the higher it ranks in the importance ranking.
5. The method according to any one of claims 1 to 3, characterized in that the preset period is no more than two hours.
6. A customer churn prediction device, characterized by comprising:
a training unit, configured to train on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each feature sample of the multiple feature samples includes a first training feature and a user label, the first training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavioral data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and the preset user's next login to the target application, and the first prediction model is used to rank the importance of the multiple feature samples;
an acquiring unit, configured to generate an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and to obtain a cross feature of the top k feature samples in the importance ranking, the cross feature being a feature obtained by performing a mathematical operation on the top k feature samples;
an updating unit, configured to update the first prediction model according to the cross feature and the sample vector;
a predicting unit, configured to extract a second training feature from the raw data within a preset period of a to-be-predicted user's login to the target application, and to input the second training feature into the updated first prediction model to predict the time between the to-be-predicted user's current login to the target application and the to-be-predicted user's next login to the target application.
7. The device according to claim 6, characterized in that the training unit comprises:
an obtaining subunit, configured to obtain the sample vector;
a training subunit, configured to generate a training set according to the sample vector and to train on the training set to obtain the first prediction model; wherein the training set includes multiple feature samples, and each feature sample of the multiple feature samples is a feature sample in the sample vector.
8. The device according to claim 7, characterized in that the sample vector includes positive samples and negative samples, a positive sample being a sample among the multiple feature samples that contains a preset field, and a negative sample being a sample among the multiple feature samples that does not contain the preset field; the obtaining subunit further comprises:
a sampling unit, configured to down-sample the negative samples if the ratio of the positive samples to the negative samples exceeds a preset range, so that the ratio of the positive samples to the negative samples in the training set is within the preset range.
9. The device according to any one of claims 6 to 8, characterized in that the acquiring unit further comprises:
a computing unit, configured to calculate the accuracy and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein each feature sample in the importance ranking has an accuracy greater than a preset threshold, and the greater its recall, the higher it ranks in the importance ranking.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions that, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 5.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2018115964421 | 2018-12-25 | ||
CN201811596442 | 2018-12-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109903100A (en) | 2019-06-18 |
Family
ID=66953474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910225076.7A Pending CN109903100A (en) | 2018-12-25 | 2019-03-22 | A kind of customer churn prediction technique, device and readable storage medium storing program for executing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109903100A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598845A (en) * | 2019-08-13 | 2019-12-20 | 中国平安人寿保险股份有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN111803957A (en) * | 2020-07-17 | 2020-10-23 | 网易(杭州)网络有限公司 | Player prediction method and device for online game, computer equipment and medium |
CN111861588A (en) * | 2020-08-06 | 2020-10-30 | 网易(杭州)网络有限公司 | Training method of loss prediction model, player loss reason analysis method and player loss reason analysis device |
CN112245934A (en) * | 2020-11-16 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Data analysis method, device and equipment for virtual resources in virtual scene application |
CN115018562A (en) * | 2022-07-06 | 2022-09-06 | 湖南草花互动科技股份公司 | User pre-churn prediction method, device and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120233159A1 (en) * | 2011-03-10 | 2012-09-13 | International Business Machines Corporation | Hierarchical ranking of facial attributes |
CN105005909A (en) * | 2015-06-17 | 2015-10-28 | 深圳市腾讯计算机***有限公司 | Method and device for predicting lost users |
CN107832581A (en) * | 2017-12-15 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Trend prediction method and device |
CN108121795A (en) * | 2017-12-20 | 2018-06-05 | 北京奇虎科技有限公司 | User's behavior prediction method and device |
-
2019
- 2019-03-22 CN CN201910225076.7A patent/CN109903100A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598845A (en) * | 2019-08-13 | 2019-12-20 | 中国平安人寿保险股份有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN111803957A (en) * | 2020-07-17 | 2020-10-23 | 网易(杭州)网络有限公司 | Player prediction method and device for online game, computer equipment and medium |
CN111803957B (en) * | 2020-07-17 | 2024-02-09 | 网易(杭州)网络有限公司 | Method, device, computer equipment and medium for predicting players of online games |
CN111861588A (en) * | 2020-08-06 | 2020-10-30 | 网易(杭州)网络有限公司 | Training method of loss prediction model, player loss reason analysis method and player loss reason analysis device |
CN111861588B (en) * | 2020-08-06 | 2023-10-31 | 网易(杭州)网络有限公司 | Training method of loss prediction model, player loss reason analysis method and player loss reason analysis device |
CN112245934A (en) * | 2020-11-16 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Data analysis method, device and equipment for virtual resources in virtual scene application |
CN115018562A (en) * | 2022-07-06 | 2022-09-06 | 湖南草花互动科技股份公司 | User pre-churn prediction method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190618 |