CN109903100A

CN109903100A - User churn prediction method, apparatus, and readable storage medium

Info

Publication number: CN109903100A
Application number: CN201910225076.7A
Authority: CN
Inventors: 苏杰; ***
Original assignee: Meng Yu Science And Technology Ltd Of Shenzhen
Current assignee: Meng Yu Science And Technology Ltd Of Shenzhen
Priority date: 2018-12-25
Filing date: 2019-03-22
Publication date: 2019-06-18

Abstract

Embodiments of the invention disclose a user churn prediction method, an apparatus, and a readable storage medium. In the method, a device trains on a sample vector to obtain a first prediction model, generates an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and obtains cross features of the top k feature samples in the importance ranking. After updating the first prediction model according to the cross features and the sample vector to obtain a final prediction model, the device inputs second training features of a user to be predicted into the updated first prediction model to predict the time between that user's current login to a target application and the next login. The embodiments of the present application can improve the accuracy of the prediction model and enable prediction of user churn.

Description

User churn prediction method, apparatus, and readable storage medium

Technical field

The present invention relates to the technical field of data processing, and in particular to a user churn prediction method, an apparatus, and a readable storage medium.

Background art

Many network services and online games lose a large number of users within a few minutes or a few hours of starting. To reduce user churn, churn can be predicted so that different strategies can be formulated for different users and their gaming experience improved.

Most existing churn prediction methods rely either on fluctuations in core metrics or on models such as logistic regression and decision trees. Core metrics mainly refer to game duration, level failure rate, and the like; when such a metric changes sharply, the user is considered likely to churn. Logistic regression and decision trees predict whether a user will churn from the user's historical behavior. However, both approaches have narrow coverage and limited prediction accuracy. How to predict user churn more accurately is therefore a problem that those skilled in the art are studying.

Summary of the invention

Embodiments of the invention disclose a user churn prediction method, an apparatus, and a readable storage medium, which enable prediction of user churn and improve the accuracy of the prediction model.

In a first aspect, an embodiment of the invention provides a user churn prediction method, the method comprising:

training on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each of the multiple feature samples includes a first training feature and a user label, the first training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavioral data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and the next login; and the first prediction model is used to rank the importance of the multiple feature samples;

generating an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and obtaining cross features of the top k feature samples in the importance ranking, the cross features being features obtained by performing mathematical operations on the top k feature samples;

updating the first prediction model according to the cross features and the sample vector;

extracting second training features from the raw data of a user to be predicted within a preset period after that user logs in to the target application, and inputting the second training features into the updated first prediction model to predict the time between that user's current login to the target application and the next login.

In the above method, the device trains on a sample vector to obtain a first prediction model, generates an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, obtains cross features of the top k feature samples in the importance ranking, and updates the first prediction model according to the cross features and the sample vector to obtain a final prediction model, which predicts the time between the current login of the user to be predicted and the next login to the target application. Training the model with cross features of the top k feature samples in the importance ranking broadens the coverage of important features, which improves the accuracy of the prediction model and enables prediction of user churn.

Based on the first aspect, in an optional implementation, training on the sample vector to obtain the first prediction model comprises:

obtaining the sample vector;

generating a training set from the sample vector, and training on the training set to obtain the first prediction model, wherein the training set includes multiple feature samples, each of which is a feature sample in the sample vector.

By screening the obtained sample vector again, this implementation improves the quality of the feature samples and thereby the accuracy of the model.

Based on the first aspect, in an optional implementation, generating the training set from the sample vector comprises:

the sample vector includes positive samples and negative samples, the positive samples being feature samples that contain a preset field and the negative samples being feature samples that do not contain the preset field; if the ratio of positive samples to negative samples falls outside a preset range, the negative samples are down-sampled so that the ratio of positive samples to negative samples in the training set is within the preset range.

This implementation sets the ratio of positive to negative samples in the training set; a reasonable ratio during model training can improve the accuracy of the model.

Based on the first aspect, in an optional implementation, generating the importance ranking of the multiple feature samples according to the first prediction model comprises:

calculating the precision and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein each feature sample in the importance ranking has a precision greater than a preset threshold, and the higher its recall, the higher it ranks in the importance ranking.
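The ranking rule can be written out explicitly. The block below assumes that "precision" and "recall" carry their standard binary-classification meanings, with TP, FP, and FN denoting true positives, false positives, and false negatives; the source does not define them, and the symbols p_i, r_i, and τ are introduced here for illustration only.

```latex
p_i = \frac{TP_i}{TP_i + FP_i}, \qquad
r_i = \frac{TP_i}{TP_i + FN_i}, \qquad
\text{feature sample } i \text{ enters the ranking only if } p_i > \tau,\
\text{and is ordered by descending } r_i .
```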

Based on the first aspect, in an optional implementation, the preset period is no more than two hours.

This implementation provides hour-level prediction: data from only the first two hours after a user logs in, or from a shorter period, is enough to predict whether the user will churn. Prediction results are delivered more efficiently, so the device can more quickly provide personalized services suited to the user to be predicted.

In a second aspect, an embodiment of the invention provides a user churn prediction apparatus, the apparatus comprising:

a training unit, configured to train on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each of the multiple feature samples includes a first training feature and a user label, the first training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavioral data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and the next login; and the first prediction model is used to rank the importance of the multiple feature samples;

an acquiring unit, configured to generate an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and to obtain cross features of the top k feature samples in the importance ranking, the cross features being features obtained by performing mathematical operations on the top k feature samples;

an updating unit, configured to update the first prediction model according to the cross features and the sample vector;

a predicting unit, configured to extract second training features from the raw data of a user to be predicted within a preset period after that user logs in to the target application, and to input the second training features into the updated first prediction model to predict the time between that user's current login to the target application and the next login.

Based on the second aspect, in one implementation, the training unit includes:

an acquiring subunit, configured to obtain the sample vector;

a training subunit, configured to generate a training set from the sample vector and to train on the training set to obtain the first prediction model, wherein the training set includes multiple feature samples, each of which is a feature sample in the sample vector.

Based on the second aspect, in one implementation, the sample vector includes positive samples and negative samples, the positive samples being feature samples that contain a preset field and the negative samples being feature samples that do not contain the preset field; the acquiring subunit further includes:

a sampling unit, configured to down-sample the negative samples if the ratio of positive samples to negative samples falls outside a preset range, so that the ratio of positive samples to negative samples in the training set is within the preset range.

Based on the second aspect, in one implementation, the acquiring unit further includes:

a computing unit, configured to calculate the precision and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein each feature sample in the importance ranking has a precision greater than a preset threshold, and the higher its recall, the higher it ranks in the importance ranking.

Based on the second aspect, in one implementation, the preset period is no more than two hours.

It should be noted that the implementations of the second aspect and their beneficial effects are described under the first aspect and its corresponding implementations, and are not repeated here.

In a third aspect, an embodiment of the invention discloses a computer-readable storage medium storing program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect or of any possible implementation of the first aspect.

It should be noted that the implementations of the third aspect and their beneficial effects are described under the first aspect and its corresponding implementations, and are not repeated here.

Brief description of the drawings

To explain the technical solutions in the embodiments of the invention or in the background art more clearly, the drawings used in the embodiments or the background art are briefly described below.

Fig. 1 is a schematic structural diagram of a user churn prediction device provided by an embodiment of the invention;

Fig. 2 is a schematic flowchart of a user churn prediction method provided by an embodiment of the invention;

Fig. 3 is a schematic structural diagram of a user churn prediction apparatus provided by an embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described below with reference to the drawings.

It should be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. A reference to an "embodiment" in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments. The terms "device", "unit", "system", and the like used in this specification denote a computer-related entity: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a device may be, but is not limited to, a processor, a data processing platform, a computing device, a computer, or two or more computers.

It should also be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

To better understand the user churn prediction method, apparatus, and computer-readable storage medium provided by the embodiments of the application, the device to which the user churn prediction method applies is described first:

Referring to Fig. 1, Fig. 1 is a schematic diagram of a device for the user churn prediction method provided by an embodiment of this solution. The device 10 may include a processor 101, a memory 104, and a communication module 105, which may be interconnected via a bus 106. The memory 104 may be a high-speed random access memory (RAM) or a non-volatile memory, for example at least one disk storage. Optionally, the memory 104 may also be at least one storage system located remotely from the processor 101. The memory 104 stores application program code, which may include an operating system, a network communication module, a user interface module, and a data processing program. The communication module 105 exchanges information with external devices and may include units for wireless, wired, or other forms of communication. Optionally, the component in part 103 that implements the receiving function may be regarded as a receiving unit, and the component that implements the sending function as a sending unit; that is, part 103 includes a receiving unit and a sending unit. The processor 101 may also be called a processing unit, processing board, processing module, processing apparatus, and so on. The processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. When the processor 101 calls the user churn prediction program in the memory 104, the method shown in Fig. 2 is executed.

In a specific implementation, the user churn prediction device 10 may be a mobile phone, a tablet computer, a personal digital assistant (PDA), a mobile internet device (MID), a smart wearable device (such as a smartwatch or smart band), or any other device that users can use; the embodiments of the application do not specifically limit this.

Optionally, the device may be one or more servers (multiple servers may form a server cluster). The server needs to run corresponding services to provide user churn prediction, such as database services, data computation, and decision execution.

The user churn prediction method of the invention is described below with reference to Fig. 2, which is a schematic flowchart of a user churn prediction method provided by an embodiment of the invention. The method may be implemented on the device shown in Fig. 1 and may include, but is not limited to, the following steps:

Step S201: The device trains on a sample vector to obtain a first prediction model.

Specifically, after obtaining the sample vector, the device trains a gradient boosting decision tree (GBDT) on it to obtain the first prediction model. The sample vector includes multiple feature samples, each of which includes a first training feature and a user label. The first training feature is a feature extracted from the raw data of a preset user; the raw data includes profile data and behavioral data generated when operating the target application. Profile data includes the user's gender, age, region, terminal information, and so on; behavioral data includes the number of logins, online duration, number of levels played, time of the most recent login, and so on. The user label describes the time between the preset user's current login to the target application and the next login. The first prediction model is used to rank the importance of the multiple feature samples.
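The structure of a feature sample can be illustrated with a minimal sketch. It assumes the user label is measured in hours between consecutive logins; the function names and the field names (`login_count`, `online_minutes`, `levels_played`) are hypothetical and do not come from the source.

```python
from datetime import datetime, timedelta


def next_login_gaps(login_times):
    """For each login except the last, return the time until the next login."""
    ordered = sorted(login_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def build_feature_sample(profile, behavior, gap):
    """Combine profile and behavioral features with the user label (gap in hours)."""
    features = {**profile, **behavior}
    return {"features": features, "label_hours": gap.total_seconds() / 3600}


# Illustrative data only: three logins by one preset user.
logins = [
    datetime(2019, 3, 1, 9, 0),
    datetime(2019, 3, 1, 21, 0),
    datetime(2019, 3, 4, 21, 0),
]
gaps = next_login_gaps(logins)
sample = build_feature_sample(
    {"age": 25, "region": "Shenzhen"},
    {"login_count": 3, "online_minutes": 42, "levels_played": 7},
    gaps[0],
)
```

A long gap before the next login would then correspond to a user who is drifting away, which is how a time-to-next-login label can stand in for churn.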

In one embodiment, after obtaining the sample vector, the device generates a training set from the sample vector and trains on the training set to obtain the first prediction model. The training set includes multiple feature samples, each of which is a feature sample in the sample vector. In other words, the device screens the feature samples in the sample vector to obtain the training set. Screening may be based on the numbers of positive and negative samples: the sample vector includes positive samples, which are feature samples that contain a preset field, and negative samples, which are feature samples that do not. If the ratio of positive to negative samples in the sample vector is below the preset range, the device may down-sample the negative samples, that is, take one sample at regular intervals from the sequence of negative samples, so that the ratio of positive to negative samples in the training set falls within the preset range. If the ratio is above the preset range, the device may reduce the number of positive samples or increase the number of negative samples so that the ratio in the training set falls within the preset range. The preset range is generally set to between 0.2 and 0.5. For example, suppose the sample vector includes 20 feature samples M1, M2, M3, M4, ..., M20, of which M1, M2, and M3 are positive samples and the remaining 17 are negative samples. The ratio of positive to negative samples is then 3/17 ≈ 0.176, which is outside the preset range, so the negative samples are down-sampled. Sampling the negative samples at an interval of 2 gives M4, M6, M8, M10, M12, M14, M16, M18, M20; the ratio is now 3/9 ≈ 0.33, which is within the preset range, so screening is complete. The feature samples in the training set are M1, M2, M3, M4, M6, M8, M10, M12, M14, M16, M18, M20. By screening the obtained sample vector and controlling the ratio of positive to negative samples in the training set, this embodiment improves the quality of the feature samples; a reasonable ratio of positive to negative samples during training can improve the accuracy of the model.
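The interval-based down-sampling in this example can be sketched as follows. The function name and the rule of widening the interval until the lower bound is met are assumptions made for illustration; the source only states that negatives are sampled at intervals.

```python
def downsample_negatives(positives, negatives, low=0.2):
    """Keep every `step`-th negative sample, using the smallest step for which
    len(positives) / len(kept negatives) is at least `low`."""
    if not negatives:
        return list(positives)
    step = 1
    kept = list(negatives)
    # Widen the sampling interval until the positive/negative ratio reaches `low`.
    while len(positives) / len(kept) < low and step < len(negatives):
        step += 1
        kept = negatives[::step]
    return list(positives) + kept


positives = ["M1", "M2", "M3"]
negatives = [f"M{i}" for i in range(4, 21)]  # M4 .. M20, 17 negative samples
training_set = downsample_negatives(positives, negatives)
```

With the 3 positive and 17 negative samples from the text, the loop stops at an interval of 2 and keeps nine negatives, reproducing the training set listed above.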

Step S202: The device generates an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and obtains cross features of the top k feature samples in the importance ranking.

Specifically, the device obtains the first prediction model by training on the sample vector, compares the output of the first prediction model with the user labels in the sample vector, and calculates the precision and recall of the multiple feature samples in the sample vector. Each feature sample in the importance ranking has a precision greater than a preset threshold, which is generally set to between 0.8 and 0.9, and the higher the recall, the higher the feature sample ranks. In other words, every feature sample whose precision exceeds the preset threshold is ranked by its recall. The device then takes the top k feature samples in the importance ranking and performs mathematical operations on them to obtain cross features. In a specific implementation, the operations include at least one of addition, subtraction, multiplication, and division, so two feature samples yield at most four different cross features.
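The filter-then-sort rule described here can be sketched in a few lines. The sketch assumes the precision and recall of each feature sample have already been computed; the function name and all metric values are invented for illustration.

```python
def rank_by_recall(metrics, precision_threshold=0.8):
    """`metrics` maps a feature-sample name to a (precision, recall) pair.

    Feature samples at or below the precision threshold are dropped; the rest
    are returned in descending order of recall.
    """
    eligible = {name: pr for name, pr in metrics.items() if pr[0] > precision_threshold}
    return sorted(eligible, key=lambda name: eligible[name][1], reverse=True)


# Hypothetical metric values, not taken from the source.
metrics = {
    "M1": (0.85, 0.40),
    "M2": (0.90, 0.70),
    "M6": (0.88, 0.95),
    "M8": (0.86, 0.90),
    "M12": (0.60, 0.99),  # high recall, but precision is below the threshold
}
ranking = rank_by_recall(metrics)
top_k = ranking[:3]
```

Note that M12 is excluded despite having the highest recall, because the precision threshold acts as a gate before recall is considered.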

For example, suppose the feature samples in the training set are M1, M2, M3, M4, M6, M8, M10, M12, M14, M16, M18, M20, and the precision of the seven feature samples M1, M2, M3, M4, M6, M8, M10 is greater than the preset threshold. These seven feature samples are then ranked by recall, with higher recall ranking higher; the ranking might be M6, M8, M2, M3, M10, M1, M4. With a default of k = 3, the top three feature samples M6, M8, and M2 are combined pairwise by mathematical operations to obtain new cross features. The cross features of M6 and M8 may be c₁ = M6 + M8, c₂ = M6 - M8, c₃ = M6 * M8, and c₄ = M6 / M8. The cross features between M6 and M2 and between M2 and M8 are obtained in the same way.
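The pairwise construction of c₁ to c₄ can be sketched as below. The sketch assumes each top-k feature takes a numeric value for a given user; the function name, the naming scheme for the new features, and the choice to skip division by zero are assumptions, not details from the source.

```python
from itertools import combinations


def cross_features(values, top_k):
    """Return pairwise +, -, *, / cross features for one user's feature values.

    `values` maps a feature name to a number; `top_k` lists the names to cross.
    """
    crossed = {}
    for a, b in combinations(top_k, 2):
        x, y = values[a], values[b]
        crossed[f"{a}+{b}"] = x + y
        crossed[f"{a}-{b}"] = x - y
        crossed[f"{a}*{b}"] = x * y
        if y != 0:  # division is skipped when it is undefined
            crossed[f"{a}/{b}"] = x / y
    return crossed


# Illustrative values for the three top-ranked features of one user.
values = {"M6": 6.0, "M8": 2.0, "M2": 0.0}
crossed = cross_features(values, ["M6", "M8", "M2"])
```

For k = 3 there are three pairs, so at most twelve cross features per user; here two divisions are skipped because M2 is zero.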

Step S203: The device updates the first prediction model according to the cross features and the sample vector.

Specifically, after obtaining the cross features, the device performs feature selection on them to obtain optimal cross features. The optimal cross features may comprise several cross features, and the number required can be chosen according to the actual situation. The device updates the first prediction model according to the optimal cross features and the sample vector to obtain the final prediction model.
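The source does not say how the optimal cross features are selected. As one possible criterion, swapped in purely for illustration, the sketch below keeps the cross features whose values correlate most strongly (in absolute Pearson correlation) with the user label; the function names and all data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_cross_features(columns, labels, n_keep):
    """`columns` maps a cross-feature name to its values across samples.

    Returns the `n_keep` names with the largest absolute correlation to `labels`.
    """
    scored = sorted(columns, key=lambda name: abs(pearson(columns[name], labels)), reverse=True)
    return scored[:n_keep]


labels = [1.0, 2.0, 3.0, 4.0]          # e.g. hours until the next login
columns = {
    "M6+M8": [2.0, 4.0, 6.0, 8.0],     # moves in step with the label
    "M6-M8": [5.0, 5.0, 5.0, 5.0],     # constant, carries no signal
    "M6*M8": [1.0, 3.0, 2.0, 4.0],     # partially related to the label
}
selected = select_cross_features(columns, labels, n_keep=2)
```

The selected columns would then be appended to the original features before the model is retrained; other selection criteria, such as model-based importance, would fit the description equally well.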

Step S204: The device extracts second training features from the raw data of a user to be predicted within a preset period after that user logs in to the target application, and inputs the second training features into the updated first prediction model.

Specifically, after obtaining the final prediction model, the device inputs the second training features of the user to be predicted, taken from the preset period after that user logs in to the target application, to predict the time between that user's current login to the target application and the next login. The preset period is generally no more than two hours. In other words, within two hours of the user to be predicted logging in to the target application, the device obtains that user's behavioral data and profile data in the target application. Profile data includes the user's gender, age, region, terminal information, and so on; behavioral data includes the number of logins, online duration, number of levels played, time of the most recent login, and so on. The device then extracts the second training features from the behavioral data and profile data and inputs them into the final prediction model to predict the time between the user's current login to the target application and the next login. This embodiment provides hour-level prediction: data from only the first two hours after a user logs in, or from a shorter period, is enough to predict whether the user will churn. Prediction results are delivered more efficiently, so the device can more quickly provide personalized services suited to the user to be predicted.
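Restricting the raw data to the preset period can be sketched as a simple time-window filter. The event schema (`time`, `type`), the function names, and the extracted feature names are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def events_in_window(login_time, events, window=timedelta(hours=2)):
    """Keep events whose timestamp falls within `window` after the login."""
    end = login_time + window
    return [e for e in events if login_time <= e["time"] <= end]


def extract_features(window_events):
    """Derive simple behavioral features from the events inside the window."""
    levels = sum(1 for e in window_events if e["type"] == "level_played")
    return {"event_count": len(window_events), "levels_played": levels}


login = datetime(2019, 3, 22, 10, 0)
events = [
    {"time": login + timedelta(minutes=5), "type": "level_played"},
    {"time": login + timedelta(minutes=90), "type": "level_played"},
    {"time": login + timedelta(hours=3), "type": "level_played"},  # outside the window
]
features = extract_features(events_in_window(login, events))
```

Only the first two events contribute to the features; the third happens three hours after login and is ignored, which is what allows a prediction to be made as soon as the window closes.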

In the method described in Fig. 2, the device trains on a sample vector to obtain a first prediction model, generates an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, obtains cross features of the top k feature samples in the importance ranking, and updates the first prediction model according to the cross features and the sample vector to obtain a final prediction model, which predicts the time between the current login of the user to be predicted and the next login to the target application. Training the model with cross features of the top k feature samples in the importance ranking broadens the coverage of important features, which improves the accuracy of the prediction model and enables prediction of user churn.

To better implement the above solution of the embodiments of the invention, the invention also provides a corresponding user churn prediction apparatus, described in detail below with reference to the drawings:

As shown in Fig. 3, an embodiment of the invention provides a schematic structural diagram of a user churn prediction apparatus 30. The apparatus 30 may be the device described above or a component (for example, a chip) in that device. The user churn prediction apparatus 30 may include a training unit 301, an acquiring unit 302, an updating unit 303, and a predicting unit 304, wherein

the training unit 301 is configured to train on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each of the multiple feature samples includes a training feature and a user label, the training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavioral data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and the next login; and the first prediction model is used to rank the importance of the multiple feature samples;

the acquiring unit 302 is configured to generate an importance ranking of the multiple feature samples in the sample vector according to the first prediction model, and to obtain cross features of the top k feature samples in the importance ranking, the cross features being features obtained by performing mathematical operations on the top k feature samples;

the updating unit 303 is configured to update the first prediction model according to the cross features and the sample vector;

the predicting unit 304 is configured to extract second training features from the raw data of a user to be predicted within a preset period after that user logs in to the target application, and to input the second training features into the updated first prediction model to predict the time between that user's current login to the target application and the next login.

The preset period is no more than two hours.

In one embodiment, the training unit 301 includes an acquiring subunit 305 and a training subunit 306, wherein the acquiring subunit 305 is configured to obtain the sample vector;

the training subunit 306 is configured to generate a training set from the sample vector and to train on the training set to obtain the first prediction model, wherein the training set includes multiple feature samples, each of which is a feature sample in the sample vector.

In one embodiment, the sample vector includes positive samples and negative samples, the positive samples being feature samples that contain a preset field and the negative samples being feature samples that do not contain the preset field; the acquiring subunit 305 further includes:

a sampling unit 307, configured to down-sample the negative samples if the ratio of positive samples to negative samples falls outside a preset range, so that the ratio of positive samples to negative samples in the training set is within the preset range.

In one embodiment, the acquiring unit 302 further includes:

a computing unit 308, configured to calculate the precision and recall of the multiple feature samples according to the prediction results of the first prediction model, wherein each feature sample in the importance ranking has a precision greater than a preset threshold, and the higher its recall, the higher it ranks in the importance ranking.

It should be noted that, in the embodiments of the application, for the functions of the functional units in the apparatus described in Fig. 3, refer to the descriptions of steps S201 to S204 in the method embodiment described in Fig. 2; details are not repeated here.

Those of ordinary skill in the art will appreciate that realizing all or part of the process in above-described embodiment method, being can be with Relevant hardware is instructed to complete by computer program, the program can be stored in a computer-readable storage medium In, the program is when being executed, it may include such as the process of the embodiment of above-mentioned each method.Wherein, the storage medium can be magnetic Dish, CD, read-only memory (Read-Only Memory, ROM) or random access memory (Random Access Memory, RAM) etc..

In this application, the unit as illustrated by the separation member may or may not be physically separate , component shown as a unit may or may not be physical unit, it can and it is in one place, or can also To be distributed over a plurality of network elements.Some or all of unit therein can be selected to realize this hair according to the actual needs The purpose of bright example scheme.

It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, is also possible to two or more units and is integrated in one unit.It is above-mentioned integrated Unit both can take the form of hardware realization, can also realize in the form of software functional units.

The above description is merely a specific embodiment, but scope of protection of the present invention is not limited thereto, any Those familiar with the art in the technical scope disclosed by the present invention, can readily occur in various equivalent modifications or replace It changes, these modifications or substitutions should be covered by the protection scope of the present invention.Therefore, protection scope of the present invention should be with right It is required that protection scope subject to.

It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention. Although the present application is described herein in conjunction with various embodiments, those skilled in the art can, in the course of implementing the claimed application, understand and realize other variations of the disclosed embodiments.

Claims

1. A customer churn prediction method, characterized by comprising:

training on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each feature sample of the multiple feature samples includes a first training feature and a user label, the first training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavior data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and that user's next login to the target application, and the first prediction model is used to rank the importance of the multiple feature samples;

generating, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector, and obtaining cross features of the top k feature samples in the importance ranking, the cross features being features obtained by performing mathematical operations on the top k feature samples;

extracting a second training feature from the raw data generated by a user to be predicted within a preset period of logging in to the target application, and inputting the second training feature into the updated first prediction model, so as to predict the time between the current login of the user to be predicted to the target application and that user's next login to the target application.
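By way of illustration only, and without limiting the claim, the cross-feature step could be sketched as follows. The sketch assumes that the ranked items are named numeric features and that the mathematical operation is a pairwise product; the claim itself only requires "mathematical operations". The function names `cross_features` and `augment` and all feature names are hypothetical.

```python
from itertools import combinations


def cross_features(sample, top_k_names):
    """Return pairwise-product cross features for one sample.

    `sample` maps a feature name to a numeric value and `top_k_names`
    lists the k highest-ranked feature names. Multiplication is an
    assumed choice of mathematical operation.
    """
    crossed = {}
    for a, b in combinations(top_k_names, 2):
        crossed[f"{a}*{b}"] = sample[a] * sample[b]
    return crossed


def augment(sample, top_k_names):
    """Return a copy of the sample with the cross features appended."""
    out = dict(sample)
    out.update(cross_features(sample, top_k_names))
    return out


# Hypothetical feature names and values, for illustration only.
sample = {"session_minutes": 30.0, "levels_cleared": 4.0, "purchases": 1.0}
augmented = augment(sample, ["session_minutes", "levels_cleared"])
```

The augmented samples would then be used together with the original sample vector to update the first prediction model.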

2. The method according to claim 1, characterized in that the training on a sample vector to obtain a first prediction model comprises:

obtaining the sample vector;

generating a training set according to the sample vector, and training on the training set to obtain the first prediction model; wherein the training set includes multiple feature samples, and each feature sample of the multiple feature samples is a feature sample in the sample vector.

3. The method according to claim 2, characterized in that the generating a training set according to the sample vector comprises:

the sample vector includes positive samples and negative samples, the positive samples being the samples among the multiple feature samples that contain a preset field, and the negative samples being the samples among the multiple feature samples that do not contain the preset field; if the ratio of the positive samples to the negative samples exceeds a preset range, down-sampling the negative samples so that the ratio of the positive samples to the negative samples in the training set falls within the preset range.
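By way of illustration only, the down-sampling of negative samples could be sketched as follows. The sketch assumes that the imbalance consists of too many negative samples and that the preset range is expressed as a maximum number of negatives per positive; the function name `downsample_negatives` and the parameter `max_neg_per_pos` are hypothetical.

```python
import random


def downsample_negatives(positives, negatives, max_neg_per_pos, seed=0):
    """Build a training set whose negative count is capped.

    At most `max_neg_per_pos * len(positives)` negatives are kept,
    chosen uniformly at random without replacement. The cap is an
    assumed way of expressing the "preset range".
    """
    limit = int(max_neg_per_pos * len(positives))
    if len(negatives) <= limit:
        return positives + negatives
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return positives + rng.sample(negatives, limit)


# Hypothetical samples: (label, index) pairs, for illustration only.
positives = [("pos", i) for i in range(10)]
negatives = [("neg", i) for i in range(100)]
training_set = downsample_negatives(positives, negatives, max_neg_per_pos=3)
```

All positive samples are kept, so only the majority class is thinned.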

4. The method according to any one of claims 1-3, characterized in that the generating, according to the first prediction model, an importance ranking of the multiple feature samples comprises:

calculating the accuracy and the recall rate of the multiple feature samples according to the prediction result of the first prediction model, wherein the accuracy of each feature sample in the importance ranking is greater than a preset threshold, and the greater the recall rate, the higher the position in the importance ranking.
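By way of illustration only, the ranking rule could be sketched as follows, assuming that an accuracy and a recall rate have already been computed for each ranked item. The function name `rank_by_recall`, the feature names, and the metric values are hypothetical.

```python
def rank_by_recall(metrics, accuracy_threshold):
    """Rank items by recall, keeping only those above the accuracy threshold.

    `metrics` maps a name to an (accuracy, recall) pair. Items whose
    accuracy is not greater than the threshold are excluded; the rest
    are ordered by descending recall.
    """
    kept = [
        (name, recall)
        for name, (accuracy, recall) in metrics.items()
        if accuracy > accuracy_threshold
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical metric values, for illustration only.
metrics = {
    "session_minutes": (0.90, 0.70),
    "levels_cleared": (0.85, 0.80),
    "purchases": (0.60, 0.95),
}
ranking = rank_by_recall(metrics, accuracy_threshold=0.8)
top_k = ranking[:2]
```

Here "purchases" has the highest recall but is excluded because its accuracy does not exceed the threshold.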

5. The method according to any one of claims 1-3, characterized in that the preset period does not exceed two hours.
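By way of illustration only, restricting the raw data to the preset period could be sketched as follows, assuming each raw event carries a timestamp. The function name `events_in_window`, the `ts` and `action` keys, and the sample events are hypothetical.

```python
from datetime import datetime, timedelta


def events_in_window(events, login_time, window=timedelta(hours=2)):
    """Keep events recorded within `window` of the login time (inclusive)."""
    end = login_time + window
    return [e for e in events if login_time <= e["ts"] <= end]


# Hypothetical raw events, for illustration only.
login = datetime(2019, 3, 22, 10, 0)
events = [
    {"ts": login + timedelta(minutes=30), "action": "start_level"},
    {"ts": login + timedelta(hours=3), "action": "purchase"},
]
recent = events_in_window(events, login)
```

Only the events kept in `recent` would be used to extract the second training feature.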

6. A customer churn prediction device, characterized by comprising:

a training unit, configured to train on a sample vector to obtain a first prediction model, wherein the sample vector includes multiple feature samples, each feature sample of the multiple feature samples includes a first training feature and a user label, the first training feature is a feature extracted from the raw data of a preset user, and the raw data includes profile data and behavior data generated when operating a target application; the user label describes the time between the preset user's current login to the target application and that user's next login to the target application, and the first prediction model is used to rank the importance of the multiple feature samples;

an acquiring unit, configured to generate, according to the first prediction model, an importance ranking of the multiple feature samples in the sample vector, and to obtain cross features of the top k feature samples in the importance ranking, the cross features being features obtained by performing mathematical operations on the top k feature samples;

a predicting unit, configured to extract a second training feature from the raw data generated by a user to be predicted within a preset period of logging in to the target application, and to input the second training feature into the updated first prediction model, so as to predict the time between the current login of the user to be predicted to the target application and that user's next login to the target application.

7. The device according to claim 6, characterized in that the training unit includes:

an obtaining subunit, configured to obtain the sample vector;

a training subunit, configured to generate a training set according to the sample vector, and to train on the training set to obtain the first prediction model; wherein the training set includes multiple feature samples, and each feature sample of the multiple feature samples is a feature sample in the sample vector.

8. The device according to claim 7, characterized in that the sample vector includes positive samples and negative samples, the positive samples being the samples among the multiple feature samples that contain a preset field, and the negative samples being the samples among the multiple feature samples that do not contain the preset field; the obtaining subunit further includes:

a sampling unit, configured to down-sample the negative samples if the ratio of the positive samples to the negative samples exceeds a preset range, so that the ratio of the positive samples to the negative samples in the training set falls within the preset range.

9. The device according to any one of claims 6-8, characterized in that the acquiring unit further includes:

a computing unit, configured to calculate the accuracy and the recall rate of the multiple feature samples according to the prediction result of the first prediction model, wherein the accuracy of each feature sample in the importance ranking is greater than a preset threshold, and the greater the recall rate, the higher the position in the importance ranking.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions which, when executed by a processor, cause the processor to execute the method according to any one of claims 1-5.