CN109255480A

CN109255480A - Indirect commission rate prediction method, device, computer equipment and storage medium

Info

Publication number: CN109255480A
Application number: CN201811001657.4A
Authority: CN
Inventors: 刘聪
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2018-08-30
Filing date: 2018-08-30
Publication date: 2019-01-22

Abstract

This application discloses an indirect commission rate prediction method, device, computer equipment and storage medium. The method comprises: obtaining the historical performance data of multiple employees and performing data cleaning on it to obtain target data; taking the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and inputting the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction; and inputting the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set. Because the method trains the random forest model on the cleaned historical performance data of multiple employees, the predicted values are highly accurate and overfitting is avoided.

Description

Indirect commission rate prediction method, device, computer equipment and storage medium

Technical field

This application relates to the technical field of data processing, and in particular to an indirect commission rate prediction method, device, computer equipment and storage medium.

Background technique

At present, the parameters commonly used to calculate the performance of enterprise employees in the insurance industry include the direct commission, the indirect commission, and so on. To set a reasonable indirect commission ratio (abbreviated as the indirect commission rate) and analyze its effect on the enterprise's operating costs, the usual practice is to analyze and predict it by reference to the data of historical months, which has low accuracy. Moreover, many conditions influence the indirect commission rate, and it is difficult to make an accurate judgment manually by combining all of them.

Summary of the invention

This application provides an indirect commission rate prediction method, device, computer equipment and storage medium, intended to solve the problem in the prior art that the indirect commission ratio is analyzed and predicted by reference to the data of historical months, with low accuracy.

This application provides an indirect commission rate prediction method, comprising:

obtaining the historical performance data of multiple employees and performing data cleaning on the historical performance data to obtain target data; wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields included in the target data are completed values;

taking the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and inputting the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction;

inputting the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

This application provides an indirect commission rate prediction device, comprising:

a data cleaning unit, configured to obtain the historical performance data of multiple employees and perform data cleaning on the historical performance data to obtain target data; wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields included in the target data are completed values;

a model acquiring unit, configured to take the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and to input the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction;

a predicted value acquiring unit, configured to input the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

This application further provides computer equipment, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements any of the indirect commission rate prediction methods provided by this application.

This application also provides a storage medium, wherein the storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform any of the indirect commission rate prediction methods provided by this application.

This application provides an indirect commission rate prediction method, device, computer equipment and storage medium. The method obtains the historical performance data of multiple employees and performs data cleaning on it to obtain target data, wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields included in the target data are completed values; takes the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and inputs the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction; and inputs the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set. Because the method uses the cleaned historical performance data of multiple employees as the training set for the random forest model function, the predicted values are highly accurate and overfitting is avoided.

Brief description of the drawings

In order to explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the indirect commission rate prediction method provided by an embodiment of this application;

Fig. 2 is a schematic sub-flowchart of the indirect commission rate prediction method provided by an embodiment of this application;

Fig. 3 is another schematic flowchart of the indirect commission rate prediction method provided by an embodiment of this application;

Fig. 4 is another schematic sub-flowchart of the indirect commission rate prediction method provided by an embodiment of this application;

Fig. 5 is another schematic sub-flowchart of the indirect commission rate prediction method provided by an embodiment of this application;

Fig. 6 is a schematic block diagram of the indirect commission rate prediction device provided by an embodiment of this application;

Fig. 7 is a schematic block diagram of subunits of the indirect commission rate prediction device provided by an embodiment of this application;

Fig. 8 is another schematic block diagram of the indirect commission rate prediction device provided by an embodiment of this application;

Fig. 9 is another schematic block diagram of subunits of the indirect commission rate prediction device provided by an embodiment of this application;

Fig. 10 is another schematic block diagram of subunits of the indirect commission rate prediction device provided by an embodiment of this application;

Fig. 11 is a schematic block diagram of computer equipment provided by an embodiment of this application.

Detailed description of the embodiments

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments of this application without creative effort fall within the scope of protection of this application.

It should be understood that, when used in this specification and the appended claims, the terms "includes" and "comprises" indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or sets thereof.

It should also be understood that the terms used in this specification are for the purpose of describing specific embodiments only and are not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of the indirect commission rate prediction method provided by an embodiment of this application. The method is applied in terminals such as desktop computers, laptop computers and tablet computers, and can also be applied in a server. As shown in Fig. 1, the method includes steps S101 to S103.

S101: obtain the historical performance data of multiple employees and perform data cleaning on the historical performance data to obtain target data; wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields included in the target data are completed values.

In this embodiment, the server that stores the employees' performance data first imports the historical performance data of multiple employees into a designated data table, and the historical performance data of the multiple employees is then obtained from that table. In the historical performance data, each row is one training sample (i.e. one employee) and each column is a feature of the sample; in other words, each column corresponds to one feature field. For example, the training sample in each row has the following fields:

Employee ID;

Name;

Gender: male or female;

Age;

Total number of lineal relatives in the enterprise;

Total number of collateral relatives in the enterprise;

Total number of alumni in the enterprise;

Salary;

Job title and rank;

Total direct commission;

Direct commission rate;

Indirect commission rate.

Here, the indirect commission rate corresponds to the indirect commission rate field, while the employee ID, name, gender, age, total number of lineal relatives in the enterprise, total number of collateral relatives in the enterprise, total number of alumni in the enterprise, salary, job title and rank, total direct commission and direct commission rate are the associated fields related to the indirect commission rate.

After the historical performance data of the multiple employees has been obtained and cleaned to produce the target data, the missing values of the indirect commission rate field need not be completed, because they are the values to be predicted; the associated fields related to the indirect commission rate, however, must be completed during data cleaning to meet the data requirements of the prediction process. In other words, the historical performance data of the multiple employees can be regarded as unprocessed raw data containing the indirect commission rate field and the associated fields; since some associated fields in this data may be unassigned, their values have to be completed through data cleaning.

In one embodiment, as shown in Fig. 2, step S101 includes:

S1011: perform an integrity check on the historical performance data of each employee in the historical performance data of the multiple employees; if an associated field in an employee's historical performance data has a missing value, complete the missing value with the average value of the corresponding field, to obtain complete data;

S1012: obtain the correlation coefficient between each associated field and the indirect commission rate field in the complete data, and retain the associated fields whose correlation coefficients rank before a preset rank value, to obtain the initially cleaned data;

S1013: obtain the skewed distribution of the initially cleaned data, and apply a logarithm operation to every field in the initially cleaned data whose skewness value exceeds a preset skewness coefficient, to obtain the target data.

In this embodiment, the integrity check is performed on each employee's historical performance data because missing values are not allowed in the prediction process; missing values are therefore completed by mean filling to obtain complete data.

Suppose there are data for 100 employees, of which 10 lack the total number of lineal relatives in the enterprise, 20 lack the total number of alumni in the enterprise, and 7 lack the job title and rank. In this case the user can be prompted to supply the values, or the values can be filled in automatically with the average. That is, each missing value can be supplemented with the average value of its field, so that the completed data does not affect subsequent analysis and computation.
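The mean filling described for step S1011 can be illustrated with a minimal Python sketch. It is not taken from the application: the function name impute_with_mean, the field names alumni_count and direct_rate, and the use of None to mark a missing value are all assumptions made for illustration.

```python
from statistics import mean

def impute_with_mean(rows, fields):
    """Fill missing (None) values in each listed field with that field's mean."""
    filled = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for field in fields:
        observed = [row[field] for row in rows if row[field] is not None]
        if not observed:
            continue  # nothing to average; leave the field as it is
        field_mean = mean(observed)
        for row in filled:
            if row[field] is None:
                row[field] = field_mean
    return filled

# Hypothetical employee records; None marks a missing associated-field value.
employees = [
    {"alumni_count": 4, "direct_rate": 0.10},
    {"alumni_count": None, "direct_rate": 0.20},
    {"alumni_count": 8, "direct_rate": None},
]
complete = impute_with_mean(employees, ["alumni_count", "direct_rate"])
```

Only the associated fields are passed in; the indirect commission rate field is deliberately left out, since its missing values are what the model is meant to predict.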

Next, the correlation coefficient between each associated field and the indirect commission rate field in the complete data is obtained. For example, suppose the correlation coefficient between the direct commission rate field and the indirect commission rate field is 0.8, the correlation coefficient between the total number of alumni in the enterprise field and the indirect commission rate field is 0.7, and these two coefficients rank first and second. If the preset rank value is 3, then all fields in the complete data other than the direct commission rate field, the total number of alumni in the enterprise field and the indirect commission rate field can be deleted, giving the initially cleaned data.
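The ranking in step S1012 can be sketched as follows. The application does not say which correlation coefficient is used or whether its sign matters; this sketch assumes the Pearson coefficient, ranks by absolute value, and computes it only on rows whose indirect commission rate is present. The function and field names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def top_correlated_fields(rows, target, candidates, rank_limit):
    """Keep the candidate fields whose |r| with the target ranks before rank_limit."""
    labelled = [r for r in rows if r[target] is not None]
    ys = [r[target] for r in labelled]
    scores = {f: abs(pearson([r[f] for r in labelled], ys)) for f in candidates}
    ranked = sorted(candidates, key=lambda f: scores[f], reverse=True)
    return ranked[: rank_limit - 1]  # rank_limit = 3 keeps ranks 1 and 2

# Hypothetical complete data after mean filling.
rows = [
    {"direct_rate": 0.1, "alumni": 1, "age": 30, "indirect_rate": 0.02},
    {"direct_rate": 0.2, "alumni": 3, "age": 25, "indirect_rate": 0.04},
    {"direct_rate": 0.3, "alumni": 2, "age": 41, "indirect_rate": 0.06},
    {"direct_rate": 0.4, "alumni": 5, "age": 33, "indirect_rate": 0.08},
]
kept = top_correlated_fields(rows, "indirect_rate", ["direct_rate", "alumni", "age"], 3)
```

With a preset rank value of 3, the two highest-ranked associated fields are kept, matching the example in the text where only the direct commission rate and alumni fields survive.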

Likewise, if the skewness value of a field in the initially cleaned data exceeds the preset skewness coefficient, the logarithm of each value of that field is taken in order to reduce the field's skewness. For example, if a value of the field is x, the adjusted value after the logarithm operation is ln x, i.e. the logarithm to base e. After this adjustment, the resulting data can be used to build the subsequent random forest model.
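A minimal sketch of step S1013 is given below. The application does not define how the skewness value is computed or what the preset coefficient is; the sketch assumes the moment-based sample skewness, an illustrative threshold of 0.75, and it only transforms fields whose values are all positive, since ln x is undefined otherwise. All names are hypothetical.

```python
from math import log

def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 else 0.0

def log_transform_skewed(rows, fields, max_skew):
    """Replace x with ln(x) in every field whose |skewness| exceeds max_skew."""
    out = [dict(r) for r in rows]
    for f in fields:
        column = [r[f] for r in rows]
        # Assumption: skip fields containing non-positive values.
        if abs(skewness(column)) > max_skew and min(column) > 0:
            for r in out:
                r[f] = log(r[f])
    return out

# Hypothetical data: salary is strongly right-skewed, age is symmetric.
rows = [
    {"salary": 3000, "age": 25},
    {"salary": 3200, "age": 30},
    {"salary": 3100, "age": 35},
    {"salary": 3300, "age": 40},
    {"salary": 30000, "age": 45},
]
target_data = log_transform_skewed(rows, ["salary", "age"], max_skew=0.75)
```

Here the salary column is replaced by its natural logarithm while the age column is left unchanged.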

S102: take the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and input the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction.

In this embodiment, if the missing values in the indirect commission rate field were supplemented with the average value or with randomly written values, accuracy would be low and overfitting could occur, so the resulting indirect commission rate data would be of little practical value when applied to the analysis of the enterprise's operating costs. Instead, the data in the target data whose indirect commission rate field is not missing is used as the training set, and the training set is input into the random forest model function to obtain a random forest model for indirect commission rate prediction.

For example, the data whose indirect commission rate field is not missing is input into the cforest() function, which uses a random forest model, namely:

Model <- cforest(indirect commission rate ~ direct commission rate + total number of alumni in the enterprise)

Through the above training process, the random forest model for indirect commission rate prediction is obtained.
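The partition into training set and test set that precedes this training call can be sketched in Python as follows. This is only an illustration of the split, not of cforest() itself; the function name split_by_target, the field names and the None marker for a missing value are assumptions.

```python
def split_by_target(rows, target):
    """Rows with a known target form the training set; the rest form the test set."""
    training_set = [r for r in rows if r[target] is not None]
    test_set = [r for r in rows if r[target] is None]
    return training_set, test_set

# Hypothetical target data; None marks a missing indirect commission rate.
rows = [
    {"direct_rate": 0.1, "indirect_rate": 0.02},
    {"direct_rate": 0.2, "indirect_rate": None},
    {"direct_rate": 0.3, "indirect_rate": 0.05},
]
training_set, test_set = split_by_target(rows, "indirect_rate")
```

Every row lands in exactly one of the two sets, so the rows whose rate is to be predicted never take part in training.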

In one embodiment, as shown in Fig. 3, the following steps are performed after step S102:

S102a: randomly select a corresponding amount of data from the training set as a validation set, according to a preset extraction ratio;

S102b: input the validation set into the random forest model for model validation, and if the validation accuracy of the random forest model exceeds a preset accuracy threshold, save the random forest model.

In this embodiment, in order to verify the accuracy of the random forest model, a corresponding amount of data can be randomly selected from the training set as a validation set. If the validation result is that the validation accuracy of the random forest model exceeds the preset accuracy threshold (for example, a preset accuracy threshold of 80%), the random forest model is saved as the prediction model for subsequent use.
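Steps S102a and S102b can be sketched as below. The application does not define how validation accuracy is measured for a numeric rate; this sketch assumes a prediction counts as correct when it lies within a tolerance of the recorded value. The function names, the 25% extraction ratio, the tolerance and the stand-in predictor are all hypothetical.

```python
import random

def sample_validation_set(training_set, ratio, seed=0):
    """Randomly pick round(len * ratio) training rows (at least one) for validation."""
    rng = random.Random(seed)
    size = max(1, round(len(training_set) * ratio))
    return rng.sample(training_set, size)

def validation_accuracy(predict, validation_set, target, tolerance):
    """Share of rows whose prediction is within `tolerance` of the true value."""
    hits = sum(abs(predict(r) - r[target]) <= tolerance for r in validation_set)
    return hits / len(validation_set)

def should_save_model(accuracy, threshold=0.80):
    """Save the model only if accuracy exceeds the preset threshold."""
    return accuracy > threshold

# Hypothetical training rows and a stand-in for the trained model's predict call.
training_set = [
    {"direct_rate": d / 100, "indirect_rate": 0.2 * (d / 100)} for d in range(1, 21)
]
predict = lambda row: 0.2 * row["direct_rate"]

validation_set = sample_validation_set(training_set, ratio=0.25)
accuracy = validation_accuracy(predict, validation_set, "indirect_rate", tolerance=0.005)
save = should_save_model(accuracy)
```

In a real pipeline the predict callable would wrap the trained random forest; the lambda here only keeps the sketch self-contained.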

In one embodiment, as shown in Fig. 4, step S102 includes:

S1021: randomly draw a first quantity of sample sets from the training set with replacement, and construct the first quantity of classification and regression trees from the sample sets;

S1022: train each classification and regression tree by the bagging method to obtain multiple decision trees, and combine the decision trees to obtain the random forest model for indirect commission rate prediction.

In this embodiment, the bagging method is an important part of ensemble methods: it obtains the data used to train each base estimator. As the name suggests, bagging puts all the training data into a black bag (which can be pictured as a black box), black meaning that the details of the data inside cannot be seen; all that is known is that the bag holds a data set. A portion of the data is then drawn at random from the bag and used to train one base estimator. Once the drawn data has been used, there are two options: put it back or do not put it back. Bagging can effectively reduce variance, that is, reduce the degree of overfitting.

A random forest is obtained by combining bagging with decision trees. Decision trees serve as the base estimators, many small decision trees are trained using bagging, and finally these small decision trees are combined, yielding a forest (the random forest).

More specifically, the process of training the random forest model from the original sample data is as follows:

1) using the bootstrap method (a resampling technique from statistics), randomly draw k new bootstrap sample sets from the original training data set with replacement, and construct k classification and regression trees from them; the samples not drawn each time form k sets of out-of-bag data (out-of-bag, abbreviated as OOB);

2) given n features, randomly select mtry features at each node of each tree, compute the amount of information contained in each feature, and select the feature with the greatest classification ability among them to split the node;

3) grow each tree to the maximum extent, without any pruning;

4) combine the generated trees into a random forest and use the random forest to classify new data, the classification result being determined by the number of votes cast by the tree classifiers.
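Step 1) of the list above can be illustrated with a short Python sketch that draws k bootstrap sample sets and records the out-of-bag rows of each. The function name bootstrap_samples and the integer stand-ins for employee rows are assumptions for illustration only.

```python
import random

def bootstrap_samples(training_set, k, seed=0):
    """Draw k bootstrap sample sets; return (in_bag, out_of_bag) pairs."""
    rng = random.Random(seed)
    n = len(training_set)
    samples = []
    for _ in range(k):
        # Draw n row indices with replacement, so duplicates are possible.
        drawn = [rng.randrange(n) for _ in range(n)]
        in_bag = [training_set[i] for i in drawn]
        chosen = set(drawn)
        # Rows never drawn in this round form the out-of-bag (OOB) data.
        out_of_bag = [training_set[i] for i in range(n) if i not in chosen]
        samples.append((in_bag, out_of_bag))
    return samples

# Integers stand in for employee rows to keep the example small.
training_set = list(range(10))
samples = bootstrap_samples(training_set, k=3)
```

Each in-bag set has the same size as the training set, and its out-of-bag set holds exactly the rows that were never drawn, which is what makes OOB data usable as a built-in hold-out for the tree trained on that sample.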

A random forest, as the name suggests, is a forest built in a random manner. The forest consists of many decision trees, and the decision trees of a random forest are not correlated with one another. Once the forest has been obtained, when a new input sample arrives, each decision tree in the forest judges separately which class the sample should belong to (for a classification algorithm); the class chosen most often is then predicted as the class of the sample.

Two points need attention when building each decision tree: sampling and complete splitting. First come two random sampling processes: the random forest samples both the rows and the columns of the input data. Rows are sampled with replacement, so the resulting sample set may contain duplicate samples. If there are N input samples, N samples are also drawn. As a result, the input to each tree during training is not the full set of samples, which makes overfitting relatively unlikely. Column sampling follows: m features are selected from the M features (m << M). The sampled data is then used to build a decision tree by complete splitting, so that each leaf node of the tree either cannot be split further or contains samples that all belong to the same class. Many decision tree algorithms include an important pruning step, but it is omitted here: because the two random sampling processes above ensure randomness, overfitting does not occur even without pruning.
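The row and column sampling described above can be sketched with a deliberately simplified ensemble. This is not the application's model: to stay short, each tree is a depth-1 regression stump rather than a fully grown tree, the m features are sampled once per tree (which coincides with per-node sampling only because a stump has a single node), and because the target is numeric the trees' outputs are averaged instead of voted on. All names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, features, target):
    """Fit a depth-1 regression tree: the single split with the lowest squared error."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: always predict the sample mean
        m = mean(r[target] for r in rows)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm

def fit_forest(rows, features, target, n_trees, m, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # row sampling with replacement
        subset = rng.sample(features, m)           # column sampling: m of M features
        trees.append(fit_stump(sample, subset, target))
    return lambda row: mean(tree(row) for tree in trees)

# Hypothetical training rows: the rate is low when direct_rate <= 0.2, high otherwise.
pairs = [(0.1, 1), (0.2, 3), (0.3, 2), (0.4, 5), (0.15, 4), (0.35, 0)]
rows = [
    {"direct_rate": d, "alumni": a, "indirect_rate": 0.02 if d <= 0.2 else 0.08}
    for d, a in pairs
]
predict = fit_forest(rows, ["direct_rate", "alumni"], "indirect_rate", n_trees=25, m=1)
low = predict({"direct_rate": 0.1, "alumni": 3})
high = predict({"direct_rate": 0.4, "alumni": 3})
```

Since the two query rows differ only in direct_rate, any difference between the two predictions comes from the trees that were given that feature, which shows how column sampling decorrelates the individual trees.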

S103: input the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

In this embodiment, inputting the test set into the random forest model yields the indirect commission rate value corresponding to each employee's indirect commission rate field in the test set; each value obtained is filled into its corresponding missing position, which completes the prediction of the indirect commission rate.

In one embodiment, as shown in Fig. 5, step S103 includes:

S1031: obtain, from the random forest model, the operation function between the associated fields and the indirect commission rate field;

S1032: input the associated field values of each employee in the test set into the operation function to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

In this embodiment, once the random forest model has been trained, the operation function between the associated fields and the indirect commission rate field can be obtained from it, for example a linear function (indirect commission rate = 1.1 * direct commission rate + 10 * total number of alumni in the enterprise / total number of employees in the enterprise, and so on). The associated field values of each employee in the test set are then input into the operation function to obtain the indirect commission rate value corresponding to each employee's indirect commission rate field, completing accurate prediction of the indirect commission rate and avoiding overfitting of the predicted data.
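Applying such an operation function to the test set and writing the results back into the missing positions can be sketched as follows. The coefficients 1.1 and 10 are the ones given in the example above; the function names, the field names and the headcount of 1000 are assumptions for illustration.

```python
def predict_indirect_rate(row, enterprise_headcount):
    """Example linear operation function from the text (coefficients 1.1 and 10)."""
    return 1.1 * row["direct_rate"] + 10 * row["alumni_count"] / enterprise_headcount

def fill_missing_rates(test_set, enterprise_headcount):
    """Return copies of the test rows with the predicted rate filled in."""
    filled = []
    for row in test_set:
        completed = dict(row)  # leave the original test row untouched
        completed["indirect_rate"] = predict_indirect_rate(row, enterprise_headcount)
        filled.append(completed)
    return filled

# Hypothetical test rows whose indirect commission rate is missing.
test_set = [
    {"direct_rate": 0.10, "alumni_count": 5, "indirect_rate": None},
    {"direct_rate": 0.20, "alumni_count": 0, "indirect_rate": None},
]
predicted = fill_missing_rates(test_set, enterprise_headcount=1000)
```

For the first row this gives 1.1 * 0.10 + 10 * 5 / 1000, i.e. approximately 0.16.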

It can be seen that the method uses the cleaned historical performance data of multiple employees as the training set and inputs it into the random forest model function to obtain the random forest model for indirect commission rate prediction; the predicted values are highly accurate and overfitting is avoided.

An embodiment of this application also provides an indirect commission rate prediction device, which is used to perform any embodiment of the aforementioned indirect commission rate prediction method. Specifically, referring to Fig. 6, Fig. 6 is a schematic block diagram of the indirect commission rate prediction device provided by an embodiment of this application. The indirect commission rate prediction device 100 can be configured in terminals such as desktop computers, tablet computers and laptop computers, and can also be configured in a server.

As shown in Fig. 6, the indirect commission rate prediction device 100 includes a data cleaning unit 101, a model acquiring unit 102 and a predicted value acquiring unit 103.

The data cleaning unit 101 is configured to obtain the historical performance data of multiple employees and perform data cleaning on the historical performance data to obtain target data; wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields included in the target data are completed values.

For example, the historical performance data of each employee has the following fields:

Employee ID;

Name;

Gender: male or female;

Age;

Total number of lineal relatives in the enterprise;

Total number of collateral relatives in the enterprise;

Total number of alumni in the enterprise;

Salary;

Job title and rank;

Total direct commission;

Direct commission rate;

Indirect commission rate.

In one embodiment, as shown in Fig. 7, the data cleaning unit 101 includes:

a missing value supplementing unit 1011, configured to perform an integrity check on the historical performance data of each employee in the historical performance data of the multiple employees and, if an associated field in an employee's historical performance data has a missing value, complete the missing value with the average value of the corresponding field, to obtain complete data;

a correlation judging unit 1012, configured to obtain the correlation coefficient between each associated field and the indirect commission rate field in the complete data, and retain the associated fields whose correlation coefficients rank before a preset rank value, to obtain the initially cleaned data;

a skewness computing unit 1013, configured to obtain the skewed distribution of the initially cleaned data, and apply a logarithm operation to every field in the initially cleaned data whose skewness value exceeds a preset skewness coefficient, to obtain the target data.

The model acquiring unit 102 is configured to take the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and to input the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction.

In one embodiment, as shown in Fig. 8, the indirect commission rate prediction device 100 further includes:

a validation set selecting unit 102a, configured to randomly select a corresponding amount of data from the training set as a validation set, according to a preset extraction ratio;

a model validating unit 102b, configured to input the validation set into the random forest model for model validation and, if the validation accuracy of the random forest model exceeds a preset accuracy threshold, save the random forest model.

In one embodiment, as shown in Fig. 9, the model acquiring unit 102 includes:

a classification and regression tree acquiring unit 1021, configured to randomly draw a first quantity of sample sets from the training set with replacement, and construct the first quantity of classification and regression trees from the sample sets;

a decision tree combining unit 1022, configured to train each classification and regression tree by the bagging method to obtain multiple decision trees, and combine the decision trees to obtain the random forest model for indirect commission rate prediction.

The predicted value acquiring unit 103 is configured to input the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

In one embodiment, as shown in Fig. 10, the predicted value acquiring unit 103 includes:

an operation function acquiring unit 1031, configured to obtain, from the random forest model, the operation function between the associated fields and the indirect commission rate field;

a predicted value computing unit 1032, configured to input the associated field values of each employee in the test set into the operation function to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

It can be seen that the device uses the cleaned historical performance data of multiple employees as the training set and inputs it into the random forest model function to obtain the random forest model for indirect commission rate prediction; the predicted values are highly accurate and overfitting is avoided.

The above indirect commission rate prediction device can be implemented in the form of a computer program, and the computer program can run on computer equipment as shown in Fig. 11.

Referring to Fig. 11, Fig. 11 is a schematic block diagram of computer equipment provided by an embodiment of this application. The computer equipment 500 can be a terminal or a server. The terminal can be an electronic device such as a tablet computer, a laptop computer, a desktop computer or a personal digital assistant.

Referring to Fig. 11, the computer equipment 500 includes a processor 502, a memory and a network interface 505 connected by a system bus 501, wherein the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions which, when executed, cause the processor 502 to perform an indirect commission rate prediction method.

The processor 502 provides computing and control capabilities and supports the operation of the entire computer equipment 500.

The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it causes the processor 502 to perform an indirect commission rate prediction method.

The network interface 505 is used for network communication, such as sending assigned tasks. Those skilled in the art will understand that the structure shown in Fig. 11 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer equipment 500 to which the solution is applied; a specific computer equipment 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

The processor 502 is configured to run the computer program 5032 stored in the memory to implement the following functions: obtaining historical performance data of multiple employees and performing data cleaning on the historical performance data to obtain target data, where the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields in the target data have been completed (no values are missing); taking the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and inputting the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction; and inputting the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.
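The split described above, in which rows whose target field (the indirect commission rate) is known form the training set and rows where it is missing form the test set, can be sketched as follows. This is only an illustration under stated assumptions: the field names `rate`, `direct_commission`, and `policies`, and the helper `split_by_target`, are hypothetical and do not come from the application.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: "rate" stands in for the indirect commission
# rate field; the other keys stand in for associated fields.
Record = Dict[str, Optional[float]]


def split_by_target(
    records: List[Record], target: str = "rate"
) -> Tuple[List[Record], List[Record]]:
    """Rows with a known target form the training set; the rest form the test set."""
    train = [r for r in records if r.get(target) is not None]
    test = [r for r in records if r.get(target) is None]
    return train, test


records = [
    {"direct_commission": 1200.0, "policies": 8.0, "rate": 0.12},
    {"direct_commission": 900.0, "policies": 5.0, "rate": None},
    {"direct_commission": 1500.0, "policies": 11.0, "rate": 0.15},
]
train_set, test_set = split_by_target(records)
```

Here the "test set" is the set of rows to be predicted, not a held-out evaluation set in the usual machine-learning sense.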

In one embodiment, the processor 502 also performs the following operations: performing an integrity check on the historical performance data of each employee among the historical performance data of the multiple employees; if an associated field in an employee's historical performance data has a missing value, filling the missing value with the mean of the corresponding field to obtain complete data; obtaining the correlation coefficient between each associated field and the indirect commission rate field in the complete data, and retaining the associated fields whose correlation coefficients rank above a preset rank value to obtain first-cleaned data; and obtaining the skewness distribution of the first-cleaned data and applying a logarithmic operation to each field of the first-cleaned data whose skewness value exceeds a preset skewness coefficient, to obtain the target data.
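The three cleaning steps above (mean completion, correlation ranking, logarithmic operation on skewed fields) can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the Pearson coefficient as the correlation measure, population skewness as the skewness value, `log1p` as the logarithmic operation on non-negative values, and hypothetical field names and thresholds (`top_k`, `skew_limit`).

```python
import math
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def skewness(xs):
    """Population skewness (third standardized moment)."""
    m, n = mean(xs), len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 else 0.0


def clean(features, target, top_k=2, skew_limit=1.0):
    # Step 1: integrity check and mean completion of associated fields.
    complete = {name: impute_mean(col) for name, col in features.items()}
    # Step 2: rank fields by |correlation| with the target, using only rows
    # where the target is known, and keep the top_k fields.
    known = [i for i, y in enumerate(target) if y is not None]
    ys = [target[i] for i in known]

    def score(name):
        return abs(pearson([complete[name][i] for i in known], ys))

    kept = sorted(complete, key=score, reverse=True)[:top_k]
    # Step 3: log-transform fields whose skewness exceeds the limit
    # (assumes non-negative values, so log1p is defined).
    return {
        name: [math.log1p(v) for v in complete[name]]
        if skewness(complete[name]) > skew_limit
        else complete[name]
        for name in kept
    }


features = {
    "direct_commission": [100.0, 200.0, None, 400.0, 5000.0],
    "policies": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tenure": [3.0, 3.0, 3.0, 3.0, 3.0],  # constant, so correlation is 0
}
target = [0.10, 0.12, 0.14, 0.16, 0.30]
cleaned = clean(features, target)
```

In this toy data the constant `tenure` field is dropped by the correlation ranking, and only the strongly right-skewed `direct_commission` field is log-transformed.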

In one embodiment, the processor 502 also performs the following operations: randomly selecting a corresponding amount of data from the training set as a validation set according to a preset extraction ratio; and inputting the validation set into the random forest model for model validation, and saving the random forest model if its validation accuracy exceeds a preset accuracy threshold.
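The validation step above can be sketched as follows. The application does not define how validation accuracy is measured for a numeric prediction, so this sketch assumes one possible definition: the fraction of validation rows whose prediction lies within a tolerance `tol` of the known value. The names `draw_validation_set`, `accuracy`, and `keep_if_accurate`, and all numeric settings, are hypothetical.

```python
import random


def draw_validation_set(train, ratio=0.2, seed=0):
    """Randomly select ratio * len(train) rows of the training set."""
    rng = random.Random(seed)
    k = max(1, round(len(train) * ratio))
    return rng.sample(train, k)


def accuracy(predict, rows, tol=0.02):
    """Assumed metric: share of rows predicted within tol of the known value."""
    hits = sum(abs(predict(x) - y) <= tol for x, y in rows)
    return hits / len(rows)


def keep_if_accurate(predict, rows, threshold=0.9):
    """Return the model if its accuracy exceeds the threshold, else None."""
    return predict if accuracy(predict, rows) > threshold else None


# Toy training rows as (associated field value, known rate) pairs.
train = [(float(x), 0.01 * x) for x in range(1, 11)]


def model(x):
    # Stand-in for a trained random forest model.
    return 0.01 * x


validation = draw_validation_set(train)
kept = keep_if_accurate(model, validation)
rejected = keep_if_accurate(lambda x: 1.0, validation)
```

A model that is not kept would typically be retrained with different settings; the application does not specify that branch.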

In one embodiment, the processor 502 also performs the following operations: randomly drawing a first number of sample sets from the training set with replacement, and constructing a first number of classification and regression trees from the sample sets; and training each classification and regression tree according to the bagging method to obtain multiple decision trees, and combining the decision trees to obtain the random forest model for indirect commission rate prediction.
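The bagging procedure above can be illustrated with a deliberately simplified sketch. To stay short and dependency-free, it fits depth-1 regression trees (stumps) as a stand-in for full classification and regression trees, and it omits the per-split random feature selection that a typical random forest adds. The function names and the values of `n_trees` and `seed` are assumptions for illustration.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a depth-1 regression tree: choose the (feature, threshold) split
    that minimizes the summed squared error around the two leaf means."""
    best = None
    for j in range(len(X[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left)
            sse += sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no valid split: fall back to a constant leaf
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: train each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Combine the trees by averaging their predictions."""
    return mean(predict_stump(s, row) for s in forest)


X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [0.10, 0.10, 0.10, 0.20, 0.20, 0.20]
forest = fit_forest(X, y)
low = predict_forest(forest, [1.0])
high = predict_forest(forest, [12.0])
```

Averaging over trees trained on different bootstrap samples is what reduces the variance of any single tree, which is the usual argument for why bagged ensembles resist over-fitting.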

In one embodiment, the processor 502 also performs the following operations: obtaining, from the random forest model, an operation function between the associated fields and the indirect commission rate field; and inputting the associated field values of each employee in the test set into the operation function to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.
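The prediction step above amounts to applying a function of the associated fields to each test-set record and writing the result into the missing target field. A minimal sketch follows, assuming the operation function is any callable that maps a list of associated field values to a predicted rate. The linear `predict` function here is only a hypothetical stand-in for a trained random forest, and the field names are illustrative.

```python
def fill_rates(test_set, fields, predict, target="rate"):
    """Return copies of the test records with the target field filled in."""
    filled = []
    for record in test_set:
        x = [record[f] for f in fields]  # associated field values, in order
        filled.append({**record, target: predict(x)})
    return filled


def predict(x):
    # Hypothetical stand-in for the function obtained from the trained model.
    return 0.05 + 0.01 * x[1]


test_set = [{"direct_commission": 900.0, "policies": 5.0, "rate": None}]
predicted = fill_rates(test_set, ["direct_commission", "policies"], predict)
```

Returning new records rather than mutating the input keeps the original test set available for later comparison.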

Those skilled in the art will understand that the embodiment of the computer device shown in Figure 11 does not limit the specific composition of the computer device. In other embodiments, the computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in Figure 11 and are not described again here.

It should be understood that, in the embodiments of this application, the processor 502 may be a central processing unit (Central Processing Unit, CPU). The processor 502 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

Another embodiment of this application provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, and the computer program includes program instructions. When executed by a processor, the program instructions implement: obtaining historical performance data of multiple employees and performing data cleaning on the historical performance data to obtain target data, where the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields in the target data have been completed; taking the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and inputting the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction; and inputting the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

In one embodiment, the program instructions, when executed by the processor, implement: performing an integrity check on the historical performance data of each employee among the historical performance data of the multiple employees; if an associated field in an employee's historical performance data has a missing value, filling the missing value with the mean of the corresponding field to obtain complete data; obtaining the correlation coefficient between each associated field and the indirect commission rate field in the complete data, and retaining the associated fields whose correlation coefficients rank above a preset rank value to obtain first-cleaned data; and obtaining the skewness distribution of the first-cleaned data and applying a logarithmic operation to each field of the first-cleaned data whose skewness value exceeds a preset skewness coefficient, to obtain the target data.

In one embodiment, the program instructions, when executed by the processor, implement: randomly drawing a first number of sample sets from the training set with replacement, and constructing a first number of classification and regression trees from the sample sets; and training each classification and regression tree according to the bagging method to obtain multiple decision trees, and combining the decision trees to obtain the random forest model for indirect commission rate prediction.

In one embodiment, the program instructions, when executed by the processor, implement: obtaining, from the random forest model, an operation function between the associated fields and the indirect commission rate field; and inputting the associated field values of each employee in the test set into the operation function to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

In one embodiment, the program instructions, when executed by the processor, implement: if the data sending end has stopped transmitting communication data for longer than a preset time threshold, releasing the shared memory.

The storage medium may be an internal storage unit of the aforementioned device, such as a hard disk or memory of the device. The storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) provided on the device. Further, the storage medium may include both an internal storage unit of the device and an external storage device.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

In the several embodiments provided in this application, it should be understood that the disclosed devices, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: units with the same function may be merged into a single unit, multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may also be electrical, mechanical, or other forms of connection.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiments of the present invention.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as a standalone product, it may be stored in a storage medium. On this understanding, the technical solution of the present invention in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, or an optical disc.

The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any person familiar with the technical field can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be determined by the scope of protection of the claims.

Claims

1. An indirect commission rate prediction method, characterized by comprising:

obtaining historical performance data of multiple employees and performing data cleaning on the historical performance data to obtain target data, wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields in the target data have been completed;

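taking the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and inputting the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction;

inputting the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.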

2. The indirect commission rate prediction method according to claim 1, characterized in that performing data cleaning on the historical performance data to obtain target data comprises:

performing an integrity check on the historical performance data of each employee among the historical performance data of the multiple employees, and, if an associated field in an employee's historical performance data has a missing value, filling the missing value with the mean of the corresponding field to obtain complete data;

obtaining the correlation coefficient between each associated field and the indirect commission rate field in the complete data, and retaining the associated fields whose correlation coefficients rank above a preset rank value to obtain first-cleaned data;

obtaining the skewness distribution of the first-cleaned data, and applying a logarithmic operation to each field of the first-cleaned data whose skewness value exceeds a preset skewness coefficient, to obtain the target data.

3. The indirect commission rate prediction method according to claim 1, characterized in that, after inputting the training set into the random forest model function to obtain the random forest model for indirect commission rate prediction, the method further comprises:

randomly selecting a corresponding amount of data from the training set as a validation set according to a preset extraction ratio;

inputting the validation set into the random forest model for model validation, and saving the random forest model if its validation accuracy exceeds a preset accuracy threshold.

4. The indirect commission rate prediction method according to claim 1, characterized in that inputting the training set into the random forest model function to obtain the random forest model for indirect commission rate prediction comprises:

randomly drawing a first number of sample sets from the training set with replacement, and constructing a first number of classification and regression trees from the sample sets;

training each classification and regression tree according to the bagging method to obtain multiple decision trees, and combining the decision trees to obtain the random forest model for indirect commission rate prediction.

5. The indirect commission rate prediction method according to claim 1, characterized in that inputting the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set comprises:

obtaining, from the random forest model, an operation function between the associated fields and the indirect commission rate field;

inputting the associated field values of each employee in the test set into the operation function to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

6. An indirect commission rate prediction device, characterized by comprising:

a data cleaning unit, configured to obtain historical performance data of multiple employees and perform data cleaning on the historical performance data to obtain target data, wherein the historical performance data of each employee includes an indirect commission rate field and at least one associated field related to the indirect commission rate, and the values of the associated fields in the target data have been completed;

a model acquisition unit, configured to take the data in the target data whose indirect commission rate field is not missing as a training set and the data whose indirect commission rate field is missing as a test set, and to input the training set into a random forest model function to obtain a random forest model for indirect commission rate prediction;

a predicted value acquisition unit, configured to input the test set into the random forest model to obtain the indirect commission rate value corresponding to the indirect commission rate field of each employee in the test set.

7. The indirect commission rate prediction device according to claim 6, characterized in that the data cleaning unit comprises:

a missing value completion unit, configured to perform an integrity check on the historical performance data of each employee among the historical performance data of the multiple employees, and, if an associated field related to the indirect commission rate in an employee's historical performance data has a missing value, to fill the missing value with the mean of the corresponding field to obtain complete data;

a correlation judging unit, configured to obtain the correlation coefficient between each associated field and the indirect commission rate field in the complete data, and to retain the associated fields related to the indirect commission rate whose correlation coefficients rank above a preset rank value, to obtain first-cleaned data;

a skewness computing unit, configured to obtain the skewness distribution of the first-cleaned data, and to apply a logarithmic operation to each field of the first-cleaned data whose skewness value exceeds a preset skewness coefficient, to obtain the target data.

8. The indirect commission rate prediction device according to claim 6, characterized in that the model acquisition unit further comprises:

a validation set selection unit, configured to randomly select a corresponding amount of data from the training set as a validation set according to a preset extraction ratio;

a model validation unit, configured to input the validation set into the random forest model for model validation, and to save the random forest model if its validation accuracy exceeds a preset accuracy threshold.

9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the indirect commission rate prediction method according to any one of claims 1-5.

10. A storage medium, characterized in that the storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform the indirect commission rate prediction method according to any one of claims 1-5.