CN110837847A - User classification method and device, storage medium and server

Info

Publication number: CN110837847A
Application number: CN201910967144.7A
Authority: CN
Inventors: 赵毅仁; 胡宏辉
Original assignee: Shanghai Lake Information Technology Co Ltd
Current assignee: Shanghai Lake Information Technology Co Ltd
Priority date: 2019-10-12
Filing date: 2019-10-12
Publication date: 2020-02-25

Abstract

A user classification method and device, a storage medium and a server are provided. The method comprises the following steps: determining a set of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value indicates whether a user actively performs a preset operation; for each sample data in the set of sample data, mapping the user feature vector of the sample data into an updated user feature vector, wherein the dimensionality of the updated user feature vector is greater than the dimensionality of the user feature vector; training a user conversion model by using the updated user feature vectors and their associated self-conversion values, wherein the user conversion model is used for calculating the probability that a user performs the preset operation; and determining, based on the user feature vector of data to be detected and the user conversion model, the category of the user to which the data to be detected belongs, wherein the category comprises an active conversion type and a passive conversion type. The technical solution of the invention can improve the conversion rate of precipitation users in a marketing scenario.

Description

User classification method and device, storage medium and server

Technical Field

The invention relates to the technical field of big data processing, in particular to a user classification method and device, a storage medium and a server.

Background

Typically, after a user registers on a financial institution's platform, the user may go on to apply for a loan spontaneously. Many users, however, choose not to perform any subsequent operation and thus become precipitation users, that is, registered users who remain inactive. To improve the business conversion rate, many financial institutions employ human specialists to carry out telemarketing, with the aim of improving the conversion rate of precipitation users.

In the prior art, a marketing model can be developed from historical data to improve efficiency. The purpose of the marketing model is to determine the probability that a user converts automatically. A common algorithm for constructing the marketing model is logistic regression: the automatic conversion probability of each user is calculated with the logistic regression model, and the human marketing specialists then need to market only to users with a low automatic conversion probability, so that the conversion rate of users who have stalled at this stage is greatly improved.

However, in developing the marketing model, because the user features available at this stage have few dimensions, complex algorithms such as neural networks perform poorly when tried, and existing marketing model algorithms still leave considerable room for improvement.

Disclosure of Invention

The technical problem solved by the invention is how to improve the accuracy of the user classification model so as to classify the user more accurately.

To solve the foregoing technical problem, an embodiment of the present invention provides a user classification method, including: determining a group of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value is used for indicating whether a user actively executes a preset operation; for each sample data in the set of sample data, mapping a user feature vector of the sample data into an updated user feature vector, wherein the dimensionality of the updated user feature vector is greater than the dimensionality of the user feature vector; training to obtain a user transformation model by using the updated user feature vector and the associated self-transformation value thereof, wherein the user transformation model is used for calculating the probability of executing the preset operation by the user; determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type.

Optionally, the determining, based on the user feature vector of the data to be detected and the user transformation model, the category of the user to which the data to be detected belongs includes: calculating to obtain the probability of executing the preset operation by the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user transformation model; and when the probability is smaller than a preset probability, determining the category of the user to which the data to be detected belongs as the active conversion type, otherwise, determining the category of the user to which the data to be detected belongs as the passive conversion type.

Optionally, the mapping the user feature vector of the sample data into an updated user feature vector includes: determining the number of decision trees in a random forest algorithm; mapping the user feature vector of the sample data into that number of mapped user feature vectors by using a decision tree algorithm; and forming the updated user feature vector by using the random forest algorithm and the mapped user feature vectors.

Optionally, the training to obtain the user transformation model by using the updated user feature vector and the associated self-transformation value thereof includes: and training to obtain the user transformation model by adopting a neural network algorithm based on the updated user feature vector and the associated self-transformation value thereof.

Optionally, the user conversion model includes hidden layers, and the number of the hidden layers is less than or equal to 3.

Optionally, the neural network algorithm is a deep learning neural network algorithm.

To solve the foregoing technical problem, an embodiment of the present invention further provides a user classification device, including: a first determining module, configured to determine a set of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value is used for indicating whether a user actively executes a preset operation; a mapping module, configured to map, for each sample data in the set of sample data, a user feature vector of the sample data into an updated user feature vector, wherein a dimension of the updated user feature vector is greater than a dimension of the user feature vector; a training module, configured to train a user conversion model by using the updated user feature vector and its associated self-conversion value, wherein the user conversion model is used for calculating the probability that a user executes the preset operation; and a second determining module, configured to determine the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type.

Optionally, the second determining module includes: the calculation submodule is used for calculating and obtaining the probability of executing the preset operation by the user to which the data to be detected belongs based on the user characteristic vector of the data to be detected and the user conversion model; and the first determining submodule is used for determining the category of the user to which the data to be detected belongs as the active conversion type when the probability is smaller than a preset probability, and otherwise, determining the category of the user to which the data to be detected belongs as the passive conversion type.

To solve the above technical problem, an embodiment of the present invention further provides a storage medium having stored thereon computer instructions, where the computer instructions execute the steps of the above method when executed.

In order to solve the above technical problem, an embodiment of the present invention further provides a server, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the above method.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

the embodiment of the invention provides a user classification method, which comprises the following steps: determining a group of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value is used for indicating whether a user actively executes a preset operation; for each sample data in the set of sample data, mapping a user feature vector of the sample data into an updated user feature vector, wherein the dimensionality of the updated user feature vector is greater than the dimensionality of the user feature vector; training to obtain a user transformation model by using the updated user feature vector and the associated self-transformation value thereof, wherein the user transformation model is used for calculating the probability of executing the preset operation by the user; determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type. The embodiment of the invention improves the user characteristic vector of the sample data by mapping the user characteristic vector into the updated user characteristic vector with higher dimensionality, thereby being capable of training to obtain a user transformation model so as to better extract the complex interactive relation among data and improve the model discrimination.

Further, the mapping the user feature vector of the sample data into an updated user feature vector includes: determining the number of decision trees in a random forest algorithm; mapping the user feature vector of the sample data into that number of mapped user feature vectors by using a decision tree algorithm; and forming the updated user feature vector by using the random forest algorithm and the mapped user feature vectors. In the embodiment of the invention, the random forest algorithm is used for variable mapping, so that low-dimensional data can be mapped to a high-dimensional space, which makes it possible to obtain a more accurate user conversion model with a more complex model training method.

Further, the training to obtain the user transformation model by using the updated user feature vector and the associated self-transformation value thereof includes: and training to obtain the user transformation model by adopting a neural network algorithm based on the updated user feature vector and the associated self-transformation value thereof. According to the embodiment of the invention, high-dimensional data obtained by a random forest algorithm can be used as input data of a neural network algorithm, and a user transformation model is obtained by training the neural network algorithm.

Drawings

FIG. 1 is a flowchart illustrating a user classification method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating an embodiment of step S102 shown in FIG. 1;

FIG. 3 is a simplified schematic diagram of a random forest algorithm according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a perceptron structure of a neural network algorithm according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a user classification device according to an embodiment of the present invention.

Detailed Description

As described in the Background, existing approaches have deficiencies, and further work is needed to arrive at a more accurate user conversion model.

The embodiment of the invention provides a user classification method, which comprises the following steps: determining a group of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value is used for indicating whether a user actively executes a preset operation; for each sample data in the set of sample data, mapping a user feature vector of the sample data into an updated user feature vector, wherein the dimensionality of the updated user feature vector is greater than the dimensionality of the user feature vector; training to obtain a user transformation model by using the updated user feature vector and the associated self-transformation value thereof, wherein the user transformation model is used for calculating the probability of executing the preset operation by the user; determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type.

According to the embodiment of the invention, the user characteristic vector is mapped into the updated user characteristic vector with higher dimensionality, and the user characteristic vector of the sample data is improved, so that a user transformation model can be obtained through training, the complex interaction relation among data can be better extracted, the model discrimination is improved, and the user type with higher accuracy is obtained. The embodiment of the invention is applied to a marketing scene, and is beneficial to improving the conversion rate of the precipitation users in the marketing scene.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

Fig. 1 is a flowchart illustrating a user classification method according to an embodiment of the present invention. The user classification method can be executed by a server, is applied to a marketing scene, and is used for obtaining a user transformation model with higher precision, so that self-transformation users and non-self-transformation users (for example, precipitation users) can be more accurately distinguished.

In one embodiment, the server may be a server cluster consisting of a plurality of servers. The user classification method may include the steps of:

step S101, determining a group of sample data, wherein each sample data comprises a user feature vector and a self-transformation value associated with the user feature vector, and the self-transformation value is used for indicating whether a user actively executes preset operation or not;

step S102, for each sample data in the set of sample data, mapping the user characteristic vector of the sample data into an updated user characteristic vector, wherein the dimensionality of the updated user characteristic vector is greater than the dimensionality of the user characteristic vector;

step S103, training to obtain a user transformation model by using the updated user feature vector and the associated self-transformation value thereof, wherein the user transformation model is used for calculating the probability of the user for executing the preset operation;

step S104, determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type.

More specifically, in step S101, a set of sample data may be determined. The sample data may be collected data on precipitation users (e.g., users who do not actively apply for a loan) and on non-precipitation users (e.g., users who actively apply for a loan).

In one embodiment, each sample data may include a user feature vector and its associated self-conversion value, which indicates whether the user actively performs a preset operation, e.g., whether the user actively applies for a loan.
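As an illustration only, a sample of this kind could be represented as a pair of a feature tuple and a 0/1 label. The class name `Sample`, the helper `build_samples`, and the two features (age, days since registration) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """One sample: a user feature vector and its associated 0/1 label."""
    features: Tuple[float, ...]   # user feature vector
    self_converted: int           # 1 if the user acted on their own, else 0


def build_samples(rows: List[Tuple[Tuple[float, ...], bool]]) -> List[Sample]:
    """Turn (features, applied_unprompted) records into labelled samples."""
    return [Sample(tuple(f), 1 if applied else 0) for f, applied in rows]


# Hypothetical history: (age, days_since_registration), applied unprompted?
history = [((35.0, 2.0), True), ((52.0, 40.0), False)]
samples = build_samples(history)
```

Keeping the label as an integer makes it directly usable as a training target in a later step.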

In step S102, for a certain set of sample data, the user feature vector of each sample data may be mapped to obtain an updated user feature vector with a higher dimensionality.

In one embodiment, a random forest algorithm may be used to obtain updated user feature vectors with higher dimensionality. Fig. 2 is a flowchart illustrating an embodiment of step S102 shown in fig. 1. Specifically, the step S102 may include the steps of:

step S1021, determining the number of decision trees in the random forest algorithm;

step S1022, mapping the user feature vector of the sample data into said number of mapped user feature vectors by using a decision tree algorithm;

step S1023, forming the updated user feature vector by using the random forest algorithm and the mapped user feature vectors.

Specifically, in step S1021, parameter values of the random forest algorithm may be determined; for example, the number of decision trees in the random forest algorithm may be set to 3, 4, 5, and so on.

In step S1022, the user feature vector of each sample data may be mapped by using a decision tree algorithm, and a certain number of mapped user feature vectors are obtained.

Then, in step S1023, the updated user feature vector may be generated by using a random forest algorithm and the mapped user feature vector.
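The per-tree mapping of step S1022 can be sketched as routing one feature vector down each tree and recording the leaf it reaches. The sketch below assumes already-trained trees stored as nested dictionaries; the tree layout, the thresholds, and the names `leaf_position` and `map_with_forest` are hypothetical, and the patent does not prescribe any particular representation.

```python
from typing import Dict, List, Sequence, Union

# A node is either a leaf {"leaf": k} (k is the 0-based leaf position) or a
# split {"feature": j, "threshold": t, "left": node, "right": node}.
Node = Dict[str, Union[int, float, dict]]


def leaf_position(tree: Node, x: Sequence[float]) -> int:
    """Route x down one decision tree and return the leaf it lands in."""
    node = tree
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]


def map_with_forest(forest: List[Node], x: Sequence[float]) -> List[int]:
    """One leaf position per tree, in the order the trees are listed."""
    return [leaf_position(tree, x) for tree in forest]


# Hypothetical forest with 2 trees: one with 2 leaves, one with 4 leaves.
tree_1 = {"feature": 0, "threshold": 30.0,
          "left": {"leaf": 0}, "right": {"leaf": 1}}
tree_2 = {"feature": 1, "threshold": 10.0,
          "left": {"feature": 0, "threshold": 25.0,
                   "left": {"leaf": 0}, "right": {"leaf": 1}},
          "right": {"feature": 0, "threshold": 45.0,
                    "left": {"leaf": 2}, "right": {"leaf": 3}}}

positions = map_with_forest([tree_1, tree_2], [35.0, 20.0])
print(positions)  # [1, 2]
```

Encoding each leaf position as a one-hot segment and concatenating the segments would then correspond to forming the updated vector in step S1023.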

Further, in step S103, a user conversion model may be trained using the updated user feature vectors and their associated self-conversion values. The user conversion model may be used to calculate the probability that the user performs the preset operation, where the preset operation is, for example, actively applying for a loan or not actively applying for a loan. That is, the user conversion model may calculate the probability that the user actively applies for a loan, or the probability that the user does not actively (e.g., only passively) apply for a loan.

In one embodiment, the user transformation model may be obtained by training based on the updated user feature vector and its associated self-transformation value by using a neural network algorithm. For example, the updated user feature vector and the associated self-transformation value thereof are used as input data of the neural network algorithm, and each parameter of the neural network model is obtained through training, so that the user transformation model can be obtained.
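To make "obtaining each parameter through training" concrete, the sketch below fits a single sigmoid unit by stochastic gradient descent on a log-loss. This is a deliberately reduced stand-in for the neural network the text describes, not the patent's training procedure; the learning rate, epoch count, toy data, and the names `train_unit` and `predict` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_unit(data: Sequence[Tuple[Sequence[float], int]],
               epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit the weights and bias of one sigmoid unit by stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            step = lr * (y - p)  # descent step on the log-loss for this sample
            weights = [w + step * v for w, v in zip(weights, x)]
            bias += step
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimated probability that the user performs the preset operation."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Hypothetical updated feature vectors paired with 0/1 labels.
train = [((0, 1, 0, 0, 1, 0), 1), ((1, 0, 1, 0, 0, 0), 0),
         ((0, 1, 0, 0, 0, 1), 1), ((1, 0, 0, 1, 0, 0), 0)]
weights, bias = train_unit(train)
```

A multi-layer network would add hidden layers and backpropagation, but the inputs and targets would have the same shape as in this sketch.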

In specific implementation, the updated user feature vector and the associated self-transformation value thereof may be input into a deep learning neural network model, and each parameter of the deep learning neural network model is obtained through training, so as to obtain the user transformation model.

In a specific implementation, the user conversion model (i.e., the neural network model) may include hidden layers. Statistical tests show that when the number of hidden layers is less than or equal to 3, the user conversion model classifies users with high accuracy while keeping complexity low.
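A forward pass through a network with a bounded number of hidden layers might look like the following sketch. The ReLU hidden activation, the sigmoid output, the parameter values, and the names `dense` and `forward` are assumptions made for illustration; the text only states that the number of hidden layers is at most 3.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Affine layer: one output per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def forward(x: Sequence[float], hidden: List[tuple], output: tuple) -> float:
    """Forward pass through up to 3 ReLU hidden layers and a sigmoid output."""
    if len(hidden) > 3:
        raise ValueError("this sketch allows at most 3 hidden layers")
    activations = list(x)
    for w, b in hidden:
        activations = [max(0.0, z) for z in dense(activations, w, b)]
    (z,) = dense(activations, *output)  # the output layer has a single unit
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, untrained parameters: 6 inputs -> 2 hidden units -> 1 output.
hidden_layers = [([[0.5, -0.5, 0.0, 0.0, 0.5, 0.0],
                   [-0.5, 0.5, 0.0, 0.0, 0.0, 0.5]], [0.0, 0.0])]
output_layer = ([[1.0, -1.0]], [0.0])
probability = forward([0, 1, 0, 0, 1, 0], hidden_layers, output_layer)
```

Each entry of `hidden_layers` is a `(weights, biases)` pair, so adding a second or third hidden layer only means appending another pair.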

In step S104, a category of a user to which the data to be detected belongs may be determined based on the user feature vector of the data to be detected and the user transformation model, where the category may include an active transformation type and a passive transformation type.

In an embodiment, the probability that the user to which the data to be detected belongs actively executes the loan application operation may be calculated based on the user feature vector of the data to be detected and the user conversion model.

In another embodiment, the probability that the user to which the data to be detected belongs does not perform (or passively perform) the loan application operation may be calculated based on the user feature vector of the data to be detected and the user conversion model.

Further, after the probability is calculated, if the probability is smaller than a preset probability, the category of the user to which the data to be detected belongs may be determined to be the active conversion type, and otherwise the passive conversion type. Alternatively, if the probability is smaller than the preset probability, the category of the user to which the data to be detected belongs may be determined to be the passive conversion type, and otherwise the active conversion type.
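Both threshold conventions can be captured by one comparison plus a flag. In the sketch below, the function name `classify`, the flag `low_means_active`, and the string labels are hypothetical; which convention applies presumably depends on whether the model outputs the probability of acting or of not acting.

```python
def classify(probability: float, preset_probability: float,
             low_means_active: bool = True) -> str:
    """Compare a model probability with the preset probability.

    low_means_active=True follows the first variant in the text (a probability
    below the preset probability is the active conversion type); False follows
    the alternative variant.
    """
    below = probability < preset_probability
    if below == low_means_active:
        return "active"
    return "passive"


print(classify(0.2, 0.5))                          # active
print(classify(0.2, 0.5, low_means_active=False))  # passive
```

A probability exactly equal to the preset probability counts as "not smaller" in both variants.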

The following specifically explains the embodiment of the present invention by taking the number of decision trees equal to 2 as an example.

FIG. 3 is a simplified schematic diagram of a random forest algorithm according to an embodiment of the present invention. In a specific implementation, sample data is first determined from the historical data, each sample data being $(X_i, y_i)$, where $X_i$ is the feature variable of the user and $y_i$ is the user's self-conversion value: $y_i$ is 1 when the user converts on their own and 0 otherwise, and $i$ is a natural number.

Second, a random forest algorithm is used to obtain updated user feature vectors with higher dimensionality. Those skilled in the art understand that a random forest is composed of a plurality of decision trees. As shown in FIG. 3, assume that the number of decision trees is 2, i.e., a random forest containing 2 decision trees is trained, and that the 2 decision trees have 2 and 4 leaf nodes respectively. If the i-th data $X_i$ falls on the 2nd leaf node of the 1st decision tree and the 3rd leaf node of the 2nd decision tree, the corresponding new feature of the data is (0,1,0,0,1,0).

The more decision trees are trained in the random forest algorithm, the higher the dimensionality of the new features obtained by mapping, so that the mapping from low-dimensional data to high-dimensional data can be completed. In FIG. 3, taking data $X_i$ as an example, $X_i$ is used as input to both decision tree 1 and decision tree 2, and each tree yields the leaf that $X_i$ falls into. Decision tree 1 has leaf 1 and leaf 2, and decision tree 2 has leaves 1 through 4, so the updated feature vector obtained for $X_i$ is (0,1,0,0,1,0).
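The worked example can be reproduced with a few lines that build one one-hot segment per tree and concatenate the segments. The function name `one_hot_leaves` and its argument names are hypothetical; leaf positions are 1-based to match the wording of the example.

```python
from typing import List, Sequence


def one_hot_leaves(leaf_counts: Sequence[int],
                   leaf_positions: Sequence[int]) -> List[int]:
    """Concatenate one one-hot segment per tree.

    leaf_counts[k] is the number of leaves of tree k, and leaf_positions[k]
    is the 1-based leaf that the sample falls into in tree k.
    """
    if len(leaf_counts) != len(leaf_positions):
        raise ValueError("need one leaf position per tree")
    vector: List[int] = []
    for count, position in zip(leaf_counts, leaf_positions):
        if not 1 <= position <= count:
            raise ValueError("leaf position out of range")
        segment = [0] * count
        segment[position - 1] = 1
        vector.extend(segment)
    return vector


updated = one_hot_leaves(leaf_counts=[2, 4], leaf_positions=[2, 3])
print(updated)  # [0, 1, 0, 0, 1, 0]
```

The length of the result equals the total number of leaves across all trees, which is why adding trees raises the dimensionality of the updated vector.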

Further, deep learning neural network training may be performed on the obtained high-dimensional feature vectors; the training result is illustrated in FIG. 4, a schematic diagram of a perceptron structure of a neural network algorithm according to an embodiment of the present invention. As shown in FIG. 4, the training result may be $y = f\left(\sum_i X_i W_i - \theta\right)$, where $y$ represents the probability that the user performs the preset operation, $W_i$ represents the weight vector of user i, $X_i$ represents the feature vector of user i, and $\theta$ represents the error.
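The perceptron expression can be evaluated directly once an activation function is chosen. The sketch below assumes a sigmoid for f, which the text does not specify, and treats the sum as running over the components of one input vector; the weights, the value of theta, and the function name `perceptron` are hypothetical.

```python
import math
from typing import Sequence


def perceptron(x: Sequence[float], w: Sequence[float], theta: float) -> float:
    """Evaluate y = f(sum_i X_i * W_i - theta) with f chosen as a sigmoid."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    z = sum(xi * wi for xi, wi in zip(x, w)) - theta
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and offset applied to the vector (0, 1, 0, 0, 1, 0).
y = perceptron([0, 1, 0, 0, 1, 0], [0.3, 0.75, -0.2, 0.1, 0.5, -0.6], theta=1.25)
print(y)  # 0.5, because the weighted sum 0.75 + 0.5 equals theta
```

With a one-hot style input, only the weights at the active positions contribute to the sum, so theta acts as the offset the active weights must exceed for the output to rise above 0.5.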

The parameters of the training result may be obtained by training on specific input data, or through experimental testing and error calculation; for example, the number of hidden layers of the deep learning neural network may be tuned experimentally according to test results, error, and accuracy.

In order to verify the effect of the user transformation model provided by the embodiment of the invention, the result obtained by the user transformation model can be compared with the result obtained by the traditional method. In practical application, performance indexes such as accuracy and recall rate can be used as comparison objects, so that a user conversion model with better performance can be selected.
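A comparison on accuracy and recall could be computed as in the sketch below. The labels and predictions are made-up illustration data, not results from the patent, and the function name `accuracy_and_recall` is hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and recall for binary labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


# Hypothetical held-out labels and predictions from two candidate models.
truth = [1, 0, 1, 1, 0, 0]
baseline = accuracy_and_recall(truth, [1, 0, 0, 0, 0, 1])
candidate = accuracy_and_recall(truth, [1, 0, 1, 0, 0, 0])
```

Evaluating both models on the same held-out labels keeps the comparison fair; the model with the better pair of scores would be the one selected.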

In summary, the embodiments of the present invention provide a method that, in a marketing scenario, performs variable mapping using a random forest algorithm, with the trees trained using a decision tree algorithm, so as to map low-dimensional data to a high-dimensional space, and combines this with a neural network algorithm to better extract the complex interactions among the data, improve the discrimination of the model, and ultimately improve the conversion rate of precipitation users in the marketing scenario.

Fig. 5 is a schematic structural diagram of a user classification device according to an embodiment of the present invention. The user classification device 5 may implement the method solutions shown in fig. 1 and fig. 2, and is executed by a server.

Specifically, the user classification device 5 may include: a first determining module 51, configured to determine a set of sample data, where each sample data includes a user feature vector and a self-transformation value associated with the user feature vector, and the self-transformation value is used to indicate whether a user actively performs a preset operation; a mapping module 52, configured to map, for each sample data in the set of sample data, a user feature vector of the sample data into an updated user feature vector, where a dimension of the updated user feature vector is greater than a dimension of the user feature vector; a training module 53, configured to train to obtain a user transformation model by using the updated user feature vector and its associated self-transformation value, where the user transformation model is used to calculate a probability that a user executes the preset operation; and a second determining module 54, configured to determine, based on the user feature vector of the data to be detected and the user transformation model, a category of a user to which the data to be detected belongs, where the category includes an active transformation type and a passive transformation type.

In a specific implementation, the second determining module 54 may include: the calculating submodule 541 calculates and obtains the probability of executing the preset operation by the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model; the first determining submodule 542 is configured to determine that the category of the user to which the data to be detected belongs is the active conversion type when the probability is smaller than a preset probability, and otherwise determine that the category of the user to which the data to be detected belongs is the passive conversion type.

In a specific implementation, the mapping module 52 may include: a second determining submodule 521, configured to determine the number of decision trees in the random forest algorithm; a mapping submodule 522, configured to map the user feature vectors of the sample data into the number of mapped user feature vectors by using a decision tree algorithm; the generating submodule 523 is configured to form the updated user feature vector by using a random forest algorithm and the mapped user feature vector.

In a specific implementation, the training module 53 may include: and the training submodule 531 trains to obtain the user transformation model by using a neural network algorithm based on the updated user feature vector and the associated self-transformation value thereof.

In a specific implementation, the user conversion model may include hidden layers, and the number of the hidden layers is less than or equal to 3.

In a specific implementation, the neural network algorithm may be a deep learning neural network algorithm.

For more details of the operation principle and the operation mode of the user classification device 5, reference may be made to the related description in fig. 1 and fig. 2, and details are not repeated here.

Further, the embodiment of the present invention also discloses a storage medium on which computer instructions are stored; when the computer instructions are executed, the technical solution of the method in the embodiments shown in FIG. 1 and FIG. 2 is performed. Preferably, the storage medium may include a computer-readable storage medium such as a non-volatile memory or a non-transitory memory. The storage medium may include ROM, RAM, magnetic or optical disks, etc.

Further, an embodiment of the present invention further discloses a server, which includes a memory and a processor, where the memory stores computer instructions capable of being executed on the processor, and the processor executes the computer instructions to execute the technical solutions of the methods in the embodiments shown in fig. 1 and fig. 2.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for classifying a user, comprising:

determining a group of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value is used for indicating whether a user actively executes a preset operation;

for each sample data in the set of sample data, mapping a user feature vector of the sample data into an updated user feature vector, wherein the dimensionality of the updated user feature vector is greater than the dimensionality of the user feature vector;

training to obtain a user transformation model by using the updated user feature vector and the associated self-transformation value thereof, wherein the user transformation model is used for calculating the probability of executing the preset operation by the user;

determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type.

2. The user classification method according to claim 1, wherein the determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user transformation model comprises:

calculating to obtain the probability of executing the preset operation by the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user transformation model;

and when the probability is smaller than a preset probability, determining the category of the user to which the data to be detected belongs as the active conversion type, otherwise, determining the category of the user to which the data to be detected belongs as the passive conversion type.

3. The method according to claim 1, wherein the mapping the user feature vector of the sample data to an updated user feature vector comprises:

determining the number of decision trees in a random forest algorithm;

mapping the user feature vector of the sample data into that number of mapped user feature vectors by using a decision tree algorithm;

and forming the updated user characteristic vector by using a random forest algorithm and the mapped user characteristic vector.

4. The method according to claim 1, wherein the training to obtain the user transformation model using the updated user feature vector and its associated self-transformation value comprises:

and training to obtain the user transformation model by adopting a neural network algorithm based on the updated user feature vector and the associated self-transformation value thereof.

5. The user classification method according to claim 4, wherein the user conversion model comprises hidden layers, and the number of the hidden layers is less than or equal to 3.

6. The user classification method according to claim 4 or 5, characterized in that the neural network algorithm is a deep learning neural network algorithm.

7. A user classifying apparatus, comprising:

a first determining module, configured to determine a set of sample data, wherein each sample data comprises a user feature vector and a self-conversion value associated with the user feature vector, and the self-conversion value is used for indicating whether a user actively executes a preset operation;

a mapping module, for each sample data in the set of sample data, mapping a user feature vector of the sample data into an updated user feature vector, wherein a dimension of the updated user feature vector is greater than a dimension of the user feature vector;

the training module is used for training to obtain a user transformation model by utilizing the updated user feature vector and the associated self-transformation value thereof, and the user transformation model is used for calculating the probability of executing the preset operation by the user;

and the second determining module is used for determining the category of the user to which the data to be detected belongs based on the user feature vector of the data to be detected and the user conversion model, wherein the category comprises an active conversion type and a passive conversion type.

8. The apparatus of claim 7, wherein the second determining module comprises: the calculation submodule is used for calculating and obtaining the probability of executing the preset operation by the user to which the data to be detected belongs based on the user characteristic vector of the data to be detected and the user conversion model;

and the first determining submodule is used for determining the category of the user to which the data to be detected belongs as the active conversion type when the probability is smaller than a preset probability, and otherwise, determining the category of the user to which the data to be detected belongs as the passive conversion type.

9. A storage medium having stored thereon computer instructions, characterized in that the computer instructions are operative to perform the steps of the method of any one of claims 1 to 6.

10. A server comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1 to 6.