CN113112346A - User classification method and device, electronic equipment and storage medium - Google Patents

Publication number
CN113112346A
Authority
CN
China
Prior art keywords
user
model
data
classification model
user classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110483797.5A
Other languages
Chinese (zh)
Inventor
许天歌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202110483797.5A
Publication of CN113112346A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to big data technology and discloses a user classification method comprising the following steps: collecting data of sample users to obtain a sample data set; constructing an initial user classification model based on an ordered discrete selection model and a preset number of categories; constructing a user classification model from the initial user classification model by using the sample data set; and classifying user data to be classified with the user classification model to obtain a classification result. The invention also relates to blockchain technology: the user data to be classified can be stored in nodes of a blockchain. The invention further provides a user classification device, electronic equipment and a computer-readable storage medium. The invention can solve the problem of low accuracy of user classification results.

Description

User classification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of big data technologies, and in particular, to a user classification method and apparatus, an electronic device, and a computer-readable storage medium.
Background
In the big data era, classifying users makes it possible to predict user behavior effectively, mine the value of each type of user, and provide useful information to a business system. For borrowing users, for example, a business system segments customers by assigning credit grades and predicts each customer's probability of default.
At present, existing loan-user classification methods usually perform only a simple binary classification, and the total number of categories is small, so the accuracy of the classification result is low. Moreover, the classification standard of existing methods is a general-purpose credit grade that does not take the customer's repayment performance into account, so the degree of loss or the profitability of a loan cannot be determined. The value of customers at each grade therefore cannot be effectively distinguished, and the accuracy of the result is not high.
Disclosure of Invention
The invention provides a user classification method, a user classification device and a computer readable storage medium, and mainly aims to solve the problem of low accuracy of user classification results.
In order to achieve the above object, the present invention provides a user classification method, including:
collecting sample user data to obtain a sample data set;
constructing an initial user classification model based on the ordered discrete selection model and a preset number of categories;
constructing a user classification model by utilizing the sample data set based on the initial user classification model;
and classifying the user data to be classified by utilizing the user classification model to obtain a classification result.
Optionally, the acquiring data of a sample user to obtain a sample data set includes:
acquiring basic information and repayment data of a sample user to obtain user data;
determining a label of the user data according to a preset total number of categories and the repayment data;
and correspondingly collecting the user data and the label to obtain a sample data set.
Optionally, the constructing an initial user classification model based on the ordered discrete selection model and the preset number of categories includes:
obtaining an ordered discrete selection model;
and enabling the ordered discrete selection model to obey ordered multi-classification distribution, and combining the preset number of categories with the ordered discrete selection model to obtain an initial user classification model.
Optionally, the subjecting the ordered discrete selection model to ordered multi-class distribution, and combining the preset number of classes with the ordered discrete selection model to obtain an initial user classification model includes:
subjecting an error parameter epsilon in the ordered discrete selection model to an ordered multi-class distribution;
and transforming the ordered discrete selection model according to the ordered multi-classification distribution and the preset number of the types to obtain an initial user classification model.
Optionally, the constructing a user classification model by using the sample data set based on the initial user classification model includes:
converting the model parameters in the initial user classification model into likelihood function representation to obtain a likelihood function;
carrying out logarithm taking processing on the likelihood function, and solving by using the sample data set to obtain a model parameter value;
and substituting the model parameter values into the initial user classification model to obtain a user classification model.
Optionally, the classifying the user data to be classified by using the user classification model to obtain a classification result includes:
acquiring user data to be classified from a preset service system;
solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset category number;
and selecting the category corresponding to the maximum probability value as the classification result of the user data to be classified.
Optionally, the solving of the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset number of categories includes:
inputting the user data to be classified into the user classification model to obtain a plurality of probability formulas;
and solving the probability formulas and the preset probability condition simultaneously to obtain the probability that the user data to be classified belongs to each category in the preset number of categories.
In order to solve the above problem, the present invention further provides a user classifying device, including:
the data acquisition module is used for acquiring data of a sample user to obtain a sample data set;
the initial model building module is used for building an initial user classification model based on the ordered discrete selection model and the preset number of categories;
the classification model construction module is used for constructing a user classification model by utilizing the sample data set based on the initial user classification model;
and the user classification module is used for classifying the user data to be classified by utilizing the user classification model to obtain a classification result.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the user classification method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, which stores at least one instruction, and the at least one instruction is executed by a processor in an electronic device to implement the user classification method described above.
The embodiment of the invention constructs the initial user classification model based on the ordered discrete selection model and the preset number of categories. The ordered discrete selection model is a multi-category ordered discrete selection model, so it can handle a dependent variable with multiple categories and enlarges the total number of categories used for user classification. Because the discrete dependent variable in the model has a logical ordering, the accuracy of the classification result can be improved. Meanwhile, the user classification model is constructed from the initial user classification model by using the sample data set, which can include various data of the user, such as repayment data in addition to basic information. The user's data can thus be fully utilized, information loss is effectively reduced, the basis of the classification standard is broadened, and the accuracy of the classification result of the user classification model can be improved. Therefore, the user classification method, the user classification device, the electronic equipment and the computer-readable storage medium provided by the invention can solve the problem of low accuracy of user classification results.
Drawings
FIG. 1 is a schematic flowchart of a user classification method according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a user classification device according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device implementing the user classification method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a user classification method. The execution subject of the user classification method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the user classification method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Fig. 1 is a schematic flow chart of a user classification method according to an embodiment of the present invention.
In this embodiment, the user classification method includes:
S1, collecting data of sample users to obtain a sample data set.
The sample user data in the embodiment of the present invention mainly refers to data of borrowing users. It includes the user's basic information, such as user name, age and education level, and the user's repayment data, that is, the user's repayment record.
In detail, the acquiring data of the sample user to obtain a sample data set includes:
acquiring basic information and repayment data of a sample user to obtain user data;
determining a label of the user data according to a preset total number of categories and the repayment data;
and correspondingly collecting the user data and the label to obtain a sample data set.
The preset total number of categories refers to the total number of categories into which users are classified.
Further, in the embodiment of the present invention, users are subdivided into four categories according to the degree of default risk shown by their repayment data: no-risk users, low-risk users, high-risk users and lost users. No-risk users repay normally or early throughout the whole repayment period and are never overdue. Low-risk users are overdue only for a short period (no more than 90 days) and then repay in time, leaving no bad debts or unpaid accounts. High-risk users are overdue for a long period (more than 90 days) and settle their bad debts within 2 years after the end of the repayment period. Lost users still have unpaid accounts 2 years after the end of the repayment period.
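The four-category labeling rule above can be illustrated with a minimal Python sketch. The field names `max_overdue_days` and `years_until_settled` are hypothetical and do not come from the description; the description also does not say how to treat cases that fall between the four definitions, so the ordering of the checks below is an assumption.

```python
from typing import Optional

# Category codes follow the numbering used later in the description.
NO_RISK, LOW_RISK, HIGH_RISK, LOST = 1, 2, 3, 4


def label_user(max_overdue_days: int, years_until_settled: Optional[float]) -> int:
    """Assign one of the four categories from repayment data.

    max_overdue_days: longest overdue period observed (hypothetical field).
    years_until_settled: years after the end of the repayment period at which
        all outstanding debt was settled; 0 if nothing was outstanding at the
        end of the period, None if debt is still unpaid (hypothetical field).
    """
    # Unpaid accounts 2 years after the end of the repayment period: lost user.
    if years_until_settled is None or years_until_settled > 2:
        return LOST
    # Never overdue: no-risk user.
    if max_overdue_days == 0:
        return NO_RISK
    # Short overdue period (no more than 90 days): low-risk user.
    if max_overdue_days <= 90:
        return LOW_RISK
    # Long overdue period, debt settled within 2 years: high-risk user.
    return HIGH_RISK
```

For example, `label_user(120, 1.5)` yields category 3 (high-risk user) under these assumptions.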
S2, constructing an initial user classification model based on the ordered discrete selection model and the preset number of categories.
In detail, the constructing an initial user classification model based on the ordered discrete selection model and the preset number of categories includes:
obtaining an ordered discrete selection model;
and enabling the ordered discrete selection model to obey ordered multi-classification distribution, and combining the preset number of categories with the ordered discrete selection model to obtain an initial user classification model.
Furthermore, user types are divided according to the degree of default, and the degrees of default are progressively ordered. A traditional classification model would lose this ordering information. The ordered discrete selection model adopted in the embodiment of the invention is therefore a multi-category ordered discrete selection model, which can handle a dependent variable with multiple categories and can improve classification performance by exploiting the logical ordering of the discrete dependent variable.
Optionally, the ordered discrete selection model is as follows:
$$y^* = \beta^T X + \varepsilon$$
where $y^*$ is the user's degree of default, $\beta$ is a parameter vector, $\beta^T$ is the transpose of the parameter vector $\beta$, $X$ is the user data, and $\varepsilon$ is the error parameter of the model.
The ordered discrete selection model can analyze the relationship between the dependent variable Y and the independent variable X when Y is a multi-category variable rather than a continuous variable. For example, consumers often compare several brands, such as Ford, Honda and Volkswagen, when purchasing a car. If choosing a Ford is denoted Y1, choosing a Honda Y2 and choosing a Volkswagen Y3, the ordered discrete selection model can predict which brand a consumer will choose from the consumer's own data.
Further, in the embodiment of the present invention, the user's degree of default is an unobservable continuous variable that corresponds to the preset number of categories (i.e., the user classifications). Combining the preset number of categories with the ordered discrete selection model means mapping the dependent variable of the model onto the preset categories and substituting it into the model's formula, which gives:
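$$y = 1 \quad \text{if } y^* \le \alpha_1$$
$$y = 2 \quad \text{if } \alpha_1 < y^* \le \alpha_2$$
$$y = 3 \quad \text{if } \alpha_2 < y^* \le \alpha_3$$
$$y = 4 \quad \text{if } y^* > \alpha_3$$
where $y$ represents the preset user category (1 is a no-risk user, 2 is a low-risk user, 3 is a high-risk user and 4 is a lost user); $\alpha_1$, $\alpha_2$ and $\alpha_3$ are model parameters with $\alpha_1 < \alpha_2 < \alpha_3$ that represent the segmentation points of the degree of default; and $y^*$ is the user's degree of default.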
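The threshold rule above can be sketched in a few lines of Python. The function name `category_from_latent` and the example threshold values are illustrative assumptions, not part of the description.

```python
from bisect import bisect_left
from typing import Sequence


def category_from_latent(y_star: float, alphas: Sequence[float]) -> int:
    """Map a latent default degree y* to a category 1..len(alphas)+1.

    alphas must be sorted in ascending order (alpha1 < alpha2 < alpha3).
    """
    # bisect_left counts the thresholds strictly below y*, so y* == alpha_k
    # still falls in category k, matching the "<=" in the rule above.
    return bisect_left(alphas, y_star) + 1


# Hypothetical segmentation points, for illustration only.
example_alphas = [-1.0, 0.5, 2.0]
print(category_from_latent(0.0, example_alphas))
```

With these example thresholds, a latent value of 0.0 lies between the first and second segmentation points and is mapped to category 2.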
Further, said subjecting said ordered discrete choice model to an ordered multi-class distribution comprises:
subjecting an error parameter epsilon in the ordered discrete selection model to an ordered multi-class distribution;
and transforming the ordered discrete selection model according to the ordered multi-classification distribution and the preset number of the types to obtain an initial user classification model.
Further, the transforming the ordered discrete selection model according to the ordered multi-class distribution and the preset number of classes is to convert dependent variables in the ordered discrete selection model corresponding to the preset number of classes into probabilities based on the ordered multi-class distribution to obtain an initial user classification model, and the transforming includes:
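$$p(y=1) = p(y^* \le \alpha_1) = p(\beta^T x + \varepsilon \le \alpha_1) = F(\alpha_1 - \beta^T x)$$
$$p(y=2) = p(\alpha_1 < y^* \le \alpha_2) = F(\alpha_2 - \beta^T x) - F(\alpha_1 - \beta^T x)$$
$$p(y=3) = p(\alpha_2 < y^* \le \alpha_3) = F(\alpha_3 - \beta^T x) - F(\alpha_2 - \beta^T x)$$
$$p(y=4) = p(y^* > \alpha_3) = 1 - F(\alpha_3 - \beta^T x)$$
where $p(y=1)$ is the probability that the user is a no-risk user, $p(y=2)$ is the probability that the user is a low-risk user, $p(y=3)$ is the probability that the user is a high-risk user, $p(y=4)$ is the probability that the user is a lost user, and $F$ is the cumulative distribution function of the error parameter $\varepsilon$.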
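The four probability formulas can be evaluated as in the following Python sketch. The description does not state which distribution function F is used; the sketch assumes the standard logistic distribution (an ordered logit model), and the names `logistic_cdf` and `category_probabilities` are illustrative.

```python
import math
from typing import List, Sequence


def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, an assumed choice for F."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probabilities(
    x: Sequence[float], beta: Sequence[float], alphas: Sequence[float]
) -> List[float]:
    """Return [p(y=1), ..., p(y=4)] for one user, given ascending alphas."""
    score = sum(b * v for b, v in zip(beta, x))  # beta^T x
    cdf = [logistic_cdf(a - score) for a in alphas]  # F(alpha_k - beta^T x)
    # Differences of consecutive CDF values give the category probabilities.
    bounds = [0.0] + cdf + [1.0]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]
```

Because the probabilities are differences of consecutive values of F, padded with 0 and 1, they sum to 1 for any parameter values, provided the segmentation points are in ascending order.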
S3, constructing a user classification model by using the sample data set based on the initial user classification model.
In detail, the constructing a user classification model by using the sample data set based on the initial user classification model includes:
converting the model parameters in the initial user classification model into likelihood function representation to obtain a likelihood function;
carrying out logarithm taking processing on the likelihood function, and solving by using the sample data set to obtain a model parameter value;
and substituting the model parameter values into the initial user classification model to obtain a user classification model.
Wherein the likelihood function is as follows:
[Formula image BDA0003049495730000061: likelihood function]
where $[y_i = m]$ equals 1 when $y_i = m$ and 0 otherwise, $m$ takes the values 1, 2, 3 and 4, $n$ is the total number of samples in the sample data set, and $y_i$ is the label of the $i$-th sample in the sample data set.
Further, the log-taking processing on the likelihood function includes:
[Formula image BDA0003049495730000071: log-likelihood function]
The embodiment of the invention uses maximum likelihood estimation to obtain the maximum likelihood estimates of the model parameters $\alpha_1$, $\alpha_2$, $\alpha_3$ and of the corresponding parameter vector $\beta$ in the initial user classification model, from which the user classification model is obtained.
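The likelihood and log-likelihood formulas are given only as images in the source. As an assumption, the standard form for an ordered model that is consistent with the indicator $[y_i = m]$, the sample count $n$ and the category probabilities defined earlier would be:

```latex
L(\alpha_1, \alpha_2, \alpha_3, \beta)
  = \prod_{i=1}^{n} \prod_{m=1}^{4} p(y_i = m)^{[y_i = m]}

\ln L(\alpha_1, \alpha_2, \alpha_3, \beta)
  = \sum_{i=1}^{n} \sum_{m=1}^{4} [y_i = m] \, \ln p(y_i = m)
```

Here $p(y_i = m)$ denotes the probability formula for category $m$ evaluated at the data $x_i$ of the $i$-th sample; the notation $x_i$ is introduced for illustration.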
Further, the substituting the model parameter values into the initial user classification model means that the model parameter values are used to correspondingly replace original parameters in the initial user classification model.
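The estimation step can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not in the description: F is taken to be the logistic distribution function, the ordering of the segmentation points is enforced by writing the gaps between them as exponentials, and the log-likelihood is maximized by gradient ascent with a finite-difference gradient and step halving. The names `unpack`, `log_likelihood` and `fit` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _cdf(z: float) -> float:
    # Numerically stable logistic CDF (assumed choice for F).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unpack(theta: Sequence[float], n_features: int) -> Tuple[List[float], List[float]]:
    """Split theta into beta and thresholds with alpha1 < alpha2 < alpha3."""
    beta = list(theta[:n_features])
    a1, d1, d2 = theta[n_features:]
    a2 = a1 + math.exp(d1)  # positive gap keeps the thresholds ordered
    a3 = a2 + math.exp(d2)
    return beta, [a1, a2, a3]


def log_likelihood(theta, X, y) -> float:
    """Sum over samples of the log probability of the observed label."""
    beta, alphas = unpack(theta, len(X[0]))
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(b * v for b, v in zip(beta, xi))
        bounds = [0.0] + [_cdf(a - score) for a in alphas] + [1.0]
        p = bounds[yi] - bounds[yi - 1]  # p(y_i = observed label)
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit(X, y, steps: int = 200, eps: float = 1e-5) -> List[float]:
    """Gradient ascent with a finite-difference gradient and step halving."""
    theta = [0.0] * len(X[0]) + [-1.0, 0.0, 0.0]
    current = log_likelihood(theta, X, y)
    for _ in range(steps):
        grad = []
        for k in range(len(theta)):
            bumped = theta[:]
            bumped[k] += eps
            grad.append((log_likelihood(bumped, X, y) - current) / eps)
        # Cap the first trial step so no parameter moves by more than 1.
        step = 1.0 / max(1.0, max(abs(g) for g in grad))
        while step > 1e-8:
            trial = [t + step * g for t, g in zip(theta, grad)]
            value = log_likelihood(trial, X, y)
            if value > current:  # accept only improving steps
                theta, current = trial, value
                break
            step /= 2.0
        else:
            break  # no improving step found: stop
    return theta
```

Calling `unpack(fit(X, y), len(X[0]))` returns the estimated parameter vector and segmentation points, which would then replace the original parameters in the initial user classification model. A production implementation would more likely rely on an established statistics library than on this hand-written optimizer.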
Further, the user classification model is as follows:
[Formula images BDA0003049495730000072, BDA0003049495730000073 and BDA0003049495730000074: formulas of the user classification model]
where p1, p2, p3 and p4 are the probabilities of belonging to no-risk users, low-risk users, high-risk users and lost users, respectively; $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the model parameter values; $x_j$ is a variable, i.e., an item of the user data to be classified; $\beta_j$ is the coefficient of the variable $x_j$; and $J$ is the total number of data items in the user data to be classified.
The user classification model can be used for classifying loan users, taking user data (including basic information and repayment data) of the users as input, solving according to a formula in the user classification model, and outputting the probability that the users belong to each category in the preset category number, so as to determine the category to which the users correspondingly belong.
S4, classifying the user data to be classified by using the user classification model to obtain a classification result.
In detail, the classifying the user data to be classified by using the user classification model to obtain a classification result includes:
acquiring user data to be classified from a preset service system;
solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset category number;
and selecting the category corresponding to the maximum probability value as the classification result of the user data to be classified.
The preset probability condition is p1 + p2 + p3 + p4 = 1.
Further, the solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset number of categories includes:
inputting the user data to be classified into the user classification model to obtain a plurality of probability formulas;
and solving the probability formulas and the preset probability condition simultaneously to obtain the probability that the user data to be classified belongs to each category in the preset number of categories.
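The final selection step can be sketched as follows: once the four probabilities are known, the preset probability condition is checked and the category with the largest probability is chosen. The function name `classify`, the tolerance `tol` and the tie-breaking behavior are assumptions made for illustration.

```python
from typing import Sequence, Tuple

CATEGORY_NAMES = {
    1: "no-risk user",
    2: "low-risk user",
    3: "high-risk user",
    4: "lost user",
}


def classify(probs: Sequence[float], tol: float = 1e-9) -> Tuple[int, str]:
    """Return (category number, category name) for [p1, p2, p3, p4]."""
    if len(probs) != 4:
        raise ValueError("expected four probabilities")
    # Preset probability condition: p1 + p2 + p3 + p4 = 1.
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must satisfy p1 + p2 + p3 + p4 = 1")
    # max() keeps the first maximum, so a tie goes to the lower-numbered
    # category; the description does not specify tie handling.
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best + 1, CATEGORY_NAMES[best + 1]
```

For instance, `classify([0.1, 0.2, 0.6, 0.1])` selects category 3, the high-risk user category.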
Optionally, to further ensure the security and privacy of the user data to be classified, the user data to be classified may also be obtained from a node of a blockchain.
The embodiment of the invention constructs the user classification model based on the ordered discrete selection model to perform fine classification on the users, and adopts different processing strategies for different types of users, thereby effectively improving the value of user data, predicting the behavior of the users, reducing loss and improving the working efficiency. For example, special attention should be paid to users belonging to high-risk users in the lending platform and an effective collection policy should be adopted.
The embodiment of the invention constructs the initial user classification model based on the ordered discrete selection model and the preset number of categories. The ordered discrete selection model is a multi-category ordered discrete selection model, so it can handle a dependent variable with multiple categories and enlarges the total number of categories used for user classification. Because the discrete dependent variable in the model has a logical ordering, the accuracy of the classification result can be improved. Meanwhile, the user classification model is constructed from the initial user classification model by using the sample data set, which includes not only the basic information of the user but also the repayment data of the user. The user's data can thus be fully utilized, information loss is effectively reduced, the basis of the classification standard is broadened, and the accuracy of the classification result of the user classification model can be improved. Therefore, the user classification method, the user classification device, the electronic equipment and the computer-readable storage medium provided by the invention can solve the problem of low accuracy of user classification results.
Fig. 2 is a functional block diagram of a user classifying device according to an embodiment of the present invention.
The user classifying device 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the user classification apparatus 100 may include a data acquisition module 101, an initial model construction module 102, a classification model construction module 103, and a user classification module 104. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
The data acquisition module 101 is configured to collect data of sample users to obtain a sample data set.
The sample user data in the embodiment of the present invention mainly refers to data of borrowing users. It includes the user's basic information, such as user name, age and education level, and the user's repayment data, that is, the user's repayment record.
In detail, the data acquisition module 101 is specifically configured to:
acquiring basic information and repayment data of a sample user to obtain user data;
determining a label of the user data according to a preset total number of categories and the repayment data;
and correspondingly collecting the user data and the label to obtain a sample data set.
The preset total number of categories refers to the total number of categories into which users are classified.
Further, in the embodiment of the present invention, users are subdivided into four categories according to the degree of default risk shown by their repayment data: no-risk users, low-risk users, high-risk users and lost users. No-risk users repay normally or early throughout the whole repayment period and are never overdue. Low-risk users are overdue only for a short period (no more than 90 days) and then repay in time, leaving no bad debts or unpaid accounts. High-risk users are overdue for a long period (more than 90 days) and settle their bad debts within 2 years after the end of the repayment period. Lost users still have unpaid accounts 2 years after the end of the repayment period.
The initial model building module 102 is configured to build an initial user classification model based on the ordered discrete selection model and a preset number of categories.
In detail, the initial model building module 102 is specifically configured to:
obtaining an ordered discrete selection model;
and enabling the ordered discrete selection model to obey ordered multi-classification distribution, and combining the preset number of categories with the ordered discrete selection model to obtain an initial user classification model.
Furthermore, user types are divided according to the degree of default, and the degrees of default are progressively ordered. A traditional classification model would lose this ordering information. The ordered discrete selection model adopted in the embodiment of the invention is therefore a multi-category ordered discrete selection model, which can handle a dependent variable with multiple categories and can improve classification performance by exploiting the logical ordering of the discrete dependent variable.
Optionally, the ordered discrete selection model is as follows:
$$y^* = \beta^T X + \varepsilon$$
where $y^*$ is the user's degree of default, $\beta$ is a parameter vector, $\beta^T$ is the transpose of the parameter vector $\beta$, $X$ is the user data, and $\varepsilon$ is the error parameter of the model.
The ordered discrete selection model can analyze the relationship between the dependent variable Y and the independent variable X when Y is a multi-category variable rather than a continuous variable. For example, consumers often compare several brands, such as Ford, Honda and Volkswagen, when purchasing a car. If choosing a Ford is denoted Y1, choosing a Honda Y2 and choosing a Volkswagen Y3, the ordered discrete selection model can predict which brand a consumer will choose from the consumer's own data.
Further, in the embodiment of the present invention, the user's degree of default is an unobservable continuous variable that corresponds to the preset number of categories (i.e., the user classifications). Combining the preset number of categories with the ordered discrete selection model means mapping the dependent variable of the model onto the preset categories and substituting it into the model's formula, which gives:
$$y = 1 \quad \text{if } y^* \le \alpha_1$$
$$y = 2 \quad \text{if } \alpha_1 < y^* \le \alpha_2$$
$$y = 3 \quad \text{if } \alpha_2 < y^* \le \alpha_3$$
$$y = 4 \quad \text{if } y^* > \alpha_3$$
where $y$ represents the preset user category (1 is a no-risk user, 2 is a low-risk user, 3 is a high-risk user and 4 is a lost user); $\alpha_1$, $\alpha_2$ and $\alpha_3$ are model parameters with $\alpha_1 < \alpha_2 < \alpha_3$ that represent the segmentation points of the degree of default; and $y^*$ is the user's degree of default.
Further, said subjecting said ordered discrete choice model to an ordered multi-class distribution comprises:
subjecting an error parameter epsilon in the ordered discrete selection model to an ordered multi-class distribution;
and transforming the ordered discrete selection model according to the ordered multi-classification distribution and the preset number of the types to obtain an initial user classification model.
Further, the transforming the ordered discrete selection model according to the ordered multi-class distribution and the preset number of classes is to convert dependent variables in the ordered discrete selection model corresponding to the preset number of classes into probabilities based on the ordered multi-class distribution to obtain an initial user classification model, and the transforming includes:
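$$
\begin{aligned}
p(y=1) &= p(y^{*} \le \alpha_1) = P(\beta^{T}x + \varepsilon \le \alpha_1) = F(\alpha_1 - \beta^{T}x) \\
p(y=2) &= p(\alpha_1 < y^{*} \le \alpha_2) = F(\alpha_2 - \beta^{T}x) - F(\alpha_1 - \beta^{T}x) \\
p(y=3) &= p(\alpha_2 < y^{*} \le \alpha_3) = F(\alpha_3 - \beta^{T}x) - F(\alpha_2 - \beta^{T}x) \\
p(y=4) &= p(y^{*} > \alpha_3) = 1 - F(\alpha_3 - \beta^{T}x)
\end{aligned}
$$
where $F$ is the cumulative distribution function of the error parameter $\varepsilon$, $p(y=1)$ is the probability that the user is a risk-free user, $p(y=2)$ is the probability that the user is a low-risk user, $p(y=3)$ is the probability that the user is a high-risk user, and $p(y=4)$ is the probability that the user is a lost user.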
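The four probabilities above can be sketched in a few lines. The embodiment does not fix a particular distribution function F, so the logistic function used here is an assumption of this sketch, and the function names and numeric values are hypothetical.

```python
import math


def logistic_cdf(z):
    # Logistic CDF, written so that math.exp never receives a large
    # positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probabilities(x, beta, cuts, cdf=logistic_cdf):
    """Return [p(y=1), ..., p(y=len(cuts)+1)] for one user.

    x and beta are equal-length sequences; cuts holds alpha1 < alpha2 < ...
    """
    eta = sum(b * v for b, v in zip(beta, x))  # beta^T x
    # Cumulative probabilities F(alpha_k - beta^T x), padded with 0 and 1 so
    # that consecutive differences give the probability of each category.
    bounds = [0.0] + [cdf(a - eta) for a in cuts] + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(cuts) + 1)]


# Hypothetical one-feature user and parameters.
probs = category_probabilities(x=[0.0], beta=[1.0], cuts=[0.0, 1.0, 2.0])
print(probs, sum(probs))
```

Because each probability is a difference of consecutive cumulative values, the four results sum to 1 by construction, provided the segmentation points are increasing.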
The classification model building module 103 is configured to build a user classification model by using the sample data set based on the initial user classification model.
In detail, the classification model building module 103 is specifically configured to:
converting the model parameters in the initial user classification model into likelihood function representation to obtain a likelihood function;
carrying out logarithm taking processing on the likelihood function, and solving by using the sample data set to obtain a model parameter value;
and substituting the model parameter values into the initial user classification model to obtain a user classification model.
Wherein the likelihood function is as follows:
$$
L(\alpha_1, \alpha_2, \alpha_3, \beta) = \prod_{i=1}^{n} \prod_{m=1}^{4} p(y_i = m)^{[y_i = m]}
$$
wherein $[y_i = m]$ takes the value 1 when $y_i$ equals $m$ and the value 0 when $y_i$ does not equal $m$; $m$ takes the values 1, 2, 3 and 4; $n$ is the total number of samples in the sample data set; and $y_i$ is the label of the $i$-th sample in the sample data set.
Further, the log-taking processing on the likelihood function includes:
$$
\ln L(\alpha_1, \alpha_2, \alpha_3, \beta) = \sum_{i=1}^{n} \sum_{m=1}^{4} [y_i = m] \, \ln p(y_i = m)
$$
The embodiment of the invention adopts maximum likelihood estimation to obtain the maximum likelihood estimates of the model parameters α1, α2, α3 and of the corresponding parameter vector β in the initial user classification model, and obtains the user classification model from these estimates.
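A minimal sketch of the estimation step follows, under stated assumptions: F is taken to be the logistic function, and a simple finite-difference gradient ascent with step halving stands in for whatever solver an implementation would actually use. The names `log_likelihood` and `fit`, the starting values, and the toy data are all hypothetical.

```python
import math


def _cdf(z):
    # Logistic CDF (an assumed choice of F), written to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _probs(x, beta, cuts):
    # p(y = m) for each category as differences of F(alpha_m - beta^T x).
    eta = sum(b * v for b, v in zip(beta, x))
    bounds = [0.0] + [_cdf(a - eta) for a in cuts] + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(cuts) + 1)]


def log_likelihood(X, y, beta, cuts):
    # The indicator [y_i = m] keeps exactly one term per sample: ln p(y_i).
    total = 0.0
    for x, label in zip(X, y):
        p = _probs(x, beta, cuts)[label - 1]
        total += math.log(max(p, 1e-12))  # guard against log(0)
    return total


def fit(X, y, n_features, n_categories=4, steps=200, lr=0.1, h=1e-5):
    # theta holds beta followed by the cut points, which start at 0, 1, 2.
    theta = [0.0] * n_features + [float(k) for k in range(n_categories - 1)]

    def objective(t):
        beta, cuts = t[:n_features], t[n_features:]
        if any(hi <= lo for lo, hi in zip(cuts, cuts[1:])):
            return -math.inf  # reject cut points that are not increasing
        return log_likelihood(X, y, beta, cuts)

    best = objective(theta)
    for _ in range(steps):
        # Forward-difference estimate of the gradient of ln L.
        grad = []
        for j in range(len(theta)):
            bumped = list(theta)
            bumped[j] += h
            grad.append((objective(bumped) - best) / h)
        # Halve the step until ln L improves; stop when no step helps.
        step = lr
        while step > 1e-8:
            cand = [t + step * g for t, g in zip(theta, grad)]
            value = objective(cand)
            if value > best:
                theta, best = cand, value
                break
            step /= 2.0
        else:
            break
    return theta[:n_features], theta[n_features:], best


# Toy sample data set: one feature, labels 1..4 that rise with the feature.
X = [[-2.0], [-1.5], [-0.5], [0.0], [0.5], [1.0], [2.0], [2.5]]
y = [1, 1, 2, 2, 3, 3, 4, 4]
beta_hat, cuts_hat, ll = fit(X, y, n_features=1)
print(beta_hat, cuts_hat, ll)
```

Only steps that raise the log-likelihood are accepted, so the returned value is never lower than the starting one. A production implementation would more likely rely on an established optimizer or statistics package.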
Further, the substituting the model parameter values into the initial user classification model means that the model parameter values are used to correspondingly replace original parameters in the initial user classification model.
Further, the user classification model is as follows:
Figure BDA0003049495730000121
Figure BDA0003049495730000122
Figure BDA0003049495730000123
wherein p1, p2, p3, p4 represent the probabilities of belonging to risk-free users, low-risk users, high-risk users, and lost users, respectively; α1, α2, α3 are model parameter values; x_j is a variable, i.e. the user data to be classified; β_j is the coefficient of the variable x_j; and J is the total number of data items in the user data to be classified.
The user classification model can be used for classifying loan users. The user data of a user (including basic information and repayment data) is taken as input, the formulas in the user classification model are solved, and the probability that the user belongs to each of the preset number of categories is output, so as to determine the category to which the user belongs.
The user classification module 104 is configured to classify the user data to be classified by using the user classification model to obtain a classification result.
In detail, the user classification module 104 is specifically configured to:
acquiring user data to be classified from a preset service system;
solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset category number;
and selecting the category corresponding to the maximum probability value as the classification result of the user data to be classified.
Wherein the preset probability condition is p1 + p2 + p3 + p4 = 1.
Further, the solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset number of categories includes:
inputting the user data to be classified into the user classification model to obtain a plurality of probability formulas;
and solving the probability formulas simultaneously with the preset probability condition to obtain the probability that the user data to be classified belongs to each category in the preset category number.
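Once the four probabilities are available, the final selection step can be sketched as follows. This is an illustration only: the function name `classify`, the tolerance, and the example probabilities are hypothetical, and the check simply enforces the preset probability condition before the category with the maximum probability is chosen.

```python
import math

LABELS = ("risk-free user", "low-risk user", "high-risk user", "lost user")


def classify(probs, labels=LABELS):
    """Return (category number, label) for the most probable category."""
    if len(probs) != len(labels):
        raise ValueError("expected one probability per category")
    # Preset probability condition: p1 + p2 + p3 + p4 = 1.
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # max() keeps the first maximum, so ties go to the lower category.
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best + 1, labels[best]


print(classify([0.05, 0.15, 0.6, 0.2]))
```

Resolving ties toward the lower-risk category is a choice made for this sketch; the embodiment does not say how ties are handled.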
Optionally, to further ensure the security and privacy of the user data to be classified, the user data to be classified may also be obtained from a node of a block chain.
The embodiment of the invention constructs the user classification model based on the ordered discrete selection model to perform fine classification on the users, and adopts different processing strategies for different types of users, thereby effectively improving the value of user data, predicting the behavior of the users, reducing loss and improving the working efficiency. For example, special attention should be paid to users belonging to high-risk users in the lending platform and an effective collection policy should be adopted.
Fig. 3 is a schematic structural diagram of an electronic device for implementing a user classification method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a user classification program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of the user classification program 12, but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device 1 by running or executing programs or modules (e.g., user classification programs, etc.) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions such as charge management, discharge management, and power consumption management through the power management device. The power supply may also include any component of one or more DC or AC power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may further include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used for establishing a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The user classification program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
collecting data of a sample user to obtain a sample data set;
constructing an initial user classification model based on the ordered discrete selection model and a preset number of categories;
constructing a user classification model by utilizing the sample data set based on the initial user classification model;
and classifying the user data to be classified by utilizing the user classification model to obtain a classification result.
Specifically, for the specific method by which the processor 10 implements the instructions, reference may be made to the description of the relevant steps in the embodiments corresponding to Fig. 1 to Fig. 3, which is not repeated herein.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
collecting data of a sample user to obtain a sample data set;
constructing an initial user classification model based on the ordered discrete selection model and a preset number of categories;
constructing a user classification model by utilizing the sample data set based on the initial user classification model;
and classifying the user data to be classified by utilizing the user classification model to obtain a classification result.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
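The idea of data blocks linked by a cryptographic method can be sketched with a simple hash chain. This is a toy illustration under stated assumptions, not the blockchain platform of the embodiment: there is no network, consensus mechanism, or encryption, and the function names and record fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def block_hash(index, prev_hash, records):
    # Hash a canonical JSON encoding of the block contents with SHA-256.
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    # Each new block stores the hash of the block before it.
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    })


def is_valid(chain):
    # Recompute every hash and check that each block points at the previous one.
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["records"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, [{"user": "u1", "repaid": True}])
append_block(chain, [{"user": "u2", "repaid": False}])
print(is_valid(chain))
```

Changing any stored record alters that block's recomputed hash, so `is_valid` reports the tampering; this is the property that lets each block verify the validity of the information before it.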
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method for classifying a user, the method comprising:
collecting data of a sample user to obtain a sample data set;
constructing an initial user classification model based on the ordered discrete selection model and a preset number of categories;
constructing a user classification model by utilizing the sample data set based on the initial user classification model;
and classifying the user data to be classified by utilizing the user classification model to obtain a classification result.
2. The method for classifying a user according to claim 1, wherein the collecting data of the sample user to obtain a sample data set comprises:
acquiring basic information and repayment data of a sample user to obtain user data;
determining a label of the user data according to a preset category total and the repayment data;
and correspondingly collecting the user data and the label to obtain a sample data set.
3. The user classification method according to claim 1, wherein the constructing an initial user classification model based on the ordered discrete selection model and the preset number of classes comprises:
obtaining an ordered discrete selection model;
and enabling the ordered discrete selection model to obey ordered multi-classification distribution, and combining the preset number of categories with the ordered discrete selection model to obtain an initial user classification model.
4. The method for classifying a user according to claim 3, wherein said subjecting said ordered discrete selection model to an ordered multi-classification distribution and combining said preset number of classes with said ordered discrete selection model to obtain an initial user classification model comprises:
subjecting an error parameter epsilon in the ordered discrete selection model to an ordered multi-class distribution;
and transforming the ordered discrete selection model according to the ordered multi-class distribution and the preset number of categories to obtain an initial user classification model.
5. The user classification method according to claim 1, wherein said constructing a user classification model using the sample data set based on the initial user classification model comprises:
converting the model parameters in the initial user classification model into likelihood function representation to obtain a likelihood function;
carrying out logarithm taking processing on the likelihood function, and solving by using the sample data set to obtain a model parameter value;
and substituting the model parameter values into the initial user classification model to obtain a user classification model.
6. The method for classifying users according to claim 1, wherein the classifying the user data to be classified by using the user classification model to obtain a classification result comprises:
acquiring user data to be classified from a preset service system;
solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset category number;
and selecting the category corresponding to the maximum probability value as the classification result of the user data to be classified.
7. The method for classifying users according to claim 6, wherein said solving the user data to be classified by using the user classification model and a preset probability condition to obtain the probability that the user data to be classified belongs to each category in a preset number of categories comprises:
inputting the user data to be classified into the user classification model to obtain a plurality of probability formulas;
and solving the probability formulas simultaneously with the preset probability condition to obtain the probability that the user data to be classified belongs to each category in the preset category number.
8. An apparatus for classifying a user, the apparatus comprising:
the data acquisition module is used for collecting data of a sample user to obtain a sample data set;
the initial model building module is used for building an initial user classification model based on the ordered discrete selection model and the preset number of categories;
the classification model construction module is used for constructing a user classification model by utilizing the sample data set based on the initial user classification model;
and the user classification module is used for classifying the user data to be classified by utilizing the user classification model to obtain a classification result.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a user classification method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a user classification method according to any one of claims 1 to 7.
CN202110483797.5A 2021-04-30 2021-04-30 User classification method and device, electronic equipment and storage medium Pending CN113112346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110483797.5A CN113112346A (en) 2021-04-30 2021-04-30 User classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110483797.5A CN113112346A (en) 2021-04-30 2021-04-30 User classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113112346A true CN113112346A (en) 2021-07-13

Family

ID=76720781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110483797.5A Pending CN113112346A (en) 2021-04-30 2021-04-30 User classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113112346A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination