CN110796170A - Client dynamic support model generation method and device and electronic equipment - Google Patents

Publication number
CN110796170A
Authority
CN
China
Prior art keywords
subset
positive
user set
negative
historical users
Legal status
Pending
Application number
CN201910921528.5A
Other languages
Chinese (zh)
Inventor
高明宇
王鹏
张潮华
郑彦
Current Assignee
Beijing Qiyu Information Technology Co Ltd
Original Assignee
Beijing Qiyu Information Technology Co Ltd
Application filed by Beijing Qiyu Information Technology Co Ltd
Priority to CN201910921528.5A
Publication of CN110796170A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a customer dynamic support model generation method, a customer dynamic support model generation device, an electronic device and a computer readable medium. The method comprises the following steps: acquiring financial data of a plurality of historical users; dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state; screening financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and training a machine learning model based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model. The customer dynamic support model generation method and device can process the existing positive and negative sample data to generate a more accurate positive and negative sample data set, and then train a machine learning model on that data set to generate a financial risk model.

Description

Client dynamic support model generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of computer information processing, and in particular, to a method and an apparatus for generating a customer dynamic support model, an electronic device, and a computer-readable medium.
Background
In general, a machine learning model needs to learn from positive samples and negative samples. The positive samples are samples that belong to the target class, and the negative samples can in principle be any other samples that do not belong to that class. A specific task is defined according to the positive and negative samples, the model is then trained on specific data, and after training is finished a machine learning model suited to that task is obtained.
In general, even when the same algorithm is used, machine learning models trained with different data are completely different. For models in the financial field, especially user behavior models that evaluate the financial behavior of users, customers can for example be classified into active customers (positive examples) and inactive customers (negative examples) according to customer activity, and a traditional behavior model can be built directly from this data. However, while customers who are already active can be confirmed as such, a customer who is currently inactive will not necessarily remain inactive in the future, so it is difficult to determine accurately whether all users in the negative examples actually have negative example characteristics.
If customers without dynamic support are used directly as negative samples to train a binary classification machine learning model, the negative samples contain a large number of latent positive samples (customers who may have dynamic support in the future). The negative samples therefore introduce a lot of erroneous data, and the final model trained on them is not ideal.
Therefore, a new customer dynamic support model generation method, apparatus, electronic device, and computer-readable medium are needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the disclosure and therefore it may contain information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present disclosure provides a method, an apparatus, an electronic device, and a computer readable medium for generating a customer dynamic support model, which can process the existing positive and negative sample data to generate a more accurate positive and negative sample data set, and then train a machine learning model on that data set to generate a financial risk model.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, a method for generating a customer dynamic support model is provided, the method including: acquiring financial data of a plurality of historical users, wherein the financial data comprises a dynamic support state; dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state; screening financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and training a machine learning model based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model.
Optionally, the dynamic support state comprises having dynamic support and not having dynamic support; dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state comprises: placing the historical users with dynamic support into the positive sample user set; and placing the historical users without dynamic support into the negative sample user set.
Optionally, the filtering financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset, and an unclassified sample subset comprises: filtering financial data of historical users in a positive sample user set and a negative sample user set respectively through an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset and the unclassified sample subset.
Optionally, the filtering financial data of historical users in the positive sample user set and the negative sample user set respectively by an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset comprises: generating a first target hypersphere by an outlier monitoring algorithm and financial data of historical users in the positive sample user set; and filtering financial data of historical users in the positive sample user set based on the first target hyper-sphere to generate the positive sample subset and the unclassified sample subset.
Optionally, generating the first target hypersphere by the outlier monitoring algorithm and financial data of the historical users in the positive sample user set comprises: generating a hypersphere equation set from financial data of historical users in the positive sample user set and an initial hypersphere equation in the outlier monitoring algorithm; determining a slack variable threshold and an optimization objective; solving the hypersphere equation set based on the slack variable threshold and the optimization objective to obtain an optimal solution; and generating the first target hypersphere based on the optimal solution.
Optionally, filtering financial data of historical users in the positive sample user set based on the first target hypersphere to generate the positive sample subset and the unclassified sample subset comprises: determining location coordinates of financial data of historical users in the positive sample user set on a multi-dimensional coordinate system; determining, based on the location coordinates, the distance between the historical user and the center of the first target hypersphere (the center-of-sphere distance); adding the historical user to the positive sample subset when the center-of-sphere distance is less than or equal to the radius of the first target hypersphere; and adding the historical user to the unclassified sample subset when the center-of-sphere distance is greater than the radius of the first target hypersphere.
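The center-of-sphere distance rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes plain Euclidean distance in the original feature space (a kernel-based hypersphere would measure distance in the kernel feature space instead), and the function name `screen_by_hypersphere` and the sample coordinates are hypothetical.

```python
import math

def screen_by_hypersphere(points, center, radius):
    """Split points into (kept, unclassified) by distance to the sphere center.

    Assumption: Euclidean distance in the raw feature space.
    """
    kept, unclassified = [], []
    for p in points:
        distance = math.dist(p, center)  # center-of-sphere distance
        if distance <= radius:
            kept.append(p)               # stays in the positive sample subset
        else:
            unclassified.append(p)       # moved to the unclassified sample subset
    return kept, unclassified

# Hypothetical 2-D feature vectors for users in the positive sample user set.
positive_users = [(0.3, 0.4), (1.0, 0.0), (2.0, 0.0)]
positive_subset, unclassified_subset = screen_by_hypersphere(
    positive_users, center=(0.0, 0.0), radius=1.0
)
```

A point lying exactly on the surface is kept, matching the "less than or equal to" condition in the text.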
Optionally, the filtering financial data of historical users in the positive sample user set and the negative sample user set respectively by an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset comprises: generating a second target hypersphere by an outlier monitoring algorithm and financial data of historical users in the negative sample user set; and filtering financial data of historical users in the negative sample user set based on the second target hypersphere to generate the negative sample subset and the unclassified sample subset.
Optionally, training a machine learning model based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model comprises: training a first machine learning model based on the positive sample subset and the negative sample subset to generate a user classification model; updating the positive sample subset and the negative sample subset based on the unclassified sample subset and the user classification model; and training a second machine learning model based on the updated positive sample subset and the updated negative sample subset to generate the customer dynamic support model.
Optionally, updating the positive sample subset and the negative sample subset based on the unclassified sample subset and the user classification model comprises: determining, by the user classification model, positive sample labels and negative sample labels for historical users in the unclassified sample subset; placing historical users having positive sample labels into the positive sample subset; and placing historical users having negative sample labels into the negative sample subset.
Optionally, the method further comprises: inputting financial data of a current user into the customer dynamic support model, and acquiring the dynamic support willingness intensity of the current user; and generating a user policy for the current user based on the dynamic support willingness intensity.
According to an aspect of the present disclosure, a customer dynamic support model generation apparatus is provided, the apparatus including: a data module for acquiring financial data of a plurality of historical users, wherein the financial data comprises a dynamic support state; a set module for dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state; a screening module for screening the financial data of the historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and a training module for training a machine learning model based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model.
Optionally, the dynamic support state comprises having dynamic support and not having dynamic support; the set module comprises: a positive sample unit for placing the historical users with dynamic support into the positive sample user set; and a negative sample unit for placing the historical users without dynamic support into the negative sample user set.
Optionally, the screening module is further configured to screen financial data of historical users in the positive sample user set and the negative sample user set respectively through an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset.
Optionally, the screening module comprises: a first target unit for generating a first target hypersphere by an outlier monitoring algorithm and financial data of historical users in the positive sample user set; and a first set unit for filtering financial data of historical users in the positive sample user set based on the first target hyper-sphere to generate the positive sample subset and the unclassified sample subset.
Optionally, the first target unit comprises: an equation subunit for generating a hypersphere equation set from financial data of the historical users in the positive sample user set and an initial hypersphere equation in the outlier monitoring algorithm; a parameter subunit for determining a slack variable threshold and an optimization objective; a solving subunit for solving the hypersphere equation set based on the slack variable threshold and the optimization objective to obtain an optimal solution; and a first subunit for generating the first target hypersphere based on the optimal solution.
Optionally, the first set unit includes: a coordinate subunit for determining the position coordinates of the financial data of the historical users in the positive sample user set on a multi-dimensional coordinate system; a distance subunit for determining, based on the position coordinates, the center-of-sphere distance between the historical user and the center of the first target hypersphere; a positive sample subunit for adding the historical user to the positive sample subset when the center-of-sphere distance is less than or equal to the radius of the first target hypersphere; and an unclassified subunit for adding the historical user to the unclassified sample subset when the center-of-sphere distance is greater than the radius of the first target hypersphere.
Optionally, the screening module comprises: the second target unit is used for generating a second target hypersphere through an abnormal value monitoring algorithm and financial data of historical users in the negative sample user set; and a second set unit for filtering financial data of historical users in the negative sample user set based on the second target hyper-sphere to generate the negative sample subset and the unclassified sample subset.
Optionally, the training module comprises: a first model unit for training a first machine learning model based on the positive sample subset and the negative sample subset to generate a user classification model; a first computing unit for updating the positive sample subset and the negative sample subset based on the unclassified sample subset and the user classification model; and a second model unit for training a second machine learning model based on the updated positive sample subset and the updated negative sample subset to generate the customer dynamic support model.
Optionally, the first computing unit includes: a label subunit for determining, by the user classification model, positive sample labels and negative sample labels for the historical users in the unclassified sample subset; and a classification subunit for placing the historical users with positive sample labels into the positive sample subset and placing the historical users with negative sample labels into the negative sample subset.
Optionally, the apparatus further comprises: a model module for inputting the financial data of a current user into the customer dynamic support model and acquiring the dynamic support willingness intensity of the current user, and for generating a user policy for the current user based on the dynamic support willingness intensity.
According to an aspect of the present disclosure, an electronic device is provided, the electronic device including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to an aspect of the disclosure, a computer-readable medium is proposed, on which a computer program is stored, which program, when being executed by a processor, carries out the method as above.
According to the customer dynamic support model generation method, the customer dynamic support model generation device, the electronic equipment and the computer readable medium, the plurality of historical users are divided into a positive sample user set and a negative sample user set according to the dynamic support state; financial data of historical users in the positive sample user set and the negative sample user set are screened to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and the machine learning model is trained based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model. In this way, the existing positive and negative sample data can be processed to generate a more accurate positive and negative sample data set, and the machine learning model can then be trained on that data set to generate a financial risk model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are merely some embodiments of the present disclosure, and other drawings may be derived from those drawings by those of ordinary skill in the art without inventive effort.
FIG. 1 is a flowchart illustrating a customer dynamic support model generation method according to an exemplary embodiment.
Fig. 2 is a schematic diagram illustrating a customer dynamic support model generation method according to another exemplary embodiment.
Fig. 3 is a flowchart illustrating a customer dynamic support model generation method according to another exemplary embodiment.
Fig. 4 is a flowchart illustrating a customer dynamic support model generation method according to another exemplary embodiment.
Fig. 5 is a block diagram illustrating a customer dynamic support model generation apparatus according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 7 is a block diagram illustrating a computer-readable medium in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It is to be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or processes shown in the drawings are not necessarily required to practice the present disclosure and are, therefore, not intended to limit the scope of the present disclosure.
FIG. 1 is a flowchart illustrating a customer dynamic support model generation method according to an exemplary embodiment. The customer dynamic support model generation method 10 includes at least steps S102 to S108.
As shown in fig. 1, in S102, financial data of a plurality of historical users is acquired, wherein the financial data includes a dynamic support state. The dynamic support state indicates whether the customer has used the borrowing service: a user who has used the financial borrowing service is a user with dynamic support, and a user who has not used the financial borrowing service is a user without dynamic support.
In S104, the plurality of historical users are divided into a positive sample user set and a negative sample user set according to the dynamic support state. The dynamic support state comprises having dynamic support and not having dynamic support; dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state comprises: placing the historical users with dynamic support into the positive sample user set; and placing the historical users without dynamic support into the negative sample user set.
In S106, financial data of historical users in the positive sample user set and the negative sample user set are screened to generate a positive sample subset, a negative sample subset and an unclassified sample subset. This step comprises: screening financial data of historical users in the positive sample user set and the negative sample user set respectively through an outlier monitoring algorithm (OneClassSVM) to generate the positive sample subset, the negative sample subset and the unclassified sample subset.
Using the OneClassSVM algorithm, the sample data is trained into a minimum hypersphere according to Support Vector Domain Description (SVDD). The boundary is a hypersphere when the features have more than three dimensions and a closed curve in two dimensions; it encloses the data, and points that fall outside it are treated as outliers and eliminated.
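As a minimal sketch of S104, the split by dynamic support state can be written as a simple partition. The `HistoricalUser` record, its field names and the sample values are hypothetical and chosen only for illustration; the disclosure does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HistoricalUser:
    user_id: str
    features: List[float]        # numeric financial features (hypothetical layout)
    has_dynamic_support: bool    # True if the user has used the borrowing service

def split_by_state(users: List[HistoricalUser]) -> Tuple[List[HistoricalUser], List[HistoricalUser]]:
    """Partition users into (positive sample user set, negative sample user set)."""
    positive_set = [u for u in users if u.has_dynamic_support]
    negative_set = [u for u in users if not u.has_dynamic_support]
    return positive_set, negative_set

users = [
    HistoricalUser("u1", [0.9, 0.8], True),
    HistoricalUser("u2", [0.1, 0.2], False),
    HistoricalUser("u3", [0.7, 0.9], True),
]
positive_set, negative_set = split_by_state(users)
```

At this stage the negative set is only provisional: as the background section notes, it may still contain users who will have dynamic support later, which is why the screening step follows.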
In one embodiment, filtering financial data of historical users in the positive sample user set and the negative sample user set respectively by an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset includes: generating a first target hypersphere by the outlier monitoring algorithm and financial data of historical users in the positive sample user set; filtering financial data of historical users in the positive sample user set based on the first target hypersphere to generate the positive sample subset and the unclassified sample subset; generating a second target hypersphere by the outlier monitoring algorithm and financial data of historical users in the negative sample user set; and filtering financial data of historical users in the negative sample user set based on the second target hypersphere to generate the negative sample subset and the unclassified sample subset.
The content of "filtering financial data of historical users in the positive sample user set and the negative sample user set respectively by an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset and the unclassified sample subset" will be described in detail in the corresponding embodiment of fig. 3.
In S108, a machine learning model is trained based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model. This step comprises: training a first machine learning model based on the positive sample subset and the negative sample subset to generate a user classification model; updating the positive sample subset and the negative sample subset based on the unclassified sample subset and the user classification model; and training a second machine learning model based on the updated positive sample subset and the updated negative sample subset to generate the customer dynamic support model.
The method further comprises: inputting financial data of a current user into the customer dynamic support model, and acquiring the dynamic support willingness intensity of the current user; and generating a user policy for the current user based on the dynamic support willingness intensity.
The content of "training a machine learning model based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model" will be described in detail in the corresponding embodiment of fig. 4.
According to the customer dynamic support model generation method, the plurality of historical users are divided into a positive sample user set and a negative sample user set according to the dynamic support state; financial data of historical users in the positive sample user set and the negative sample user set are screened to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and the machine learning model is trained based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a customer dynamic support model. In this way, the existing positive and negative sample data can be processed to generate a more accurate positive and negative sample data set, and the machine learning model can then be trained on that data set to generate a financial risk model.
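The three-stage training in S108 can be sketched as follows. The disclosure does not name the first or second machine learning model, so this sketch substitutes a nearest-centroid classifier for both purely as a stand-in; the function names and sample points are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def train_classifier(pos, neg):
    """Stand-in model: label 1 if a point is at least as close to the positive centroid."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    def predict(x):
        return 1 if math.dist(x, c_pos) <= math.dist(x, c_neg) else 0
    return predict

def update_subsets(pos, neg, unclassified, predict):
    """Label each unclassified sample and append it to the matching subset."""
    new_pos, new_neg = list(pos), list(neg)
    for x in unclassified:
        (new_pos if predict(x) == 1 else new_neg).append(x)
    return new_pos, new_neg

pos = [(1.0, 1.0), (1.2, 0.8)]
neg = [(-1.0, -1.0), (-0.8, -1.2)]
unclassified = [(0.9, 0.7), (-0.5, -0.9)]

user_classifier = train_classifier(pos, neg)                      # first model
pos2, neg2 = update_subsets(pos, neg, unclassified, user_classifier)
final_model = train_classifier(pos2, neg2)                        # second model
```

In practice the second model could be a different and more expressive learner than the first; the sketch reuses one trainer only to keep the example short.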
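How a willingness intensity might be turned into a user policy is not specified in the text. The following sketch assumes the model outputs a score in [0, 1] and maps it to one of three tiers; the thresholds and the tier names are hypothetical placeholders, not values from the disclosure.

```python
def user_policy(willingness, high=0.7, low=0.3):
    """Map a willingness intensity in [0, 1] to a policy tier (hypothetical tiers)."""
    if not 0.0 <= willingness <= 1.0:
        raise ValueError("willingness must be in [0, 1]")
    if willingness >= high:
        return "high_intent"
    if willingness >= low:
        return "medium_intent"
    return "low_intent"

# Hypothetical scores returned by the customer dynamic support model.
scores = {"u1": 0.85, "u2": 0.5, "u3": 0.1}
policies = {uid: user_policy(s) for uid, s in scores.items()}
```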
It should be clearly understood that this disclosure describes how to make and use particular examples, but the principles of this disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 2 is a schematic diagram illustrating a customer-interaction model generation method according to another exemplary embodiment. As shown in fig. 2, the clients can be classified according to the client dynamic state, and the clients can be classified into dynamic and non-dynamic clients.
The samples of the financial data of the historical users are divided into two types according to the client dynamic support state, wherein one type is that the clients which have dynamic support are positive samples, and the other type is that whether the clients which do not determine dynamic support are negative samples, wherein a part of the clients can dynamic support in the next period and become positive samples.
Training the positive samples which are already dynamically branched through an OneClassSVM, finding out a high-density group, removing some outlier noise points, and obtaining a real positive sample; the same method is used for training the negative sample, a part with more concentrated negative sample characteristics is found, the part of the sample is defined as a true negative sample, and the true sample and the true negative sample should not intersect with each other theoretically.
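To make the idea of "finding the high-density group and removing outlier noise points" concrete, the sketch below uses a deliberately simplified stand-in for the OneClassSVM fit: the sphere center is the mean of the points and the radius is chosen so that a given fraction of points is enclosed. This is not an SVDD solver; `fit_sphere`, `keep_fraction` and the sample points are hypothetical.

```python
import math

def fit_sphere(points, keep_fraction=0.9):
    """Centroid-and-quantile stand-in for a hypersphere fit (not a real SVDD solver)."""
    dim = len(points[0])
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    dists = sorted(math.dist(p, center) for p in points)
    k = max(1, round(keep_fraction * len(dists)))  # number of points to enclose
    radius = dists[k - 1]
    return center, radius

# Four clustered positive samples and one noise point (hypothetical values).
positives = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (5.0, 5.0)]
center, radius = fit_sphere(positives, keep_fraction=0.8)

real_positive = [p for p in positives if math.dist(p, center) <= radius]
noise = [p for p in positives if math.dist(p, center) > radius]
```

Because the mean is itself pulled toward outliers, this stand-in is less robust than a kernel one-class model; it only illustrates the enclose-and-discard behavior described above. The same procedure applied to the negative samples would give the true negative samples.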
And dividing the original data into a real sample, a true negative sample and an unknown sample according to the two high-dimensional hypersphere. And obtaining a new training set by using the real samples and the true negative samples, and classifying the unclassified samples to divide the unknown samples into the real samples and the true negative samples. Finally, a new training set training model is combined, and the model performance is improved.
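The idea of keeping only the dense core of each class can be sketched in a few lines of Python. This is a simplified stand-in and not the OneClassSVM solution itself: it assumes the sphere center is the mean of the points and the radius is a quantile of the distances to that center. The function name `fit_hypersphere` and the parameter `keep_fraction` are hypothetical and do not appear in the disclosure.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fit_hypersphere(points: List[Point], keep_fraction: float = 0.9) -> Tuple[List[float], float]:
    """Return (center, radius) of a sphere around the dense part of `points`.

    Assumption for illustration only: center = mean of the points,
    radius = distance of the point at the `keep_fraction` quantile.
    """
    if not points:
        raise ValueError("points must be non-empty")
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    distances = sorted(math.dist(p, center) for p in points)
    # Index of the farthest point that is still kept inside the sphere.
    last_kept = max(0, math.ceil(keep_fraction * len(distances)) - 1)
    return center, distances[last_kept]


# Four clustered users and one outlying noise point.
positive_users = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.0, 1.0), (9.0, 9.0)]
center, radius = fit_hypersphere(positive_users, keep_fraction=0.8)
true_positive = [p for p in positive_users if math.dist(p, center) <= radius]
```

Applying the same function to the negative samples would give the second sphere; points that fall outside their own class sphere are the ones treated as unknown.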
Fig. 3 is a flow chart illustrating a client dynamic support model generation method according to another exemplary embodiment. The flow shown in fig. 3 is a detailed description of S106 "screening financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset, and an unclassified sample subset" in the flow shown in fig. 1.
As shown in fig. 3, in S302, a positive sample user set and a negative sample user set are obtained.
In S304, a first target hypersphere is generated from the outlier monitoring algorithm and the positive sample user set.
In S306, a second target hypersphere is generated from the outlier monitoring algorithm and the negative sample user set.
In S308, the positive sample user set and the negative sample user set are screened based on the first target hypersphere and the second target hypersphere to generate the positive sample subset, the negative sample subset, and the unclassified sample subset.
Generating the first target hypersphere by the outlier monitoring algorithm and the financial data of historical users in the positive sample user set comprises: generating a hypersphere equation set from the financial data of the historical users in the positive sample user set and the initial hypersphere equation of the outlier monitoring algorithm; determining a relaxation variable threshold and an optimization objective; solving the hypersphere equation based on the relaxation variable threshold and the optimization objective to obtain an optimal solution of the hypersphere equation; and generating the first target hypersphere based on the optimal solution.
Wherein filtering financial data of historical users in the positive sample user set based on the first target hyper-sphere to generate the positive sample subset and the unclassified sample subset comprises: determining location coordinates of financial data of historical users in the positive sample user set on a multi-dimensional coordinate system; determining a center of sphere distance of the historical user and the first target hyper-sphere based on the location coordinates; adding the historical user into the positive sample subset when the center-of-sphere distance is less than or equal to the radius of the first target hyper-sphere; and
adding the historical user into the unclassified subset of samples when the center-of-sphere distance is greater than a radius of the first target hyper-sphere.
The first target hypersphere or the second target hypersphere can be solved by the Lagrangian dual method, after which the center-of-sphere distance of a user can be obtained.
The OneClassSVM method can be understood as finding a boundary (the target hypersphere) that encloses the positive examples in the samples. Prediction then uses this boundary to make a decision, and a sample inside the target hypersphere is considered a positive sample.
Assume the parameters of the generated hypersphere are a center o and a corresponding radius r > 0. The hypersphere volume V(r) is minimized, and the center o is a linear combination of the support vectors. It may be required that the distance from the center of all data points xi representing user features in the unclassified sample set is strictly less than r. At the same time, relaxation variables ζi with penalty coefficient C are constructed, and the optimization problem is as follows:
$$\min\; V(r) + C\sum_{i=1}^{m}\zeta_i;$$
$$\|x_i - o\| \le r + \zeta_i,\quad i = 1, 2, 3, \dots, m;$$
$$\zeta_i \ge 0,\quad i = 1, 2, 3, \dots, m.$$
The problem is solved using the Lagrangian dual, after which the distance from a data point z to the center can be computed.
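As an illustration of the Lagrangian dual step, the derivation below uses the common squared-radius form of the problem, that is, it assumes V(r) = r² and squared distances in the constraint. That choice is an assumption made here for tractability and is not stated in the text. The multipliers α_i and γ_i are likewise introduced only for this sketch.

```latex
% Assumed primal (squared form): minimize r^2 + C \sum_i \zeta_i
% subject to \|x_i - o\|^2 \le r^2 + \zeta_i, \quad \zeta_i \ge 0.
\begin{aligned}
L(r, o, \zeta, \alpha, \gamma)
  &= r^2 + C\sum_{i=1}^{m}\zeta_i
     - \sum_{i=1}^{m}\alpha_i\left(r^2 + \zeta_i - \|x_i - o\|^2\right)
     - \sum_{i=1}^{m}\gamma_i\zeta_i,
     \qquad \alpha_i \ge 0,\ \gamma_i \ge 0, \\
\frac{\partial L}{\partial r} = 0
  &\;\Rightarrow\; \sum_{i=1}^{m}\alpha_i = 1, \\
\frac{\partial L}{\partial o} = 0
  &\;\Rightarrow\; o = \sum_{i=1}^{m}\alpha_i x_i, \\
\frac{\partial L}{\partial \zeta_i} = 0
  &\;\Rightarrow\; C - \alpha_i - \gamma_i = 0
   \;\Rightarrow\; 0 \le \alpha_i \le C, \\
\text{dual:}\quad
  &\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i\, x_i^{\top}x_i
     - \sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\, x_i^{\top}x_j, \\
\|z - o\|^2
  &= z^{\top}z - 2\sum_{i=1}^{m}\alpha_i\, x_i^{\top}z
     + \sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\, x_i^{\top}x_j .
\end{aligned}
```

Under these assumptions the center o is a weighted combination of the points with non-zero α_i, which matches the statement that the center is a linear combination of the support vectors, and the last line gives the distance from a point z to the center in terms of inner products only.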
Then, the user is added to the positive sample subset when the center distance is less than or equal to the radius of the first target hyper-sphere. Adding the user to the negative sample subset when the center-of-sphere distance is less than or equal to the radius of the second target hyper-sphere. The remaining users are added to the unclassified subset of samples.
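The screening rule can be sketched as follows. The sketch assumes each hypersphere is already available as a (center, radius) pair and that each user is represented by a feature vector; the function name `screen_users` and the dictionary layout are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]
Sphere = Tuple[Sequence[float], float]  # (center, radius)


def screen_users(
    positive_set: Dict[str, Features],
    negative_set: Dict[str, Features],
    first_sphere: Sphere,
    second_sphere: Sphere,
) -> Dict[str, List[str]]:
    """Split users into positive, negative and unclassified subsets.

    A user from the positive set stays positive only if it lies inside the
    first sphere; a user from the negative set stays negative only if it lies
    inside the second sphere. Everyone else becomes unclassified.
    """
    subsets: Dict[str, List[str]] = {"positive": [], "negative": [], "unclassified": []}
    for users, sphere, label in (
        (positive_set, first_sphere, "positive"),
        (negative_set, second_sphere, "negative"),
    ):
        center, radius = sphere
        for user_id, features in users.items():
            inside = math.dist(features, center) <= radius
            subsets[label if inside else "unclassified"].append(user_id)
    return subsets


example = screen_users(
    {"u1": (1.0, 1.0), "u2": (5.0, 5.0)},
    {"u3": (-1.0, -1.0), "u4": (-1.5, -1.0), "u5": (4.0, -4.0)},
    first_sphere=((1.0, 1.0), 1.0),
    second_sphere=((-1.0, -1.0), 1.0),
)
```

In this toy input, u2 and u5 lie outside the sphere of their own class and therefore end up in the unclassified subset.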
Slack variables are usually introduced so that the problem can be solved over a larger feasible region. If a slack variable is 0, the constraint reduces to its original form; if it is greater than zero, the constraint is relaxed.
The study of linear programming problems is based on a standard model. A mathematical model of a linear programming problem that is not in standard form therefore needs to be converted to standard form, and linear programming models of different forms can generally be standardized by a few methods.
When a constraint is of the "≤" (or "≥") type, a new non-negative variable can be added to (or subtracted from) the left side of the inequality to convert it into an equation. This newly added non-negative variable is called a slack variable (or surplus variable), and both may be referred to collectively as slack variables. The coefficient of a newly added slack variable in the objective function is generally taken to be zero.
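A small worked example may make the conversion concrete. The numbers and variable names below are purely illustrative and are not taken from the disclosure.

```latex
% Original model with one "<=" and one ">=" constraint (illustrative numbers):
\begin{aligned}
\max\ & 3x_1 + 2x_2 \\
\text{s.t.}\ & x_1 + 2x_2 \le 8, \\
             & 3x_1 + x_2 \ge 6, \\
             & x_1, x_2 \ge 0 .
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
\max\ & 3x_1 + 2x_2 + 0\,s_1 + 0\,s_2 \\
\text{s.t.}\ & x_1 + 2x_2 + s_1 = 8, \\
             & 3x_1 + x_2 - s_2 = 6, \\
             & x_1, x_2, s_1, s_2 \ge 0 .
\end{aligned}
```

Here s_1 is added to the "≤" constraint and s_2 is subtracted from the "≥" constraint, and both enter the objective with coefficient zero.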
As with the hypersphere formula above, an initial hypersphere equation can be constructed from the hypersphere equation and the financial data of the historical users in the positive sample set, and an optimization objective is determined. The hypersphere equation is then solved repeatedly under the relaxation variable threshold until an optimal solution that meets the optimization objective is obtained. The target hypersphere equation is generated from the parameters of the hypersphere equation corresponding to the optimal solution.
Fig. 4 is a flowchart illustrating a client dynamic support model generation method according to another exemplary embodiment. The flow shown in fig. 4 is a detailed description of S108 "training a machine learning model based on the positive sample subset, the negative sample subset, and the unclassified sample subset to generate a client dynamic support model" in the flow shown in fig. 1.
As shown in fig. 4, in S402, a first machine learning model is trained based on the positive sample subset and the negative sample subset to generate a user classification model. The first machine learning model may be a binary classification model that learns the features of the positive and negative samples in order to classify the unknown samples.
In S404, positive sample labels and negative sample labels are determined by the user classification model for the historical users in the unclassified sample subset.
In S406, the historical users with positive sample labels are classified into the positive sample subset, and the historical users with negative sample labels are classified into the negative sample subset.
In S408, a second machine learning model is trained based on the updated positive sample subset and the updated negative sample subset to generate the client dynamic support model.
The updated positive sample subset and negative sample subset contain more accurate sample data, so a more accurate machine learning model can be obtained. The second machine learning model may be a Logistic Regression (LR) model.
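Steps S402 to S408 can be sketched in plain Python as below. The sketch assumes, for brevity, that both the first and the second model are small logistic regressions trained by stochastic gradient descent; the text only says the first model may be a binary classifier and the second may be LR. All function names, the 0.5 decision threshold, and the training hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias)


def predict_proba(model: Model, x: Features) -> float:
    """Probability that x is a positive sample under a logistic model."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples: List[Features], labels: List[int],
                              epochs: int = 200, lr: float = 0.1) -> Model:
    """Fit weights and bias by plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = predict_proba((weights, bias), x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def train_final_model(positive: List[Features], negative: List[Features],
                      unclassified: List[Features]):
    # S402: train the user classification model on the screened subsets.
    first = train_logistic_regression(
        positive + negative, [1] * len(positive) + [0] * len(negative))
    # S404/S406: label the unclassified users and extend copies of the subsets.
    updated_pos, updated_neg = list(positive), list(negative)
    for x in unclassified:
        (updated_pos if predict_proba(first, x) >= 0.5 else updated_neg).append(x)
    # S408: train the second model on the updated subsets.
    second = train_logistic_regression(
        updated_pos + updated_neg, [1] * len(updated_pos) + [0] * len(updated_neg))
    return second, updated_pos, updated_neg
```

The input lists are copied before being extended, so the screened subsets remain available unchanged for inspection after training.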
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, such a program performs the functions defined by the above-described methods provided by the present disclosure. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.
Fig. 5 is a block diagram illustrating a client dynamic support model generation apparatus according to an exemplary embodiment. As shown in fig. 5, the client dynamic support model generation apparatus 50 includes: a data module 502, a set module 504, a screening module 506, and a training module 508; the client dynamic support model generation apparatus 50 may further include a model module 510.
The data module 502 is configured to obtain financial data of a plurality of historical users, where the financial data includes a dynamic support state.
The set module 504 is configured to divide the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state; the dynamic support state includes with dynamic support and without dynamic support; the set module includes: a positive sample unit, configured to divide the historical users with dynamic support into the positive sample user set; and a negative sample unit, configured to divide the historical users without dynamic support into the negative sample user set.
The screening module 506 is configured to screen financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset, and an unclassified sample subset; the screening module 506 is further configured to screen the financial data of the historical users in the positive sample user set and the negative sample user set respectively through an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset.
The screening module 506 includes: a first target unit, configured to generate a first target hypersphere by an outlier monitoring algorithm and the financial data of historical users in the positive sample user set. The first target unit includes: an equation subunit, configured to generate a hypersphere equation set from the financial data of the historical users in the positive sample user set and the initial hypersphere equation of the outlier monitoring algorithm; a parameter subunit, configured to determine a relaxation variable threshold and an optimization objective; a solving subunit, configured to solve the hypersphere equation based on the relaxation variable threshold and the optimization objective to obtain an optimal solution of the hypersphere equation; and a first subunit, configured to generate the first target hypersphere based on the optimal solution.
The screening module 506 further includes: a first set unit, configured to screen the financial data of historical users in the positive sample user set based on the first target hypersphere to generate the positive sample subset and the unclassified sample subset. The first set unit includes: a coordinate subunit, configured to determine the position coordinates of the financial data of the historical users in the positive sample user set on a multi-dimensional coordinate system; a distance subunit, configured to determine the center-of-sphere distance between the historical user and the first target hypersphere based on the position coordinates; a positive sample subunit, configured to add the historical user to the positive sample subset when the center-of-sphere distance is less than or equal to the radius of the first target hypersphere; and an unclassified subunit, configured to add the historical user to the unclassified sample subset when the center-of-sphere distance is greater than the radius of the first target hypersphere.
The screening module 506 further includes: a second target unit, configured to generate a second target hypersphere by the outlier monitoring algorithm and the financial data of historical users in the negative sample user set; and a second set unit, configured to screen the financial data of historical users in the negative sample user set based on the second target hypersphere to generate the negative sample subset and the unclassified sample subset.
The training module 508 is configured to train a machine learning model based on the positive sample subset, the negative sample subset, and the unclassified sample subset to generate a client dynamic support model.
The training module 508 includes: a first model unit, configured to train a first machine learning model based on the positive sample subset and the negative sample subset to generate a user classification model; and a first computing unit, configured to update the positive sample subset and the negative sample subset based on the unclassified sample subset and the user classification model. The first computing unit includes: a label subunit, configured to determine, by the user classification model, positive sample labels and negative sample labels for the historical users in the unclassified sample subset; and a classification subunit, configured to classify the historical users with positive sample labels into the positive sample subset and the historical users with negative sample labels into the negative sample subset.
The training module 508 further includes: a second model unit, configured to train a second machine learning model based on the updated positive sample subset and the updated negative sample subset to generate the client dynamic support model.
The model module 510 is configured to input financial data of a current user into the client dynamic support model to obtain the dynamic support willingness strength of the current user, and to generate a user policy for the current user based on the dynamic support willingness strength.
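How a willingness strength might be turned into a user policy is not specified in the text. The sketch below shows one possible mapping, assuming the strength is a probability in [0, 1] produced by the trained model; the thresholds and the policy names are hypothetical placeholders chosen only for illustration.

```python
def user_policy(willingness: float, high: float = 0.7, low: float = 0.3) -> str:
    """Map a dynamic support willingness strength to a policy tier.

    The thresholds `high` and `low` and the tier names are illustrative
    assumptions, not values taken from the disclosure.
    """
    if not 0.0 <= willingness <= 1.0:
        raise ValueError("willingness must be a probability in [0, 1]")
    if willingness >= high:
        return "high_intent"
    if willingness >= low:
        return "medium_intent"
    return "low_intent"


policies = [user_policy(p) for p in (0.9, 0.5, 0.1)]
```

In practice the tiers would be tied to whatever actions the operator applies to each group of users; the function only shows the thresholding step.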
According to the client dynamic support model generation apparatus of the present disclosure, the plurality of historical users are divided into a positive sample user set and a negative sample user set according to the dynamic support state; the financial data of the historical users in the positive sample user set and the negative sample user set is screened to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and a machine learning model is trained based on the positive sample subset, the negative sample subset and the unclassified sample subset to generate a client dynamic support model. In this way, the existing positive and negative sample data can be processed into a more accurate positive and negative sample data set, and the machine learning model can then be trained on that more accurate data set to generate a financial risk model.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 600 according to this embodiment of the disclosure is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present disclosure described in the method section of this specification above. For example, the processing unit 610 may perform the steps shown in fig. 1, 3, and 4.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 600' (e.g., keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, as shown in fig. 7, the technical solution according to the embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiment of the present disclosure.
The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following functions: acquiring financial data of a plurality of historical users; dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state; screening financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and training a machine learning model based on the positive sample subset, the negative sample subset, and the unclassified sample subset to generate a client dynamic support model.
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus as described in the embodiments, or may be modified accordingly and located in one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are specifically illustrated and described above. It is to be understood that the present disclosure is not limited to the precise arrangements or instrumentalities described herein; on the contrary, the disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A client dynamic support model generation method, characterized by comprising:
acquiring financial data of a plurality of historical users, wherein the financial data comprises a dynamic support state;
dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state;
screening financial data of historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and
training a machine learning model based on the positive sample subset, the negative sample subset, and the unclassified sample subset to generate a client dynamic support model.
2. The method of claim 1, wherein the dynamic support state comprises with dynamic support and without dynamic support;
dividing the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state comprises:
dividing the historical users with dynamic support into the positive sample user set; and
dividing the historical users without dynamic support into the negative sample user set.
3. The method of any of claims 1-2, wherein filtering financial data of historical users in the positive and negative sample user sets to generate positive, negative and unclassified sample subsets comprises:
filtering financial data of historical users in a positive sample user set and a negative sample user set respectively through an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset and the unclassified sample subset.
4. The method of any of claims 1-3, wherein filtering financial data of historical users in a positive sample user set and a negative sample user set, respectively, by an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset comprises:
generating a first target hypersphere by an outlier monitoring algorithm and financial data of historical users in the positive sample user set; and
filtering financial data of historical users in the positive sample user set based on the first target hyper-sphere to generate the positive sample subset and the unclassified sample subset.
5. The method of any of claims 1-4, wherein generating the first target hypersphere by an outlier monitoring algorithm and financial data of historical users in the positive sample user set comprises:
generating a hypersphere equation set through financial data of historical users in the positive sample set and an initial hypersphere equation in an abnormal value monitoring algorithm;
determining a relaxation variable threshold and an optimization objective;
solving the hypersphere equation based on a relaxation variable threshold and an optimization target to obtain an optimal solution of the hypersphere equation; and
generating the first target hyper-sphere based on the optimal solution.
6. The method of any of claims 1-5, wherein filtering financial data of historical users in the positive sample user set based on the first target hypersphere to generate the positive sample subset and the unclassified sample subset comprises:
determining location coordinates of financial data of historical users in the positive sample user set on a multi-dimensional coordinate system;
determining a center of sphere distance of the historical user and the first target hyper-sphere based on the location coordinates;
adding the historical user into the positive sample subset when the center-of-sphere distance is less than or equal to the radius of the first target hyper-sphere; and
adding the historical user into the unclassified subset of samples when the center-of-sphere distance is greater than a radius of the first target hyper-sphere.
7. The method of any of claims 1-6, wherein filtering financial data of historical users in a positive sample user set and a negative sample user set, respectively, by an outlier monitoring algorithm to generate the positive sample subset, the negative sample subset, and the unclassified sample subset comprises:
generating a second target hypersphere by an outlier monitoring algorithm and financial data of historical users in the negative sample user set; and
filtering financial data of historical users in the negative sample user set based on the second target hypersphere to generate the negative sample subset and the unclassified sample subset.
8. A client dynamic support model generation apparatus, comprising:
a data module, configured to acquire financial data of a plurality of historical users, wherein the financial data comprises a dynamic support state;
a set module, configured to divide the plurality of historical users into a positive sample user set and a negative sample user set according to the dynamic support state;
a screening module, configured to screen the financial data of the historical users in the positive sample user set and the negative sample user set to generate a positive sample subset, a negative sample subset and an unclassified sample subset; and
a training module, configured to train a machine learning model based on the positive sample subset, the negative sample subset, and the unclassified sample subset to generate a client dynamic support model.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201910921528.5A 2019-09-27 2019-09-27 Client dynamic support model generation method and device and electronic equipment Pending CN110796170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910921528.5A CN110796170A (en) 2019-09-27 2019-09-27 Client dynamic support model generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110796170A true CN110796170A (en) 2020-02-14

Family

ID=69439859

Country Status (1)

Country Link
CN (1) CN110796170A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112950353A (en) * 2021-02-08 2021-06-11 北京淇瑀信息科技有限公司 User strategy generation method and device based on 7-day movement support model and electronic equipment
CN113191812A (en) * 2021-05-12 2021-07-30 深圳索信达数据技术有限公司 Service recommendation method, computer device and computer-readable storage medium
CN113191812B (en) * 2021-05-12 2024-02-02 深圳索信达数据技术有限公司 Service recommendation method, computer equipment and computer readable storage medium

Similar Documents

Publication Title
US11455493B2 (en) Explanations for artificial intelligence based recommendations
AU2021232839B2 (en) Updating Attribute Data Structures to Indicate Trends in Attribute Data Provided to Automated Modelling Systems
CN110781922A (en) Sample data generation method and device for machine learning model and electronic equipment
US10769009B2 (en) Root cause analysis for correlated development and operations data
US11176508B2 (en) Minimizing compliance risk using machine learning techniques
US10592368B2 (en) Missing values imputation of sequential data
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
US20190354849A1 (en) Automatic data preprocessing
CN110751190A (en) Financial risk model generation method and device and electronic equipment
Saito et al. Doubly robust prediction and evaluation methods improve uplift modeling for observational data
CN110796482A (en) Financial data classification method and device for machine learning model and electronic equipment
CN112508199A (en) Feature selection method, device and related equipment for cross-feature federated learning
CN111191893A (en) Wind control text processing method and device and electronic equipment
CN110796170A (en) Client dynamic support model generation method and device and electronic equipment
CN105335595A (en) Feeling-based multimedia processing
CN112486784A (en) Method, apparatus and medium for diagnosing and optimizing data analysis system
CN110796172A (en) Sample label determination method and device for financial data and electronic equipment
JP7306432B2 (en) Information processing method, information processing device and program
US20230169358A1 (en) Continuous knowledge graph for links and weight predictions
CN113568739B (en) User resource quota allocation method and device and electronic equipment
US11238955B2 (en) Single sample genetic classification via tensor motifs
CN110796262B (en) Test data optimization method and device of machine learning model and electronic equipment
CN111556998A (en) Transfer learning and domain adaptation using distributable data models
US20220358397A1 (en) Moving decision boundaries in machine learning models
US20230122472A1 (en) Hybrid Techniques for Quality Estimation of a Decision-Making Policy in a Computer System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination