CN112037005B - Fusion method and device of score cards, computer equipment and storage medium - Google Patents


Info

Publication number
CN112037005B
Authority
CN
China
Prior art keywords
card
scoring
fusion
machine learning
cards
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010705871.9A
Other languages
Chinese (zh)
Other versions
CN112037005A (en)
Inventor
黄馨
李怡文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Financial Technology Nanjing Co Ltd
Original Assignee
Suning Financial Technology Nanjing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Financial Technology Nanjing Co Ltd
Priority to CN202010705871.9A
Publication of CN112037005A
Application granted
Publication of CN112037005B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a fusion method and device of score cards, computer equipment and a storage medium, and belongs to the technical field of risk control. The method comprises the following steps: acquiring an expert scoring card and a machine learning scoring card, wherein the characteristic variables used by the expert scoring card are different from the characteristic variables used by the machine learning scoring card; fusing the expert scoring card and the machine learning scoring card on each of a plurality of weight sets to obtain a plurality of fused scoring cards; and verifying the plurality of fused scoring cards by using verification samples, and screening out the optimal fused scoring card. In the prior art, the variables selected by experts are added directly into the machine learning model, and the process is repeated until variables meeting the model conditions are found. Compared with that approach, the method and the device avoid the heavy repeated workload, effectively mitigate the problem of sample data deviation, and improve the prediction capability of the model.

Description

Fusion method and device of score cards, computer equipment and storage medium
Technical Field
The invention relates to the technical field of risk control, in particular to a fusion method and device of score cards, computer equipment and a storage medium.
Background
In the technical field of risk control, taking credit risk control in the financial field as an example, a financial institution generally needs to construct a risk control model to evaluate the credit risk of a business object. When evaluating the credit risk of the business object, a scoring card can be used in the risk control model to score the credit of the business object: the higher the score, the lower the corresponding credit risk, and vice versa.
Scoring cards can be divided into expert scoring cards and machine learning scoring cards. When a product has accumulated a large amount of sample data, the scoring card can be modeled directly by machine learning. When a product does not have enough data for modeling, or a new product has no accumulated data (so that only unsupervised learning would be possible), an expert scoring card is often used instead. Whether a machine learning scoring card or an expert scoring card, each has certain limitations when used alone. Machine learning depends heavily on data: once the sample is biased, the prediction capability of the model is directly affected. In addition, modeling data generally lags behind the business, so the model cannot react in time to recent strategy changes and front-end business adjustments, and its prediction capability for current customers after going online is lower than it was at modeling time. The expert scoring card alleviates the data problem to a certain extent, is highly interpretable, and handles character-type fields and discontinuous numerical fields more simply and effectively, but it is strongly subjective and not reproducible.
At present, many risk control scoring cards integrate expert experience into machine learning. A common method is to add the variables selected by experts directly into the machine learning model and to repeat the process until variables meeting the model conditions are selected. Taking logistic regression as an example, adding a new variable directly into the regression expression may reduce the significance of some variable, which shows up directly as the p value of that variable increasing beyond the threshold; the added variable then needs to be replaced, and this continues until a suitable combination of variables is found.
Disclosure of Invention
In order to solve the problems mentioned in the background art, embodiments of the present invention provide a score card fusion method, apparatus, computer device and storage medium.
The embodiment of the invention provides the following specific technical scheme:
in a first aspect, a fusion method of score cards is provided, and the method includes:
acquiring an expert scoring card and a machine learning scoring card, wherein the characteristic variables used by the expert scoring card are different from the characteristic variables used by the machine learning scoring card;
fusing the expert scoring cards and the machine learning scoring cards on a plurality of weight sets respectively to obtain a plurality of fused scoring cards;
and verifying the plurality of fusion scoring cards by using a verification sample to screen out the optimal fusion scoring card.
Further, the acquiring an expert rating card and a machine learning rating card includes:
constructing a training sample set and a characteristic variable library, wherein the training sample set comprises positive samples and negative samples;
screening a plurality of characteristic variables from the characteristic variable library according to an expert experience method to create an expert rating card;
all the screened characteristic variables are removed from the characteristic variable library to obtain residual characteristic variables;
performing machine learning on the training sample set according to the remaining feature variables to create a machine learning score card.
Further, after the step of removing all the screened feature variables from the feature variable library to obtain remaining feature variables, the method further includes:
carrying out correlation test on all the screened characteristic variables and each residual characteristic variable;
according to a correlation test result, eliminating the feature variables related to the screened feature variables from the remaining feature variables to form a feature variable set for constructing a machine learning score card;
the performing machine learning on the training sample set according to the remaining feature variables to create a machine learning score card, comprising:
and performing machine learning on the training sample set according to the characteristic variable set to construct and obtain a machine learning score card.
Further, each of the weight combinations includes a first weight corresponding to the expert scorecard and a second weight corresponding to the machine learning scorecard, numerical values of the first weights in the plurality of weight combinations sequentially decrease by a predetermined step size, numerical values of the second weights sequentially increase by a predetermined step size, and a sum of the numerical values of the first weights and the numerical values of the second weights in the same weight combination is 1.
Further, the verifying the plurality of fusion score cards by using the verification sample to screen out an optimal fusion score card includes:
respectively inputting the characteristic variables of the verification samples into a plurality of fusion scoring cards to calculate a preset index corresponding to each fusion scoring card, wherein the preset index comprises at least one of a KS value, a Gini coefficient value and an AUC value;
and screening out the optimal fusion scoring card from the plurality of fusion scoring cards according to the preset index corresponding to each fusion scoring card.
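The weight combinations and the screening step described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the text: the fused score is taken to be the weighted sum of the two card scores, the step size is 0.1, KS is the only index used, and label 1 marks a negative (overdue or fraud) sample. The function names `weight_grid`, `ks_statistic` and `select_best_fusion` are hypothetical.

```python
from fractions import Fraction


def weight_grid(step=Fraction(1, 10)):
    """Return (w_expert, w_ml) pairs: w_expert steps down from 1 to 0,
    w_ml steps up from 0 to 1, and each pair sums to exactly 1."""
    n = int(1 / step)
    return [(1 - i * step, i * step) for i in range(n + 1)]


def ks_statistic(scores, labels):
    """KS value: largest gap between the cumulative score distributions
    of negative (label 1) and positive (label 0) samples.
    Assumes both classes are present."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with the same score are accumulated together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


def select_best_fusion(expert_scores, ml_scores, labels, step=Fraction(1, 10)):
    """Fuse the two scores on every weight pair (assumed: weighted sum)
    and return (ks, w_expert, w_ml) for the pair with the highest KS."""
    results = []
    for w_e, w_m in weight_grid(step):
        fused = [w_e * e + w_m * m for e, m in zip(expert_scores, ml_scores)]
        results.append((ks_statistic(fused, labels), float(w_e), float(w_m)))
    return max(results)


# Toy verification sample (made-up numbers).
expert = [50, 60, 55, 65, 52, 62, 58, 61]
ml = [30, 35, 32, 38, 70, 75, 72, 78]
is_negative = [1, 1, 1, 1, 0, 0, 0, 0]
print(select_best_fusion(expert, ml, is_negative))
```

Exact fractions are used for the weights so that each pair sums to exactly 1 regardless of the step size; the Gini coefficient or AUC could be substituted for KS in the same loop.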
Further, the method further comprises:
after the optimal fusion scoring card is deployed on line, if the monitored population stability index exceeds a preset threshold value, determining a characteristic variable with the largest variation in the fusion scoring card;
and adjusting the optimal numerical value of the first weight and the optimal numerical value of the second weight in the fusion score card according to the characteristic variable with the maximum variation.
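The monitoring step can be sketched as follows. This is only an illustration under assumptions: the population stability index is computed with the commonly used formula PSI = Σ (actual share − expected share) × ln(actual share / expected share) over bins, the "variation" of each characteristic variable is measured by its own PSI, and the small floor `eps` for empty bins is a hypothetical choice. How the weights are then adjusted is not specified here, so the sketch stops at identifying the variable.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between a baseline (expected) and an
    online (actual) bin-count distribution over the same bins."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)  # floor avoids log(0) for empty bins
        a_share = max(a / a_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


def most_shifted_feature(baseline, online):
    """Return the name of the variable with the largest PSI and all PSIs.
    Both arguments map a variable name to its per-bin sample counts."""
    per_feature = {name: psi(baseline[name], online[name]) for name in baseline}
    return max(per_feature, key=per_feature.get), per_feature


# Made-up bin counts at modeling time and after deployment.
baseline = {"age": [30, 40, 30], "income": [50, 50]}
online = {"age": [30, 40, 30], "income": [80, 20]}
name, values = most_shifted_feature(baseline, online)
print(name, values)
```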
In a second aspect, a fusion device for scoring cards is provided, the device comprising:
an acquisition module, configured to acquire an expert scoring card and a machine learning scoring card, wherein the characteristic variables used by the expert scoring card are different from the characteristic variables used by the machine learning scoring card;
the fusion module is used for respectively fusing the expert scoring cards and the machine learning scoring cards on a plurality of weight sets to obtain a plurality of fusion scoring cards;
and the verification module is used for verifying the fusion scoring cards by using a verification sample and screening out the optimal fusion scoring card.
Further, the obtaining module comprises:
the construction submodule is used for constructing a training sample set and a characteristic variable library, wherein the training sample set comprises positive samples and negative samples;
the first creating submodule is used for screening a plurality of characteristic variables from the characteristic variable library according to an expert experience method to create an expert scoring card;
and the second creating submodule is used for removing all the screened characteristic variables from the characteristic variable library to obtain residual characteristic variables, and performing machine learning on the training sample set according to the residual characteristic variables to create a machine learning score card.
Further, the second creating sub-module is specifically configured to:
carrying out correlation test on all the screened characteristic variables and each residual characteristic variable;
according to a correlation test result, eliminating the feature variables related to the screened feature variables from the remaining feature variables to form a feature variable set for constructing a machine learning score card;
and performing machine learning on the training sample set according to the characteristic variable set to construct and obtain a machine learning score card.
Further, each of the weight combinations includes a first weight corresponding to the expert scorecard and a second weight corresponding to the machine learning scorecard, numerical values of the first weights in the plurality of weight combinations sequentially decrease by a predetermined step size, numerical values of the second weights sequentially increase by a predetermined step size, and a sum of the numerical values of the first weights and the numerical values of the second weights in the same weight combination is 1.
Further, the verification module is specifically configured to:
respectively inputting the characteristic variables of the verification samples into a plurality of fusion scoring cards to calculate a preset index corresponding to each fusion scoring card, wherein the preset index comprises at least one of a KS value, a Gini coefficient value and an AUC value;
and screening out the optimal fusion score card from the plurality of fusion score cards according to the preset index corresponding to each fusion score card.
Further, the apparatus further comprises an adjustment module, the adjustment module is specifically configured to:
after the optimal fusion scoring card is deployed on line, if the monitored population stability index exceeds a preset threshold value, determining a characteristic variable with the largest variation in the fusion scoring card;
and adjusting the optimal numerical value of the first weight and the optimal numerical value of the second weight in the fusion score card according to the characteristic variable with the maximum variation.
In a third aspect, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer program:
acquiring an expert rating card and a machine learning rating card, wherein the characteristic variables used by the expert rating card are different from the characteristic variables used by the machine learning rating card;
fusing the expert scoring cards and the machine learning scoring cards on a plurality of weight sets respectively to obtain a plurality of fused scoring cards;
and verifying the plurality of fusion scoring cards by using a verification sample to screen out the optimal fusion scoring card.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an expert rating card and a machine learning rating card, wherein the characteristic variables used by the expert rating card are different from the characteristic variables used by the machine learning rating card;
fusing the expert scoring cards and the machine learning scoring cards on a plurality of weight sets respectively to obtain a plurality of fused scoring cards;
and verifying the plurality of fusion scoring cards by using a verification sample to screen out the optimal fusion scoring card.
The embodiments of the invention provide a fusion method and device of score cards, computer equipment and a storage medium. An expert scoring card and a machine learning scoring card are created, the two cards are fused on each of a plurality of weight sets, and the fused cards are verified and screened, so that expert experience is fused with machine learning.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a fusion method of score cards according to an embodiment of the present invention;
Fig. 2 is a detailed flowchart of step S1 of the method shown in Fig. 1;
Fig. 3 is a detailed flowchart of step S3 of the method shown in Fig. 1;
Fig. 4 is a structural diagram of a fusion device of score cards according to an embodiment of the present invention;
Fig. 5 is an internal structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is to be understood that, unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
Furthermore, in the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
As described in the background, in the prior art, the common practice for incorporating expert experience into machine learning is to add the variables selected by experts directly into the machine learning model and to repeat this process until variables meeting the model conditions are selected. To ensure that the model indexes meet the requirements, this method involves a large amount of repeated work. Moreover, the variables selected by experts do not necessarily have strong predictive power on the selected modeling data (generally reflected in a low IV value), so judging the experts' variables by the variable requirements of a machine learning model is somewhat contradictory. Therefore, the embodiments of the invention provide a method for fusing scoring cards, in which an expert scoring card and a machine learning scoring card are created, the two cards are fused on each of a plurality of weight sets, and the fused cards are verified and screened, so that expert experience and machine learning are fused.
It should be noted that the method provided by the embodiments of the present invention may be applied to various risk control scenarios, for example, credit risk control in the financial field and business security risk control in the internet field, such as account theft, promotion abuse by bonus hunters (the so-called "wool party"), malicious ordering, malicious payment, and the like.
Fig. 1 is a flowchart of a method for fusing score cards according to an embodiment of the present invention, where the method is applied to a device for fusing score cards, where the device may be configured in any computer device, and the computer device may be a server, and the server may be an independent server, or a server cluster formed by multiple servers.
As shown in fig. 1, the fusion method of score cards provided in the embodiment of the present invention includes steps S1 to S3:
Step S1, acquiring an expert scoring card and a machine learning scoring card, wherein the characteristic variables used by the expert scoring card are different from the characteristic variables used by the machine learning scoring card.
The expert scoring card and the machine learning scoring card each comprise scores corresponding to different value intervals of different characteristic variables. Table 1 shows exemplary contents of a scoring card provided in the embodiment of the present invention.
Table 1: the contents of the scoring card:
[Table 1 appears as an image in the original publication.]
This scoring card has two characteristic variables in total: age and gender. For a 39-year-old male, the corresponding age and gender scores are 35 and 45 points, respectively, and the final score is 80.
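The lookup described above can be sketched in Python. Because Table 1 is only available as an image, the intervals and points below are hypothetical, chosen only so that a 39-year-old male receives 35 + 45 = 80 as in the text; the names `SCORECARD`, `points_for_interval` and `score_applicant` are likewise illustrative.

```python
# Hypothetical scoring card: only the two values used in the text
# (age 39 -> 35 points, male -> 45 points) come from the description.
SCORECARD = {
    # (lower bound inclusive, upper bound exclusive, points)
    "age": [(18, 30, 20), (30, 45, 35), (45, 120, 40)],
    "gender": {"male": 45, "female": 50},
}


def points_for_interval(intervals, value):
    """Return the points of the interval that contains the value."""
    for low, high, points in intervals:
        if low <= value < high:
            return points
    raise ValueError(f"value {value} is outside all intervals")


def score_applicant(applicant):
    """Total score = sum of the points of each characteristic variable."""
    age_points = points_for_interval(SCORECARD["age"], applicant["age"])
    gender_points = SCORECARD["gender"][applicant["gender"]]
    return age_points + gender_points


print(score_applicant({"age": 39, "gender": "male"}))  # 35 + 45 = 80
```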
Specifically, as shown in fig. 2, the implementation process of step S1 may include steps S11 to S14:
S11, constructing a training sample set and a characteristic variable library, wherein the training sample set comprises positive samples and negative samples.
In this embodiment, before creating the expert scoring card and the machine learning scoring card, the computer device needs to establish a training sample set and a feature variable library.
When the training sample set is established, the modeling samples are drawn for the business object; for example, each sample represents characteristic information of a business object. Samples are generally divided into two types, positive samples and negative samples: negative samples are samples marked as overdue or fraudulent, and positive samples are samples not marked as overdue or fraudulent. To better distinguish positive from negative samples and to generalize the characteristics of the negative samples, the number of negative samples is usually required to be not less than a preset number (for example, not less than 1000) or not less than a preset proportion (for example, not less than 10%) of all samples. Meanwhile, to ensure the effect of the machine learning scoring card, samples covering a preset time span need to be selected in order to avoid data deviation. Taking loan applications as an example: around the Spring Festival, when many people return to their hometowns, the number of loan applicants rises sharply and includes a large rural population, so the customer group differs greatly from that in normal periods. If only customers who applied in February were selected as modeling samples, the model's predictive power for customers applying in other months would be reduced. To avoid such sample bias, samples are taken over a time span of at least three months.
The characteristic variables in the characteristic variable library may include original variables and derived variables generated from each original variable by a preset conversion function, where the conversion functions for different original variables are different. The original variables are characteristic variables that can be extracted directly from raw data. The raw data may be information on users who purchase a target product and/or product information of the target product, and the original variables may include basic characteristic variables such as gender, age, education, occupation, income and registration time, as well as various behavioral characteristic variables. When constructing the characteristic variable library, the interpretability of the characteristic variables must be ensured. The greatest advantage of a scoring card is its strong interpretability: a change in each characteristic variable is reflected directly in the final score. Therefore, complicated processing logic should be avoided when constructing characteristic variables, and the meaning of each variable should be clear. Taking a derived variable from the People's Bank of China credit report as an example, dividing the number of credit inquiries made for loan approval in the last 6 months by the number made in the last 12 months reflects how strong the applicant's recent loan intention is compared with that over the last year; the higher the ratio, the stronger the applicant's recent loan intention. By contrast, if the same two variables were multiplied, the resulting variable would lose its business meaning, and even if it were used in a model, its changes could not be explained.
In practical application, the number of characteristic variables in the characteristic variable library is not less than 50. In addition, to make it easier to select characteristic variables when subsequently constructing the expert scoring card, all characteristic variables in the library can be classified into categories, with no more than 10 categories in total.
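The sample requirements above can be expressed as a simple check. The following Python sketch uses the example thresholds from the text (at least 1000 negative samples or at least 10% of all samples, and a time span of at least three months); treating three months as 90 days and the function name `check_training_sample` are assumptions made for illustration.

```python
from datetime import date


def check_training_sample(samples, min_negatives=1000,
                          min_negative_ratio=0.10, min_span_days=90):
    """samples: list of (application_date, is_negative) tuples.
    Returns which of the two example requirements are satisfied."""
    n_negative = sum(1 for _, is_negative in samples if is_negative)
    ratio = n_negative / len(samples)
    dates = [d for d, _ in samples]
    span_days = (max(dates) - min(dates)).days
    return {
        # Either the absolute count or the proportion is sufficient.
        "enough_negatives": (n_negative >= min_negatives
                             or ratio >= min_negative_ratio),
        # Assumption: "three months" is approximated as 90 days.
        "enough_time_span": span_days >= min_span_days,
    }


# A sample drawn only from February fails the time-span requirement.
february_only = [(date(2020, 2, d), d % 5 == 0) for d in range(1, 29)]
print(check_training_sample(february_only))
```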
And S12, screening a plurality of characteristic variables from the characteristic variable library according to an expert experience method to create an expert rating card.
Specifically, feature variables screened from a feature variable library according to an expert experience method are obtained, the screened feature variables are used as experience variables, scores corresponding to different value intervals of the experience variables are obtained, and an expert score card is created.
In practical application, according to business rules, experience variables are selected by expert experience from the different categories of characteristic variables in the characteristic variable library, the scores corresponding to different value intervals of the experience variables are obtained, and an expert scoring card is created. A correlation test must be carried out on the selected experience variables. High correlation among the independent variables of a multivariate regression, that is, multicollinearity, increases the variance of the parameter estimates and makes them unstable (here, "parameter" refers to a characteristic variable input to the risk control model). For the regression, this has three consequences: (1) even if there is a significant relationship between a predictor variable and the response, its coefficient may not appear significant; (2) the coefficients of highly correlated predictors vary widely between samples; (3) removing any highly correlated term from the model greatly affects the estimated coefficients of the other highly correlated terms. To avoid the influence of multicollinearity, the variance inflation factor (VIF) of each parameter can be kept at or below a preset VIF threshold, or the correlation between parameters can be kept at or below a preset correlation threshold. For example, when the VIF of every parameter is 1, there is no multicollinearity; when some VIFs are greater than 1, some collinearity exists, but as long as a certain threshold is not exceeded, the collinearity does not affect the result of the model. Generally, if there are few variables to choose from, the VIF threshold is relaxed appropriately so that enough variables remain for modeling; for example, with the threshold set to 4, any parameter whose VIF does not exceed 4 can be used for modeling.
If the characteristic variable library is large, for example 500 variables, there is more room for choice and the VIF threshold may be lowered, for example to 2. Similarly, the correlation threshold between parameters can be adjusted according to the size of the characteristic variable library: it is usually set below 0.75, and when there are enough variables it can be lowered to 0.5 or even less.
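The VIF screening can be sketched in pure Python. The sketch relies on the standard identity that the VIF of variable j equals the j-th diagonal entry of the inverse of the correlation matrix; the helper names `pearson`, `invert` and `vif` are hypothetical, the data are made up, and the sketch assumes no variable is constant or perfectly collinear with another.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """columns: dict mapping variable name -> list of values.
    VIF_j is the j-th diagonal entry of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


data = {"x": [1, 2, 3, 4, 5], "y": [2, 1, 4, 3, 5]}
vif_threshold = 4  # example threshold from the text
usable = [name for name, v in vif(data).items() if v <= vif_threshold]
print(vif(data), usable)
```

In practice variables are often removed one at a time and the VIFs recomputed, since removing one variable changes the VIFs of the others; the single pass above only illustrates the threshold check.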
And S13, removing all the screened characteristic variables from the characteristic variable library to obtain residual characteristic variables.
Specifically, after the expert rating card is created, the experience variables used by the expert rating card need to be removed from the feature variable library, and the remaining feature variables in the feature variable library are determined as knowledge variables for creating the machine learning rating card.
And S14, performing machine learning on the training sample set according to the residual characteristic variables to create a machine learning score card.
Specifically, a machine learning algorithm such as logistic regression, random forest or xgboost is trained on the preset training sample data to obtain the scores corresponding to different value intervals of each characteristic variable, from which a machine learning scoring card is constructed. Machine learning is divided into supervised learning and unsupervised learning according to whether the training sample data is labeled.
In a preferred example, after the step of removing all screened feature variables from the feature variable library to obtain remaining feature variables, the method further includes:
carrying out correlation test on all the screened characteristic variables and each residual characteristic variable;
according to the correlation test result, eliminating the feature variables related to the screened feature variables from the remaining feature variables to form a feature variable set for constructing a machine learning score card;
the step S14 of performing machine learning on the training sample set according to the remaining characteristic variables to create a machine learning score card may include:
and performing machine learning on the training sample set according to the characteristic variable set to construct and obtain a machine learning score card.
In this embodiment, correlation tests are performed on the remaining feature variables in the feature variable library and the empirical variables used by the expert scoring cards. Similarly, in order to avoid multiple collinearity, the correlation between all the empirical variables and each of the remaining characteristic variables is calculated, screening is performed according to a preset correlation threshold, the characteristic variables strongly correlated with the empirical variables are eliminated, and the final remaining characteristic variables in the characteristic variable library are used as knowledge variables to form a characteristic variable set for creating the machine learning score card. And modeling is carried out on the training sample set constructed in the S11, and a machine learning scoring card is constructed.
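The elimination of remaining variables that are strongly correlated with the experience variables might look like the following Python sketch. The use of the Pearson coefficient, the threshold of 0.75 (the upper bound mentioned earlier for the correlation threshold), the function name `drop_correlated` and the toy data are all assumptions; the sketch also assumes no variable is constant.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def drop_correlated(experience_vars, remaining_vars, threshold=0.75):
    """Keep only the remaining variables whose absolute correlation with
    every experience variable stays below the threshold."""
    kept = {}
    for name, values in remaining_vars.items():
        strongest = max(abs(pearson(values, ev))
                        for ev in experience_vars.values())
        if strongest < threshold:
            kept[name] = values
    return kept


# Made-up data: years_employed tracks age closely, monthly_queries does not.
experience = {"age": [25, 35, 45, 55, 65]}
remaining = {
    "years_employed": [1, 10, 20, 30, 41],
    "monthly_queries": [3, 1, 4, 1, 3],
}
knowledge_vars = drop_correlated(experience, remaining)
print(sorted(knowledge_vars))
```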
Taking the logistic regression algorithm as an example: after the feature variables in the feature variable set of the machine learning score card have undergone coarse binning and WOE (weight of evidence) conversion, a model is trained using the stepwise method of logistic regression, that is, variables are introduced one at a time following the idea of stepwise regression. First, every feature variable in the feature variable set is treated as a candidate, the p-values of all candidate variables under the score test are calculated, and the candidate with the smallest p-value is entered into the logistic regression model. The model is then checked against the model screening conditions, which may be set, for example, to require that the model coefficient is negative and the p-value is less than 0.05. If the screening conditions are met, the feature variable is confirmed as a model variable. Next, the previous step is repeated: the score-test p-values of the remaining candidates are sorted, the variable with the smallest p-value is entered into the model, and the screening conditions are checked again. These steps are repeated until no candidate variables remain, yielding the final logistic regression model.
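The WOE conversion mentioned above can be illustrated with a short sketch. The source does not give the WOE formula or its sign convention; the version below assumes WOE = ln(share of good samples in the bin / share of bad samples in the bin), with label 1 meaning bad and label 0 meaning good. The bin names and labels are hypothetical.

```python
import math

def woe_table(bins, labels):
    """Return the WOE value of each bin.

    bins   : bin identifier of each sample (result of coarse binning)
    labels : 1 = bad sample, 0 = good sample (assumed encoding)
    Every bin is assumed to contain at least one good and one bad sample.
    """
    total_good = sum(1 for y in labels if y == 0)
    total_bad = sum(1 for y in labels if y == 1)
    counts = {}
    for b, y in zip(bins, labels):
        good, bad = counts.get(b, (0, 0))
        counts[b] = (good + (y == 0), bad + (y == 1))
    table = {}
    for b, (good, bad) in counts.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        table[b] = math.log(good_share / bad_share)
    return table

# Hypothetical variable with two coarse bins.
bins = ["low", "low", "low", "low", "high", "high", "high", "high"]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
woe = woe_table(bins, labels)
```

Here the `low` bin holds one quarter of the good samples and three quarters of the bad ones, so its WOE is ln(1/3); the `high` bin mirrors it with ln(3). Under this assumed convention, a model that predicts the bad outcome tends to have negative coefficients on WOE-encoded variables.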
The output of the logistic regression model is a probability in the interval from 0 to 1, which must be converted into a positive integer to serve as the final score. The conversion scales the predicted log(odds) and adds an offset: $\text{score} = \text{factor} \times \log(\text{odds}) + \text{offset}$. Once the reference score and the reference odds are fixed, the values of factor and offset can be calculated. In addition, the score of each variable is calibrated as $\text{variable score} = -1 \times \text{coefficient} \times \text{WOE} \times \text{factor}$, where coefficient is the logistic regression coefficient of that variable, and an offset is added to the score of each variable so that its minimum value is 0. In this way, a logistic regression score card is obtained.
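The score conversion above can be sketched as follows. The source mentions only a reference score and reference odds; two conditions are needed to solve for both factor and offset, so this sketch additionally assumes a "points to double the odds" (PDO) parameter, which is a common convention but is not stated in the source. The odds are taken as good-to-bad odds, and the numbers (600, 50, 20, coefficient -0.8) are hypothetical.

```python
import math

def scaling_params(base_score, base_odds, pdo):
    """Solve score = factor * ln(odds) + offset from two conditions:
    the score equals base_score at base_odds, and rises by pdo when odds double."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def total_score(odds, factor, offset):
    """Scaled score for given good-to-bad odds."""
    return factor * math.log(odds) + offset

def variable_points(coefficient, woe_by_bin, factor):
    """Per-bin points of one variable: -1 * coefficient * WOE * factor."""
    return {b: -1 * coefficient * w * factor for b, w in woe_by_bin.items()}

def shift_to_zero(points_by_bin):
    """Add a per-variable offset so that the lowest bin scores 0."""
    lowest = min(points_by_bin.values())
    return {b: p - lowest for b, p in points_by_bin.items()}

factor, offset = scaling_params(base_score=600, base_odds=50, pdo=20)
points = variable_points(-0.8, {"low": -math.log(3), "high": math.log(3)}, factor)
card_points = shift_to_zero(points)
```

With these assumed parameters, odds of 50 map to 600 points and odds of 100 map to 620 points; after the shift, the `low` bin of the example variable scores 0.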
S2, fusing the expert scoring card and the machine learning scoring card under each of a plurality of weight combinations to obtain a plurality of fusion scoring cards.
Each weight combination comprises a first weight corresponding to the expert scoring card and a second weight corresponding to the machine learning scoring card, numerical values of the first weights in the multiple weight combinations are sequentially decreased by a preset step length, numerical values of the second weights are sequentially increased by the preset step length, and the sum of the numerical values of the first weights and the numerical values of the second weights in the same weight combination is 1. The predetermined step size can be set according to actual needs, for example, according to the number of weight combinations, and preferably, when the number of weight combinations is 10, the predetermined step size can be set to 0.1.
Specifically, the expert scoring cards and the machine learning scoring cards are weighted and summed according to the first weight and the second weight in each weight combination to form a plurality of fusion scoring cards.
In practical application, the expert scoring card and the machine learning scoring card can be fused by weighted summation. The weight corresponding to the expert scoring card is denoted as the first weight $r$, where $0 < r < 1$, and the weight corresponding to the machine learning scoring card is denoted as the second weight $(1 - r)$. The two cards are combined under N weight combinations $[r_1, 1 - r_1]$, $[r_2, 1 - r_2]$, ..., $[r_N, 1 - r_N]$ according to the scoring card fusion formula: fusion scoring card $= r \times$ expert scoring card $+ (1 - r) \times$ machine learning scoring card. After weighted summation, N fusion scoring cards are obtained.
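The weight combinations and the weighted summation can be sketched as below. The function names, the starting weight of 0.9, and the example scores are assumptions for illustration; the sketch applies the fusion formula to the two scores of one sample, which is one straightforward reading of fusing the two cards.

```python
def weight_combinations(first_weight_start, step, count):
    """Build [r, 1 - r] pairs in which r decreases by `step` each time
    and the second weight increases by the same step."""
    combos = []
    for i in range(count):
        r = round(first_weight_start - i * step, 10)  # rounding hides float noise
        combos.append((r, round(1.0 - r, 10)))
    return combos

def fuse_scores(expert_score, ml_score, r):
    """Fusion formula: r * expert score + (1 - r) * machine learning score."""
    return r * expert_score + (1.0 - r) * ml_score

# Hypothetical setup: r runs from 0.9 down to 0.1 in steps of 0.1.
combos = weight_combinations(0.9, 0.1, 9)

# One sample scored 620 by the expert card and 580 by the machine learning card.
fused = [fuse_scores(620.0, 580.0, r) for r, _ in combos]
```

Each entry of `fused` is the score the same sample would receive from one candidate fusion scoring card; the candidates are then compared on the verification sample.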
And S3, verifying the plurality of fusion scoring cards by using the verification sample, and screening out the optimal fusion scoring card.
The verification sample is different from the training sample, and the verification sample has a preset time span, and the preset time span can be set according to actual needs, for example, set to 6 months.
Specifically, as shown in fig. 3, the implementation process of step S3 may include steps S31 to S32:
and S31, respectively inputting the characteristic variables of the verification samples into a plurality of fusion scoring cards to calculate a preset index corresponding to each fusion scoring card.
Wherein the preset index may include at least one of a KS value, a Gini coefficient value, and an AUC value.
The KS value is the maximum difference between the cumulative distributions of the good and bad populations; the higher the KS value, the stronger the ranking ability. The Gini coefficient value represents the difference between the cumulative distribution of bad accounts and a random distribution; a high Gini coefficient value means the model separates good and bad accounts well. AUC (Area Under Curve) is the area enclosed between the ROC curve and the coordinate axes; the larger the AUC value, the better the model performs.
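The three indices can be computed from the scores and labels of the verification sample roughly as follows. This is a plain-Python sketch under stated assumptions: label 1 marks a bad sample, label 0 a good one, a higher score means lower risk, and the Gini coefficient is obtained from AUC through the common identity Gini = 2 × AUC − 1, which the source does not state. The pairwise AUC loop is quadratic and only suitable for small samples.

```python
def split_by_label(scores, labels):
    """Separate scores of good samples (label 0) and bad samples (label 1)."""
    goods = [s for s, y in zip(scores, labels) if y == 0]
    bads = [s for s, y in zip(scores, labels) if y == 1]
    return goods, bads

def auc(scores, labels):
    """Probability that a random good sample scores higher than a random
    bad sample; ties count as one half."""
    goods, bads = split_by_label(scores, labels)
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(goods) * len(bads))

def gini(scores, labels):
    """Gini coefficient derived from AUC (assumed identity)."""
    return 2.0 * auc(scores, labels) - 1.0

def ks(scores, labels):
    """Maximum gap between the cumulative score distributions of bad and good."""
    goods, bads = split_by_label(scores, labels)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_good = sum(1 for s in goods if s <= t) / len(goods)
        cdf_bad = sum(1 for s in bads if s <= t) / len(bads)
        best = max(best, abs(cdf_bad - cdf_good))
    return best

# Hypothetical verification sample scored by one fusion scoring card.
scores = [300, 400, 450, 500, 600, 700]
labels = [1, 1, 0, 1, 0, 0]
metrics = {"ks": ks(scores, labels), "gini": gini(scores, labels), "auc": auc(scores, labels)}
```

For this toy sample, eight of the nine good/bad pairs are ordered correctly, so AUC is 8/9, Gini is 7/9, and the KS value is 2/3.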
And S32, screening the optimal fusion score card from the multiple fusion score cards according to the preset index corresponding to each fusion score card.
When the preset index includes a KS value, a Gini coefficient value, and an AUC value, the implementation process of step S32 may include:
sorting each fusion scoring card in a descending order according to the KS value corresponding to each fusion scoring card to obtain a first sorting list;
selecting M fusion scoring cards ranked at the top M positions in the first ranking list, and performing descending ranking on the M fusion scoring cards according to Gini coefficient values corresponding to the fusion scoring cards to obtain a second ranking list;
and in the second ranking list, selecting K fusion score cards ranked at the top K, and determining the fusion score card with the largest AUC value in the K fusion score cards as the optimal fusion score card, wherein K is a positive integer less than M.
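The three-stage screening above (KS, then Gini, then AUC) can be sketched as follows. The dictionary layout, card names, and index values are hypothetical.

```python
def select_optimal(cards, m, k):
    """Pick the optimal fusion scoring card.

    cards : list of dicts with keys 'name', 'ks', 'gini', 'auc'
    m     : number of cards kept after the KS ranking
    k     : number of cards kept after the Gini ranking, with 0 < k < m
    """
    if not 0 < k < m:
        raise ValueError("K must be a positive integer less than M")
    # First sorted list: descending KS, keep the top M.
    first = sorted(cards, key=lambda c: c["ks"], reverse=True)[:m]
    # Second sorted list: descending Gini among those M, keep the top K.
    second = sorted(first, key=lambda c: c["gini"], reverse=True)[:k]
    # Final choice: the largest AUC among the K survivors.
    return max(second, key=lambda c: c["auc"])

# Hypothetical indices of five fusion scoring cards on the verification sample.
cards = [
    {"name": "A", "ks": 0.30, "gini": 0.40, "auc": 0.700},
    {"name": "B", "ks": 0.42, "gini": 0.55, "auc": 0.775},
    {"name": "C", "ks": 0.45, "gini": 0.56, "auc": 0.780},
    {"name": "D", "ks": 0.44, "gini": 0.58, "auc": 0.790},
    {"name": "E", "ks": 0.20, "gini": 0.60, "auc": 0.800},
]
best = select_optimal(cards, m=3, k=2)
```

In this example card E has the highest AUC but is eliminated at the KS stage, so card D is selected. The order of the stages therefore matters: a card must rank well on KS before its Gini and AUC values are considered.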
Notably, when the KS value is abnormally high, the correctness of the result requires further analysis. Models with a KS value above 0.2 are generally considered usable, but a KS value above 0.6 calls for further analysis.
In a preferred example, after step S3 is performed, the method may further include:
after the optimal fusion scoring card is deployed on line, if the monitored population stability index exceeds a preset threshold value, determining a characteristic variable with the largest variation in the fusion scoring card;
and adjusting the numerical value of the first weight and the numerical value of the second weight in the optimal fusion score card according to the characteristic variable with the maximum variation.
In this embodiment, after the optimal fusion score card is deployed online, the score card needs to be continuously tracked and monitored because the customer group data changes in real time. The most important monitoring index is the Population Stability Index (PSI), which measures the change between the current sample and the sample used during modeling; the lower the index, the smaller the change. When the PSI exceeds a preset value (e.g., 0.25), the population has changed substantially, the parameters need further analysis, and replacement should be considered for the parameters that have changed the most. Parameters in the expert scoring card can be replaced directly by modeling experts, whereas parameters in the machine learning scoring card require the modeling process to be run again to select new parameters. That process takes a long time, so if the model must be modified quickly, the impact of knowledge-variable drift on the model can be reduced by increasing the weight of the expert scoring card and decreasing the weight of the machine learning scoring card.
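A minimal sketch of this monitoring step is shown below. The source does not give the PSI formula; the sketch uses the commonly used definition, the sum over bins of (current share − modeling share) × ln(current share / modeling share). The bin counts, variable names, and the rule of raising the first weight by one step of 0.1 are assumptions for illustration.

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between the modeling-time distribution
    (expected) and the current distribution (actual), given per-bin counts."""
    total_e = sum(expected_counts)
    total_a = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pe = max(e / total_e, eps)  # floor avoids log(0) for empty bins
        pa = max(a / total_a, eps)
        value += (pa - pe) * math.log(pa / pe)
    return value

def raise_expert_weight(r, step=0.1):
    """Increase the first (expert) weight by one step, capped at 1,
    and return the new (first weight, second weight) pair."""
    new_r = min(1.0, round(r + step, 10))
    return new_r, round(1.0 - new_r, 10)

# Hypothetical score distributions over five score bands.
modeling = [100, 200, 400, 200, 100]
current = [300, 300, 200, 100, 100]
score_psi = psi(modeling, current)

# Hypothetical per-variable bin counts: (modeling, current).
variable_bins = {
    "device_age": ([50, 50], [50, 50]),
    "external_score": ([50, 50], [80, 20]),
}
drift = {name: psi(m, c) for name, (m, c) in variable_bins.items()}
most_changed = max(drift, key=drift.get)

# If the population shifted beyond the threshold, lean more on the expert card.
weights = raise_expert_weight(0.5) if score_psi > 0.25 else (0.5, 0.5)
```

Here the score PSI is roughly 0.47, above the 0.25 threshold, `external_score` is identified as the variable with the largest change, and the weights move from (0.5, 0.5) to (0.6, 0.4). How far to move the weights in practice would be a business decision that the source leaves open.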
In addition, the expert scoring card is more flexible and can be adjusted at any time as the business changes. For example, suppose a variable originally used in the scoring card comes from an external data company, and a regulator orders that company to shut down its data source business because of compliance issues. In such a sudden situation, re-creating a machine learning score card and deploying it online takes time and affects the current business. If the expert card parameters and the weights are adjusted instead, the impact can be largely buffered, gaining more time to optimize the scoring card model.
The embodiment of the invention provides a fusion method of score cards, in which an expert score card and a machine learning score card are created, the two cards are fused under each of a plurality of weight combinations, and the resulting fusion score cards are verified and screened, so that expert experience is fused with machine learning.
Fig. 4 is a block diagram of a fusion device of rating cards according to an embodiment of the present invention, and as shown in fig. 4, the fusion device may include:
an obtaining module 41, configured to obtain an expert scoring card and a machine learning scoring card, where a feature variable used by the expert scoring card is different from a feature variable used by the machine learning scoring card;
the fusion module 42 is used for respectively fusing the expert scoring cards and the machine learning scoring cards on a plurality of weight combinations to obtain a plurality of fusion scoring cards;
and the verification module 43 is configured to verify the multiple fusion scoring cards by using the verification sample, and screen out an optimal fusion scoring card.
Further, the obtaining module 41 includes:
the constructing submodule 411 is used for constructing a training sample set and a characteristic variable library, wherein the training sample set comprises a positive sample and a negative sample;
a first creating sub-module 412, configured to screen a plurality of feature variables from the feature variable library according to an expert experience method to create an expert rating card;
and the second creating submodule 413 is used for removing all the screened characteristic variables from the characteristic variable library to obtain residual characteristic variables, and performing machine learning on the training sample set according to the residual characteristic variables to create a machine learning score card.
Further, the second creating submodule 413 is specifically configured to:
carrying out correlation test on all the screened characteristic variables and each residual characteristic variable;
according to the correlation test result, rejecting the feature variables related to the screened feature variables from the remaining feature variables to form a feature variable set for constructing a machine learning score card;
and performing machine learning on the training sample set according to the characteristic variable set to construct and obtain a machine learning score card.
Further, each weight combination comprises a first weight corresponding to the expert scoring card and a second weight corresponding to the machine learning scoring card, numerical values of the first weights in the multiple weight combinations are sequentially decreased by a preset step length, numerical values of the second weights are sequentially increased by a preset step length, and the sum of the numerical values of the first weights and the numerical values of the second weights in the same weight combination is 1.
Further, the verification module 43 is specifically configured to:
respectively inputting the characteristic variables of the verification samples into a plurality of fusion score cards to calculate a preset index corresponding to each fusion score card, wherein the preset index comprises at least one of a KS value, a Gini coefficient value and an AUC value;
and screening out the optimal fusion scoring card from the plurality of fusion scoring cards according to the preset index corresponding to each fusion scoring card.
Further, the apparatus further comprises an adjusting module 44, and the adjusting module 44 is specifically configured to:
after the optimal fusion scoring card is deployed on line, if the monitored population stability index exceeds a preset threshold, determining a characteristic variable with the largest variation in the fusion scoring card;
and adjusting the numerical value of the first weight and the numerical value of the second weight in the optimal fusion score card according to the characteristic variable with the maximum variation.
The fusion device of score cards provided by this embodiment of the invention belongs to the same inventive concept as the fusion method of score cards provided by the embodiments of the invention. It can execute that method and has the corresponding functional modules and beneficial effects. For technical details not described in this embodiment, reference may be made to the fusion method of score cards provided by the embodiments of the invention; they are not repeated here.
Fig. 5 is an internal structural diagram of a computer device according to an embodiment of the present invention. The computer device may be a server, and its internal structure diagram may be as shown in fig. 5. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a score card fusion method.
Those skilled in the art will appreciate that the configuration shown in fig. 5 is a block diagram of only the portion of the configuration relevant to aspects of the present invention and does not limit the computing devices to which aspects of the present invention may be applied; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
The embodiment of the invention provides computer equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the computer program to realize the following steps:
acquiring an expert scoring card and a machine learning scoring card, wherein the characteristic variables used by the expert scoring card are different from the characteristic variables used by the machine learning scoring card;
fusing the expert scoring card and the machine learning scoring card under each of a plurality of weight combinations to obtain a plurality of fusion scoring cards;
and verifying the plurality of fusion scoring cards by using the verification sample, and screening out the optimal fusion scoring card.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the following steps:
acquiring an expert rating card and a machine learning rating card, wherein the characteristic variables used by the expert rating card are different from the characteristic variables used by the machine learning rating card;
fusing the expert scoring card and the machine learning scoring card under each of a plurality of weight combinations to obtain a plurality of fusion scoring cards;
and verifying the plurality of fusion scoring cards by using the verification sample, and screening out the optimal fusion scoring card.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A fusion method of score cards, the method comprising:
acquiring an expert rating card and a machine learning rating card, wherein the characteristic variables used by the expert rating card are different from the characteristic variables used by the machine learning rating card;
fusing the expert scoring card and the machine learning scoring card under each of a plurality of weight combinations to obtain a plurality of fusion scoring cards;
verifying the plurality of fusion scoring cards by using a verification sample to screen out an optimal fusion scoring card;
wherein each weight combination comprises a first weight corresponding to the expert scoring card and a second weight corresponding to the machine learning scoring card;
after the optimal fusion scoring card is deployed on line, if the monitored population stability index exceeds a preset threshold value, determining a characteristic variable with the largest variation in the fusion scoring card;
and adjusting the numerical value of the first weight and the numerical value of the second weight in the optimal fusion scoring card according to the characteristic variable with the maximum variation.
2. The method of claim 1, wherein obtaining an expert scoring card and a machine learning scoring card comprises:
constructing a training sample set and a characteristic variable library, wherein the training sample set comprises positive samples and negative samples;
screening a plurality of characteristic variables from the characteristic variable library according to an expert experience method to create an expert rating card;
removing all the screened characteristic variables from the characteristic variable library to obtain residual characteristic variables;
performing machine learning on the training sample set according to the remaining feature variables to create a machine learning score card.
3. The method according to claim 2, wherein after the step of removing all the selected feature variables from the feature variable library to obtain remaining feature variables, the method further comprises:
carrying out correlation test on all the screened characteristic variables and each residual characteristic variable;
according to a correlation test result, eliminating the feature variables related to the screened feature variables from the remaining feature variables to form a feature variable set for constructing a machine learning score card;
the performing machine learning on the training sample set according to the remaining feature variables to create a machine learning score card, comprising:
and performing machine learning on the training sample set according to the characteristic variable set to construct and obtain a machine learning score card.
4. The method according to any one of claims 1 to 3, wherein the values of the first weights in a plurality of weight combinations are sequentially decreased by a predetermined step size, the values of the second weights are sequentially increased by a predetermined step size, and the sum of the values of the first weights and the values of the second weights in the same weight combination is 1.
5. The method according to claim 4, wherein the validating the plurality of fusion score cards using a validation sample to screen out an optimal fusion score card comprises:
respectively inputting the characteristic variables of the verification samples into a plurality of fusion scoring cards to calculate a preset index corresponding to each fusion scoring card, wherein the preset index comprises at least one of a KS value, a Gini coefficient value and an AUC value;
and screening out the optimal fusion scoring card from the plurality of fusion scoring cards according to the preset index corresponding to each fusion scoring card.
6. A fusion device of score cards, the device comprising:
an acquisition module, used for acquiring an expert scoring card and a machine learning scoring card, wherein the characteristic variables used by the expert scoring card are different from the characteristic variables used by the machine learning scoring card;
the fusion module is used for fusing the expert scoring card and the machine learning scoring card under each of a plurality of weight combinations to obtain a plurality of fusion scoring cards;
the verification module is used for verifying the fusion scoring cards by using a verification sample to screen out the optimal fusion scoring card;
wherein each weight combination comprises a first weight corresponding to the expert scoring card and a second weight corresponding to the machine learning scoring card;
the fusion device further comprises an adjusting module, wherein the adjusting module is used for: after the optimal fusion scoring card is deployed on line, if the monitored population stability index exceeds a preset threshold value, determining a characteristic variable with the largest variation in the fusion scoring card;
and adjusting the numerical value of the first weight and the numerical value of the second weight in the optimal fusion score card according to the characteristic variable with the maximum variation.
7. The apparatus of claim 6, wherein the obtaining module comprises:
the construction submodule is used for constructing a training sample set and a characteristic variable library, wherein the training sample set comprises positive samples and negative samples;
the first creating submodule is used for screening a plurality of characteristic variables from the characteristic variable library according to an expert experience method to create an expert scoring card;
and the second creating submodule is used for removing all the screened characteristic variables from the characteristic variable library to obtain residual characteristic variables, and performing machine learning on the training sample set according to the residual characteristic variables to create a machine learning score card.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the fusion method of score cards according to any one of claims 1 to 5 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the fusion method of score cards according to any one of claims 1 to 5.
CN202010705871.9A 2020-07-21 2020-07-21 Fusion method and device of score cards, computer equipment and storage medium Active CN112037005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010705871.9A CN112037005B (en) 2020-07-21 2020-07-21 Fusion method and device of score cards, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112037005A CN112037005A (en) 2020-12-04
CN112037005B true CN112037005B (en) 2022-12-30

Family

ID=73579337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010705871.9A Active CN112037005B (en) 2020-07-21 2020-07-21 Fusion method and device of score cards, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112037005B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766649B (en) * 2020-12-31 2022-03-15 平安科技(深圳)有限公司 Target object evaluation method based on multi-scoring card fusion and related equipment thereof
CN112734568B (en) * 2021-01-29 2024-01-12 深圳前海微众银行股份有限公司 Credit scoring card model construction method, device, equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644375A (en) * 2016-07-22 2018-01-30 花生米浙江数据信息服务股份有限公司 Small trade company's credit estimation method that a kind of expert model merges with machine learning model
CN108596495A (en) * 2018-04-26 2018-09-28 浙江工业大学 A kind of retail credit business points-scoring system and method
CN111160721A (en) * 2019-12-11 2020-05-15 东方微银科技(北京)有限公司 Iterative scoring model establishing and optimizing method based on random optimization

Also Published As

Publication number Publication date
CN112037005A (en) 2020-12-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant