CN112785086A - Credit overdue risk prediction method and device - Google Patents

Credit overdue risk prediction method and device

Info

Publication number
CN112785086A
Authority
CN
China
Prior art keywords
credit
data
risk prediction
historical
overdue risk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110187104.8A
Other languages
Chinese (zh)
Inventor
邓佳颖
谢联民
肖迪
汪硕文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110187104.8A
Publication of CN112785086A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Operations Research (AREA)
  • Finance (AREA)
  • Tourism & Hospitality (AREA)
  • Accounting & Taxation (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Educational Administration (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Technology Law (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The embodiments of the application provide a credit overdue risk prediction method and device, which can be used in the technical field of artificial intelligence. The method comprises: generating a corresponding target credit feature vector based on the credit data of a target user; inputting the target credit feature vector into a preset credit overdue risk prediction model; and determining a credit overdue risk prediction result of the target user based on the output of the model. The credit overdue risk prediction model is trained in advance on balanced historical credit data, which is obtained by performing data balancing processing on imbalanced historical data acquired in advance. The method and device improve the distribution uniformity of the training data used to train the credit overdue risk prediction model, thereby improving the accuracy, reliability and effectiveness of the trained model and, in turn, of credit overdue risk prediction.

Description

Credit overdue risk prediction method and device
Technical Field
The application relates to the technical field of data processing, in particular to the technical field of artificial intelligence, and specifically relates to a credit overdue risk prediction method and device.
Background
In credit business, the qualifications and repayment ability of financial users directly determine the financial risk borne by financial institutions such as banks. However, because credit investigation systems in some regions are incomplete, a financial institution evaluating a user's credit rating must refer to a personal credit report and rely on the basic data and asset information that the user provides. This evaluation method allows some users to pass the audit by reporting false assets or by borrowing funds temporarily, and it also requires professional data auditors to analyze and judge each case. It is therefore time-consuming, labor-intensive and inaccurate, which can lead to overdue loans or defaults.
At present, the prior art can predict credit defaults and thereby raise the degree of automation of credit overdue risk early warning, for example with methods based on statistics, operations research and artificial intelligence. Although these relatively automated methods show good credit risk prediction capability, they ignore the defects of the data set being analyzed: the data in actual credit risk assessment are of many types and unstructured, and in particular suffer from class imbalance. For example, suppose credit customer rating is converted into a binary classification problem and, of 100 credit customers, 98 have a good credit rating and 2 have a poor one. A learning algorithm that predicts a good rating for every customer still reaches a classification accuracy of 98%. In other words, the performance of the classification algorithm is dominated by the majority-class data, so a learning algorithm trained on the imbalanced data set has little practical value, and the effectiveness of the overall risk prediction model cannot be verified.
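The 98/2 example above can be reproduced in a few lines. The following is a minimal sketch with made-up toy labels (0 for a good credit rating, 1 for a poor one); it is only an illustration of why accuracy alone is misleading on imbalanced data, not part of the claimed method.

```python
# Hypothetical toy data mirroring the example: 98 customers with a good
# credit rating (label 0) and 2 customers with a poor one (label 1).
labels = [0] * 98 + [1] * 2

# A trivial "classifier" that predicts a good rating for every customer.
predictions = [0] * len(labels)

# Accuracy: fraction of customers whose predicted label matches the true label.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Number of poor-credit customers that the classifier failed to flag.
missed_poor = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

print(f"accuracy = {accuracy:.2f}")
print(f"poor-credit customers missed = {missed_poor} of {labels.count(1)}")
```

The accuracy is high even though every poor-credit customer is missed, which is the failure mode the data balancing processing is meant to address.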
Therefore, effectively solving the imbalance of the training data in actual credit overdue risk prediction is important for improving the accuracy and reliability of credit risk prediction.
Disclosure of Invention
In view of the problems in the prior art, the present application provides a credit overdue risk prediction method and device that can improve the distribution uniformity of the training data used to train the credit overdue risk prediction model, effectively improve the accuracy, reliability and effectiveness of the trained model, and thereby effectively improve the accuracy and effectiveness of credit overdue risk prediction.
In order to solve the above technical problem, the application provides the following technical solutions:
In a first aspect, the present application provides a credit overdue risk prediction method, comprising:
generating a corresponding target credit feature vector based on the credit data of the target user;
inputting the target credit feature vector into a preset credit overdue risk prediction model, and determining a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model;
wherein the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balancing processing on imbalanced historical data acquired in advance.
Further, before the inputting the target credit feature vector into a preset credit overdue risk prediction model, the method further includes:
obtaining historical credit data and labels corresponding to a plurality of historical users, wherein the labels comprise: a first label indicating that the corresponding historical user has a credit overdue risk, and a second label indicating that the corresponding historical user does not have a credit overdue risk;
generating a corresponding first data set according to the historical credit data and the labels corresponding to the historical users, and dividing the first data set into a training set and a test set;
if the data in the training set is imbalanced historical data, performing data balancing processing on the training set to convert the data in the training set into balanced historical credit data and form a second data set;
applying the second data set to obtain corresponding credit feature vectors;
and training the credit overdue risk prediction model based on the credit feature vectors.
Further, before the dividing the first data set into a training set and a test set, the method further includes:
preprocessing the historical credit data corresponding to each historical user, wherein the preprocessing comprises at least one of: completing missing data, and deleting abnormal and invalid data.
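The preprocessing step can be sketched as follows. The source does not say how missing values are completed or how abnormal data is recognized, so the median fill and the per-field valid ranges below are assumptions, and the field names (`age`, `income`) are hypothetical.

```python
from statistics import median

def preprocess(records, numeric_fields, valid_ranges):
    """Delete abnormal/invalid records, then complete missing numeric values."""
    # Deletion (assumed rule): drop a record if any present value falls
    # outside the valid range configured for its field.
    kept = []
    for rec in records:
        in_range = all(
            rec.get(field) is None or lo <= rec[field] <= hi
            for field, (lo, hi) in valid_ranges.items()
        )
        if in_range:
            kept.append(dict(rec))  # copy so the input records stay untouched

    # Completion (assumed rule): fill each missing value with the median of
    # the values observed for that field among the kept records.
    for field in numeric_fields:
        observed = [r[field] for r in kept if r.get(field) is not None]
        if not observed:
            continue  # nothing to impute from
        fill = median(observed)
        for r in kept:
            if r.get(field) is None:
                r[field] = fill
    return kept

# Hypothetical historical credit records.
records = [
    {"age": 35, "income": 8000},
    {"age": None, "income": 12000},   # missing age, to be completed
    {"age": 41, "income": -500},      # negative income, treated as invalid
    {"age": 29, "income": None},      # missing income, to be completed
]
cleaned = preprocess(
    records,
    numeric_fields=["age", "income"],
    valid_ranges={"age": (18, 100), "income": (0, 10**7)},
)
print(cleaned)
```

Deleting invalid records before imputing keeps abnormal values from distorting the fill values; other orderings or imputation rules may suit a real data set better.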
Further, before the performing of data balancing processing on the training set if the data in the training set is imbalanced historical data, the method further includes:
if, in the training set, the ratio of the total number of historical users with the first label to the total number of historical users with the second label is greater than a first threshold or less than a second threshold, determining that the data in the training set is imbalanced historical data, wherein the first threshold is greater than 1, and the second threshold is greater than 0 and less than 1.
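The two-threshold rule can be written as a small predicate. This is a sketch only: the function name and the threshold values used in the calls (1.5 and 0.67) are illustrative assumptions, and treating an empty second class as imbalanced is a choice made here, not something stated in the source.

```python
def is_imbalanced(n_first_label, n_second_label, first_threshold, second_threshold):
    """Return True if the first-label / second-label count ratio is out of range."""
    # The constraints on the thresholds come from the text:
    # first threshold > 1, and 0 < second threshold < 1.
    if first_threshold <= 1 or not (0 < second_threshold < 1):
        raise ValueError("need first_threshold > 1 and 0 < second_threshold < 1")
    if n_second_label == 0:
        # Assumption: a training set with no second-label users is imbalanced.
        return True
    ratio = n_first_label / n_second_label
    return ratio > first_threshold or ratio < second_threshold

# Illustrative counts: users with overdue risk (first label) vs. without (second label).
print(is_imbalanced(21, 79, first_threshold=1.5, second_threshold=0.67))  # ratio is about 0.27
print(is_imbalanced(45, 55, first_threshold=1.5, second_threshold=0.67))  # ratio is about 0.82
```

Using two thresholds makes the check symmetric: it fires whichever of the two labels happens to be the minority.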
Further, the performing of data balancing processing on the training set to convert the data in the training set into balanced historical credit data and form a second data set includes:
determining minority-class data and majority-class data in the training set according to the total number of historical users with the first label and the total number of historical users with the second label in the training set;
performing K-nearest-neighbor interpolation on the minority-class data and K-means clustering on the majority-class data by using a preset data balancing mode, so as to convert the data in the training set into balanced historical credit data and form the corresponding second data set;
wherein the data balancing mode comprises: a preset SMOTE algorithm.
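The two balancing operations named above can be illustrated with a generic, standard-library sketch: SMOTE-style interpolation between a minority sample and one of its K nearest minority neighbors, and plain K-means on the majority class. This is not the application's own algorithm. In particular, representing the majority class by its cluster centroids is an assumption about how the clustering result is used, and all function names and parameter values are hypothetical.

```python
import math
import random

def knn_interpolate(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling; requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbors of the base sample (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between base and the chosen neighbor.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

def kmeans_centroids(points, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns one centroid per cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(n_iter):
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            # Assign each point to its nearest current centroid.
            idx = min(range(n_clusters), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

# Hypothetical 2-D feature vectors: 3 minority samples and 12 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
majority = [(5.0 + 0.1 * i, 5.0 - 0.1 * i) for i in range(12)]

balanced_minority = minority + knn_interpolate(minority, n_new=3, k=2)
balanced_majority = kmeans_centroids(majority, n_clusters=6)
print(len(balanced_minority), len(balanced_majority))
```

In this toy run both classes end up with six samples. The interpolated minority points correspond to the "virtual users" that the balancing step introduces.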
Further, the applying the second data set to obtain corresponding credit feature vectors includes:
performing data dimension reduction and feature screening on the second data set to obtain the credit feature vectors corresponding to each historical user and to each virtual user generated by the data balancing processing.
Further, the training of the credit overdue risk prediction model based on the credit feature vector includes:
applying a preset random forest algorithm, and training based on the credit feature vector to obtain an initial prediction model;
and evaluating the initial prediction model on the test set according to the corresponding evaluation metrics, and adjusting the initial prediction model based on the evaluation result to obtain the corresponding credit overdue risk prediction model.
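The evaluation metrics are not named at this point in the text. As one plausible choice for imbalanced data, the sketch below computes precision, recall and F1 for the overdue-risk class from test-set labels and model predictions; the metric selection, the function name and the toy data are all assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class marked as `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels (1 = overdue risk) and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Metrics computed on the minority (overdue-risk) class are harder to inflate by always predicting the majority class, which makes them a more informative signal when adjusting the initial model.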
In a second aspect, the present application provides a credit overdue risk prediction apparatus, comprising:
a vector generation module configured to generate a corresponding target credit feature vector based on the credit data of the target user;
a model prediction module configured to input the target credit feature vector into a preset credit overdue risk prediction model and determine a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model;
wherein the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balancing processing on imbalanced historical data acquired in advance.
In a third aspect, the present application provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the credit overdue risk prediction method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the credit overdue risk prediction method.
According to the above technical solutions, the credit overdue risk prediction method and device provided by the present application generate a corresponding target credit feature vector based on the credit data of a target user, input the target credit feature vector into a preset credit overdue risk prediction model, and determine a credit overdue risk prediction result of the target user based on the output of the model. The credit overdue risk prediction model is trained in advance on balanced historical credit data, which is obtained by performing data balancing processing on imbalanced historical data acquired in advance. Balancing the imbalanced historical data effectively improves the distribution uniformity of the training data used to train the model, which in turn improves the accuracy, reliability and effectiveness of the trained model and of credit overdue risk prediction. This gives financial institutions such as banks a more accurate and reliable basis for judgment, so that they can carry out early warning or risk control for credit overdue risk in advance, effectively reducing users' credit overdue risk and improving the user experience of the financial institution.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a first flowchart of a credit overdue risk prediction method in an embodiment of the present application.
Fig. 2 is a second flowchart of the credit overdue risk prediction method in the embodiment of the present application.
Fig. 3 is a third flowchart of the credit overdue risk prediction method in the embodiment of the present application.
Fig. 4 is a fourth flowchart of the credit overdue risk prediction method in the embodiment of the present application.
Fig. 5 is a flowchart of step 030 in the credit overdue risk prediction method in the embodiment of the present application.
Fig. 6 is a flowchart of step 050 in the credit overdue risk prediction method in the embodiment of the present application.
Fig. 7 is a first configuration diagram of the credit overdue risk prediction apparatus in the embodiment of the present application.
Fig. 8 is a second configuration diagram of the credit overdue risk prediction apparatus in the embodiment of the present application.
Fig. 9 is a logic flow diagram of a credit overdue risk prediction method in an application example of the present application.
Fig. 10 is a logic flow diagram of the KF-SMOTE algorithm in the application example of the present application.
Fig. 11 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the method and apparatus for predicting the overdue credit risk disclosed in the present application may be used in the technical field of artificial intelligence, and may also be used in any fields other than the technical field of artificial intelligence.
Existing credit default prediction approaches show good credit risk prediction capability but ignore the defects of the data set being analyzed: the data in actual credit risk assessment are of many types and unstructured, and in particular suffer from class imbalance. To address this, the present application provides a credit overdue risk prediction method, a credit overdue risk prediction device, an electronic device and a computer-readable storage medium. A corresponding target credit feature vector is generated based on the credit data of a target user; the target credit feature vector is input into a preset credit overdue risk prediction model, and a credit overdue risk prediction result of the target user is determined based on the output of the model. The credit overdue risk prediction model is trained in advance on balanced historical credit data, which is obtained by performing data balancing processing on imbalanced historical data acquired in advance. Balancing the imbalanced historical data effectively improves the distribution uniformity of the training data used to train the model, which in turn improves the accuracy, reliability and effectiveness of the trained model and of credit overdue risk prediction. This gives financial institutions such as banks a more accurate and reliable basis for judgment, so that they can carry out early warning or risk control for credit overdue risk in advance, effectively reducing users' credit overdue risk and improving the user experience of the financial institution.
Based on the above, the present application further provides a credit overdue risk prediction system for implementing the credit overdue risk prediction method provided in one or more embodiments of the present application. The system includes at least the credit overdue risk prediction device, a target processing system of a financial institution, and client devices. In the model application stage, the credit overdue risk prediction device may receive the credit data of a target user from the client device held by a financial user, and then generate a corresponding target credit feature vector based on that credit data. It inputs the target credit feature vector into a preset credit overdue risk prediction model, determines a credit overdue risk prediction result of the target user based on the output of the model, sends the result to the target processing system of the financial institution for early warning and display, and sends the result to the client devices of the financial institution's risk control personnel.
In the model training stage, the credit overdue risk prediction device may obtain, from the target processing system of the financial institution, historical credit data and labels corresponding to a plurality of historical users of the financial institution, where the labels include a first label indicating that the corresponding historical user has a credit overdue risk and a second label indicating that the corresponding historical user does not. The device generates a corresponding first data set from the historical credit data and labels of the historical users and divides the first data set into a training set and a test set. If the data in the training set is imbalanced historical data, the device performs data balancing processing on the training set to convert it into balanced historical credit data and form a second data set. It then obtains the corresponding credit feature vectors from the second data set and trains the credit overdue risk prediction model based on those feature vectors.
In a practical application scenario, the credit overdue risk prediction device may be implemented by a server; the server may be communicatively coupled to at least one client device.
It is understood that the client devices may include smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, Personal Digital Assistants (PDAs), in-vehicle devices, smart wearable devices, and the like. The smart wearable devices may include smart glasses, smart watches, smart bracelets, and the like.
In another practical application scenario, the credit overdue risk prediction performed by the credit overdue risk prediction device may be executed on the server as described above, or all operations may be completed on the client device. The choice may depend on the processing capability of the client device, the constraints of the user's usage scenario, and so on; the present application does not limit this. If all operations are completed on the client device, the client device may further include a processor for the specific processing of credit overdue risk prediction.
The client device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server to implement data transmission with the server. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
The server and the client device may communicate using any suitable network protocol, including network protocols not yet developed at the filing date of this application. The network protocol may include, for example, a TCP/IP protocol, a UDP/IP protocol, an HTTP protocol, an HTTPS protocol, or the like. Of course, the network protocol may also include, for example, an RPC protocol (Remote Procedure Call Protocol) or a REST protocol (Representational State Transfer Protocol) used on top of the above protocols.
The following embodiments and application examples are described in detail one by one.
In order to solve the problem that imbalanced data lowers the accuracy of model training and thus makes the credit overdue risk prediction result inaccurate, the present application provides an embodiment of a credit overdue risk prediction method which, referring to Fig. 1, specifically includes the following:
Step 100: generating a corresponding target credit feature vector based on the credit data of the target user.
In one or more embodiments of the present application, the credit data may specifically include customer credit investigation data of a financial user of a financial institution, basic information, loan history detail data that the financial user has authorized the institution to obtain, and the like, and may be set according to the actual application.
In one or more embodiments of the present application, the target credit feature vector refers to the credit feature vector of the target user. A credit feature vector is a feature vector constructed to correlate strongly with credit overdue prediction, and expert experience can be used to identify the important features.
Step 200: inputting the target credit feature vector into a preset credit overdue risk prediction model, and determining a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model; wherein the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balancing processing on imbalanced historical data acquired in advance.
In one or more embodiments of the present application, the credit overdue risk prediction model may be a machine learning model capable of data prediction, and preferably a random forest model. A random forest is a classifier comprising a plurality of decision trees, and its output class is the mode of the classes output by the individual trees.
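The mode-of-trees rule can be shown with a toy example. The sketch below stands in for each decision tree with a hand-written one-rule function; the feature names (`overdue_count`, `debt_ratio`, `months_on_book`) and the cut-off values are hypothetical, and a real random forest would learn its trees from the training data.

```python
from collections import Counter

def forest_predict(trees, feature_vector):
    """Return the most common class among the individual tree outputs."""
    votes = [tree(feature_vector) for tree in trees]
    # most_common(1) gives the class with the highest vote count, i.e. the mode.
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees (1 = overdue risk, 0 = no risk).
trees = [
    lambda x: 1 if x["overdue_count"] > 2 else 0,
    lambda x: 1 if x["debt_ratio"] > 0.6 else 0,
    lambda x: 1 if x["months_on_book"] < 6 else 0,
]

# Hypothetical target credit feature vector.
applicant = {"overdue_count": 3, "debt_ratio": 0.7, "months_on_book": 24}
print(forest_predict(trees, applicant))
```

Two of the three stand-in trees vote for class 1 here, so the forest output is 1. With an odd number of trees and two classes, a tie cannot occur.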
In step 200, balanced and imbalanced historical credit data are relative concepts: whether historical credit data is imbalanced or balanced can be judged from the ratio or the difference between the majority-class data and the minority-class data. For example, if 79% of the current historical credit data carries no credit overdue risk and 21% carries a credit overdue risk, the ratio of the two is 79/21 ≈ 3.76. Because this ratio is greater than the judgment threshold of 1.5, the current historical credit data is imbalanced and needs data balancing processing to be converted into balanced historical credit data. If, after data balancing processing, 55% of the data carries no credit overdue risk and 45% carries a credit overdue risk, the ratio of the two is 55/45 ≈ 1.22, which is less than the judgment threshold of 1.5.
It is understood that the data balancing processing may use any existing data processing method capable of interpolating minority-class data and clustering or deleting majority-class data.
As can be seen from the above description, the credit overdue risk prediction method provided in this embodiment performs data balancing processing on imbalanced historical data acquired in advance. This effectively improves the distribution uniformity of the training data used to train the credit overdue risk prediction model, and in turn the accuracy, reliability and effectiveness of the trained model and of credit overdue risk prediction. It gives financial institutions such as banks a more accurate and reliable basis for judgment, so that they can carry out early warning or risk control for credit overdue risk in advance, effectively reducing users' credit overdue risk and improving the user experience of the financial institution.
In order to provide a specific way to perform data balancing processing on the training set, in an embodiment of the credit overdue risk prediction method provided by the present application, referring to Fig. 2, the method further includes the following before step 200:
Step 010: obtaining historical credit data and labels corresponding to a plurality of historical users, wherein the labels comprise: a first label indicating that the corresponding historical user has a credit overdue risk, and a second label indicating that the corresponding historical user does not have a credit overdue risk.
Specifically, a customer sample required for modeling analysis can be selected; the customer credit investigation data, basic information and loan history detail data that the customers have authorized for access are obtained; and these data are parsed, converted and organized into a data set.
Step 020: generating a corresponding first data set according to the historical credit data and the labels corresponding to the historical users, and dividing the first data set into a training set and a test set.
In step 020, the first data set stores the correspondence among the historical users, the historical credit data and the labels.
Step 030: and if the data in the training set is unbalanced historical data, performing data balancing processing on the training set to convert the data in the training set into balanced historical credit data and form a second data set.
Step 040: applying the second data set to obtain a corresponding credit feature vector.
Step 050: and training to obtain the credit overdue risk prediction model based on the credit feature vector.
As can be seen from the above description, the credit overdue risk prediction method provided in the embodiment of the present application can effectively improve the distribution uniformity of training data used for training the credit overdue risk prediction model by performing data balance processing on a training set, and can effectively improve the accuracy, reliability, and effectiveness of the trained credit overdue risk prediction model.
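The division in step 020 can be illustrated with a minimal sketch. The text does not say how the first data set is divided; the stratified split below (which keeps the label proportions the same in both sets), the 20% test fraction, and the `(features, label)` record layout are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split (features, label) records into a training set and a test set, label by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for record in records:
        by_label[record[1]].append(record)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical first data set: 79 users without overdue risk (label 0), 21 with (label 1).
first_data_set = [([float(i)], 0) for i in range(79)] + [([float(i)], 1) for i in range(21)]
train_set, test_set = stratified_split(first_data_set, test_fraction=0.2)
```

Only the training set would then go through the data balance processing of step 030; the test set keeps its original class distribution.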
In order to provide a concrete way of data preprocessing, in an embodiment of the credit overdue risk prediction method provided by the present application, referring to fig. 3, the following is further specifically included between step 010 and step 020 in the credit overdue risk prediction method:
Step 011: preprocessing the historical credit data corresponding to each historical user, wherein the preprocessing comprises at least one of: completion processing of missing data, and deletion processing of abnormal and invalid data.
From the above description, the credit overdue risk prediction method provided by the embodiment of the application can effectively improve the application reliability and accuracy of historical credit data, so as to effectively improve the accuracy, reliability and effectiveness of the trained credit overdue risk prediction model.
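A minimal sketch of step 011 follows. It assumes that records are dictionaries with the hypothetical fields `age` and `monthly_income`, that missing values are stored as `None`, that an abnormal or invalid value is one outside a configured range, and that missing values are completed with the field median. None of these choices comes from the original text.

```python
from statistics import median


def preprocess(records, valid_ranges):
    """Delete records with out-of-range values, then complete missing values with the median."""
    # Deletion processing: drop records whose present values fall outside the allowed range.
    kept = [
        dict(rec)
        for rec in records
        if all(rec[key] is None or low <= rec[key] <= high
               for key, (low, high) in valid_ranges.items())
    ]
    # Completion processing: replace None with the median of the observed values of that field.
    for key in valid_ranges:
        observed = [rec[key] for rec in kept if rec[key] is not None]
        fill_value = median(observed) if observed else None
        for rec in kept:
            if rec[key] is None:
                rec[key] = fill_value
    return kept


# Hypothetical records; field names and ranges are illustrative only.
raw_records = [
    {"age": 35, "monthly_income": 8000.0},
    {"age": None, "monthly_income": 12000.0},
    {"age": 41, "monthly_income": -500.0},   # invalid: negative income
    {"age": 29, "monthly_income": None},
]
cleaned = preprocess(raw_records, {"age": (18, 100), "monthly_income": (0.0, 1e7)})
```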
In order to provide a specific way to determine unbalanced data, in an embodiment of the credit overdue risk prediction method provided by the present application, referring to fig. 4, the following is further specifically included between step 020 and step 030 in the credit overdue risk prediction method:
Step 021: if the ratio of the total number of historical users corresponding to the first label to the total number of historical users corresponding to the second label in the training set is greater than a first threshold or less than a second threshold, determining that the data in the training set is unbalanced historical data, wherein the first threshold is greater than 1, and the second threshold is greater than 0 and less than 1.
Specifically, if the ratio is taken as majority class data over minority class data, it is compared with the first threshold, which is a value greater than 1; for example, the first threshold may be the above-mentioned judgment threshold of 1.5. If the ratio is taken as minority class data over majority class data, it is compared with the second threshold, which is a value greater than 0 and less than 1; for example, the second threshold may be 0.75.
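The judgment in step 021 can be sketched as a small function. The default thresholds are the example values 1.5 and 0.75 from the text; the function name and the treatment of an empty class as unbalanced are assumptions.

```python
def is_unbalanced(n_first_label, n_second_label, first_threshold=1.5, second_threshold=0.75):
    """Return True if the first-label / second-label user ratio falls outside the thresholds."""
    if n_first_label == 0 or n_second_label == 0:
        # Assumption: a training set in which one class is absent is treated as unbalanced.
        return True
    ratio = n_first_label / n_second_label
    return ratio > first_threshold or ratio < second_threshold


# 21 users with overdue risk (first label) against 79 without (second label).
before_balancing = is_unbalanced(21, 79)
# 45 against 55 after data balance processing.
after_balancing = is_unbalanced(45, 55)
```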
From the above description, the credit overdue risk prediction method provided by the embodiment of the application can effectively improve the efficiency and accuracy of judging unbalanced data, can effectively improve the efficiency and reliability of data balance processing, and can further effectively improve the efficiency and reliability of credit overdue risk prediction.
In order to provide a concrete way of data balance, in an embodiment of the credit overdue risk prediction method provided by the present application, referring to fig. 5, step 030 in the credit overdue risk prediction method specifically includes the following:
Step 031: determining the minority class data and the majority class data in the training set according to the total number of historical users corresponding to the first label and the total number of historical users corresponding to the second label in the training set.
Step 032: applying a preset data balancing mode to perform K-nearest-neighbor interpolation on the minority class data and K-means clustering on the majority class data, so as to convert the data in the training set into balanced historical credit data and form a corresponding second data set; wherein the data balancing mode comprises a preset SMOTE algorithm.
It is understood that the SMOTE (Synthetic Minority Over-sampling Technique) algorithm may preferably be a fusion SMOTE algorithm, which may be written as the KF-SMOTE algorithm and can optimize a multidimensional, highly complex unbalanced credit history data set. SMOTE is an improvement on random oversampling: random oversampling adds minority class samples by simply copying existing samples, which easily causes model overfitting, that is, the information learned by the model is too specific and does not generalize well. The basic idea of the SMOTE algorithm is to analyze the minority class samples and artificially synthesize new samples from them to add to the data set. The algorithm flow is as follows:
(1) For each sample x in the minority class, calculating the Euclidean distance from x to every other sample in the minority class sample set to obtain the k nearest neighbors of x.
(2) Setting a sampling ratio according to the sample imbalance ratio to determine a sampling multiplier N and, for each minority class sample x, randomly selecting several samples from its k nearest neighbors, a selected neighbor being denoted o.
(3) For each randomly selected neighbor o, constructing a new sample from o and the original sample x according to the following formula:
o(new)=o+rand(0,1)*(x-o)
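The three steps and the formula above can be sketched in plain Python as follows. This is a minimal illustration of the basic SMOTE interpolation only, not of the full KF-SMOTE algorithm. The function name, the choice to draw neighbors with replacement, and passing the multiplier N in directly as `n_per_sample` (rather than deriving it from the imbalance ratio) are assumptions.

```python
import math
import random


def smote_interpolate(minority, k=5, n_per_sample=1, seed=0):
    """Synthesize new minority class samples by interpolating toward nearest neighbors."""
    if len(minority) < 2:
        raise ValueError("at least two minority samples are needed")
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # (1) k nearest minority neighbors of x by Euclidean distance, excluding x itself.
        others = [o for j, o in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda o: math.dist(x, o))[:k]
        # (2) Randomly pick N neighbors (assumption: drawn with replacement).
        for _ in range(n_per_sample):
            o = rng.choice(neighbors)
            # (3) o(new) = o + rand(0,1) * (x - o): a point on the segment between o and x.
            gap = rng.random()
            synthetic.append([o_d + gap * (x_d - o_d) for x_d, o_d in zip(x, o)])
    return synthetic


# Hypothetical two-dimensional minority class samples.
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_interpolate(minority_samples, k=2, n_per_sample=3)
```

Because every synthetic sample lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating it exactly.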
From the above description, the credit overdue risk prediction method provided by the embodiment of the application can effectively improve the reliability and efficiency of data balance processing, can effectively improve the accuracy, reliability and effectiveness of the trained credit overdue risk prediction model, and can further effectively improve the accuracy and effectiveness of credit overdue risk prediction.
In order to provide a manner for extracting feature vectors, in an embodiment of the credit overdue risk prediction method provided by the present application, the step 040 in the credit overdue risk prediction method specifically includes the following steps:
Step 041: performing data dimension reduction and feature screening processing on the second data set to obtain credit feature vectors corresponding to the historical users and to the virtual users produced by the data balance processing.
From the above description, the credit overdue risk prediction method provided by the embodiment of the application can effectively improve the reliability and efficiency of feature vector extraction, can effectively improve the reliability and efficiency of the trained credit overdue risk prediction model, and can further effectively improve the reliability and efficiency of credit overdue risk prediction.
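The text does not name a feature screening method. As one possible illustration, the sketch below keeps the features with the highest absolute Pearson correlation with the label. The function names and the correlation criterion are assumptions, and dimension reduction is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return 0.0 if std_x == 0 or std_y == 0 else cov / (std_x * std_y)


def screen_features(rows, labels, keep):
    """Keep the `keep` feature columns most correlated (in absolute value) with the labels."""
    n_features = len(rows[0])
    scores = [abs(pearson([row[j] for row in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    selected = sorted(ranked[:keep])
    return selected, [[row[j] for j in selected] for row in rows]


# Hypothetical second data set: feature 0 tracks the label, feature 1 is constant,
# feature 2 is unrelated to the label.
rows = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.9], [8.0, 5.0, 0.4], [9.0, 5.0, 0.8]]
labels = [0, 0, 1, 1]
selected_columns, credit_feature_vectors = screen_features(rows, labels, keep=1)
```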
In order to provide a concrete way of model training, in an embodiment of the credit overdue risk prediction method provided by the present application, referring to fig. 6, the step 050 of the credit overdue risk prediction method specifically includes the following steps:
Step 051: applying a preset random forest algorithm and training based on the credit feature vectors to obtain an initial prediction model.
Step 052: evaluating the effect of the initial prediction model according to the evaluation indexes corresponding to the test set, and adjusting the initial prediction model based on the corresponding effect evaluation result to obtain the corresponding credit overdue risk prediction model.
From the above description, the credit overdue risk prediction method provided by the embodiment of the application can effectively improve the accuracy, reliability and effectiveness of the trained credit overdue risk prediction model, and further can effectively improve the accuracy and effectiveness of the credit overdue risk prediction.
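The text does not specify which evaluation indexes step 052 uses. The sketch below computes accuracy, precision, recall, and F1 for the overdue-risk class from test-set predictions; these particular indexes and the function name are assumptions chosen because they are commonly reported for class-imbalanced problems.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive (overdue-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test-set labels and model predictions.
test_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = evaluate(test_labels, predictions)
```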
From the aspect of software, in order to solve the problem that the model training result accuracy is low due to unbalanced data, so that the credit overdue risk prediction result is inaccurate, the application provides an embodiment of a credit overdue risk prediction apparatus for implementing all or part of the contents of the credit overdue risk prediction method, which is shown in fig. 7, and the credit overdue risk prediction apparatus specifically includes the following contents:
A vector generation module 10, configured to generate a corresponding target credit feature vector based on the credit data of the target user.
In one or more embodiments of the present application, the credit data may specifically include client credit investigation data of a financial user of a financial institution, basic information, loan history detail data authorized to be obtained by the financial user, and the like, and may specifically be set according to an actual application situation.
In one or more embodiments of the present application, the target credit feature vector refers to a credit feature vector for a target user, wherein the credit feature vector refers to a feature vector constructed to have strong correlation with credit overdue prediction, and important features can be found by using expert experience.
A model prediction module 20, configured to input the target credit feature vector into a preset credit overdue risk prediction model and determine a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model; the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balance processing on pre-acquired unbalanced historical data.
In one or more embodiments of the present application, the credit overdue risk prediction model may be a machine learning model capable of data prediction, and preferably may be a Random Forest model (Random Forest), in which a Random Forest is a classifier comprising a plurality of decision trees and the output categories are determined by the mode of the categories output by the individual trees.
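The "mode of the categories output by the individual trees" can be illustrated with a short sketch. The three hand-written rules below are hypothetical stand-ins for trained decision trees, and the two-element feature vector is likewise illustrative.

```python
from collections import Counter


def forest_predict(trees, feature_vector):
    """Return the category output by the largest number of trees (the mode of the votes)."""
    votes = [tree(feature_vector) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: 1 = overdue risk, 0 = no overdue risk.
trees = [
    lambda v: int(v[0] > 0.5),
    lambda v: int(v[1] > 0.3),
    lambda v: int(v[0] + v[1] > 1.0),
]
risky_result = forest_predict(trees, [0.9, 0.4])    # votes: 1, 1, 1
safe_result = forest_predict(trees, [0.9, 0.05])    # votes: 1, 0, 0
```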
In the model prediction module 20, balanced historical credit data and unbalanced historical data are relative concepts: whether historical credit data is unbalanced or balanced can be judged from the ratio or the difference between the majority class data and the minority class data in the historical credit data. For example, if 79% of the current historical credit data carries no credit overdue risk and 21% carries credit overdue risk, the ratio of the two is 79/21; because this ratio is greater than the judgment threshold of 1.5, the current historical credit data is unbalanced historical credit data and needs data balance processing to convert it into balanced historical credit data. For example, if after data balance processing 55% of the data carries no credit overdue risk and 45% carries credit overdue risk, the ratio of the two is 55/45, which is less than the judgment threshold of 1.5.
It is understood that the data balance processing may apply any existing data processing method capable of interpolating the minority class data and clustering or deleting the majority class data.
The embodiment of the credit overdue risk prediction apparatus provided in the present application may be specifically configured to execute the processing procedure of the embodiment of the credit overdue risk prediction method in the foregoing embodiment, and the function of the processing procedure is not described herein again, and reference may be made to the detailed description of the method embodiment.
As can be seen from the above description, the credit overdue risk prediction apparatus provided in the embodiment of the present application performs data balance processing on pre-acquired unbalanced historical data. This effectively improves the distribution uniformity of the training data used to train the credit overdue risk prediction model, and thereby the accuracy, reliability, and effectiveness of the trained model and of the credit overdue risk prediction itself. It gives financial institutions such as banks a more accurate and reliable basis for early-warning processing or risk-control processing of credit overdue risk, thereby effectively reducing users' credit overdue risk and improving the user experience of the financial institutions.
In order to provide a specific way to perform data balance processing on a training set, in an embodiment of the credit overdue risk prediction apparatus provided by the present application, referring to fig. 8, the credit overdue risk prediction apparatus further specifically includes a model training module 01, and the model training module is configured to perform the following:
Step 010: obtaining historical credit data and labels corresponding to a plurality of historical users, wherein the labels comprise a first label, which indicates that the corresponding historical user has a credit overdue risk, and a second label, which indicates that the corresponding historical user does not have a credit overdue risk.
Specifically, a customer sample required for modeling analysis can be selected; customer credit investigation data, basic information, and loan history detail data that the customer has authorized for access are obtained; and these are parsed, converted, and organized into a data set.
Step 020: generating a corresponding first data set according to the historical credit data and labels corresponding to the historical users, and dividing the first data set into a training set and a test set.
In step 020, the first data set specifically stores the correspondence among the historical users, the historical credit data, and the labels.
Step 030: if the data in the training set is unbalanced historical data, performing data balance processing on the training set to convert the data in the training set into balanced historical credit data and form a second data set.
Step 040: applying the second data set to obtain corresponding credit feature vectors.
Step 050: training based on the credit feature vectors to obtain the credit overdue risk prediction model.
As can be seen from the above description, the credit overdue risk prediction apparatus provided in the embodiment of the present application can effectively improve the distribution uniformity of training data used for training the credit overdue risk prediction model by performing data balance processing on a training set, and can effectively improve the accuracy, reliability, and effectiveness of the trained credit overdue risk prediction model.
In order to provide a concrete way of data preprocessing, in an embodiment of the credit overdue risk prediction apparatus provided by the present application, the model training module 01 in the credit overdue risk prediction apparatus is further configured to perform the following:
Step 011: preprocessing the historical credit data corresponding to each historical user, wherein the preprocessing comprises at least one of: completion processing of missing data, and deletion processing of abnormal and invalid data.
From the above description, the credit overdue risk prediction apparatus provided in the embodiment of the present application can effectively improve the application reliability and accuracy of the historical credit data, so as to effectively improve the accuracy, reliability and effectiveness of the trained credit overdue risk prediction model.
In order to provide a concrete way of judging unbalanced data, in an embodiment of the credit overdue risk prediction apparatus provided by the present application, the model training module 01 in the credit overdue risk prediction apparatus is further configured to perform the following:
Step 021: if the ratio of the total number of historical users corresponding to the first label to the total number of historical users corresponding to the second label in the training set is greater than a first threshold or less than a second threshold, determining that the data in the training set is unbalanced historical data, wherein the first threshold is greater than 1, and the second threshold is greater than 0 and less than 1.
Specifically, if the ratio is taken as majority class data over minority class data, it is compared with the first threshold, which is a value greater than 1; for example, the first threshold may be the above-mentioned judgment threshold of 1.5. If the ratio is taken as minority class data over majority class data, it is compared with the second threshold, which is a value greater than 0 and less than 1; for example, the second threshold may be 0.75.
From the above description, the credit overdue risk prediction apparatus provided in the embodiment of the present application can effectively improve the efficiency and accuracy of determining unbalanced data, can effectively improve the efficiency and reliability of data balance processing, and can further effectively improve the efficiency and reliability of credit overdue risk prediction.
In order to provide a concrete way of data balance, in an embodiment of the credit overdue risk prediction apparatus provided by the present application, the model training module 01 in the credit overdue risk prediction apparatus is further configured to perform the following:
Step 031: determining the minority class data and the majority class data in the training set according to the total number of historical users corresponding to the first label and the total number of historical users corresponding to the second label in the training set.
Step 032: applying a preset data balancing mode to perform K-nearest-neighbor interpolation on the minority class data and K-means clustering on the majority class data, so as to convert the data in the training set into balanced historical credit data and form a corresponding second data set; wherein the data balancing mode comprises a preset SMOTE algorithm.
It is understood that the SMOTE (Synthetic Minority Over-sampling Technique) algorithm may preferably be a fusion SMOTE algorithm, which may be written as the KF-SMOTE algorithm and can optimize a multidimensional, highly complex unbalanced credit history data set. SMOTE is an improvement on random oversampling: random oversampling adds minority class samples by simply copying existing samples, which easily causes model overfitting, that is, the information learned by the model is too specific and does not generalize well. The basic idea of the SMOTE algorithm is to analyze the minority class samples and artificially synthesize new samples from them to add to the data set. The algorithm flow is as follows:
(1) For each sample x in the minority class, calculating the Euclidean distance from x to every other sample in the minority class sample set to obtain the k nearest neighbors of x.
(2) Setting a sampling ratio according to the sample imbalance ratio to determine a sampling multiplier N and, for each minority class sample x, randomly selecting several samples from its k nearest neighbors, a selected neighbor being denoted o.
(3) For each randomly selected neighbor o, constructing a new sample from o and the original sample x according to the following formula:
o(new)=o+rand(0,1)*(x-o)
From the above description, the credit overdue risk prediction apparatus provided in the embodiment of the application can effectively improve the reliability and efficiency of data balance processing, and can effectively improve the accuracy, reliability and effectiveness of the trained credit overdue risk prediction model, so as to effectively improve the accuracy and effectiveness of credit overdue risk prediction.
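Step 032 also applies K-means clustering to the majority class data, but the text does not state how the resulting clusters are used. The sketch below assumes one common option: the majority class is replaced by its k cluster centroids, which reduces it to k representative samples. The function name, the evenly spaced initialization, and the fixed iteration count are likewise assumptions.

```python
import math


def kmeans_centroids(points, k, iterations=20):
    """Run plain K-means and return the k centroids as representative samples."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumed initialization: k evenly spaced points from the input list.
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (empty clusters stay put).
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return centroids


# Hypothetical majority class samples forming two tight groups.
majority_samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
reduced_majority = kmeans_centroids(majority_samples, k=2)
```

Choosing k close to the size of the interpolated minority class would bring the two classes toward the balanced ratio checked in step 021.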
In order to provide a way of extracting feature vectors, in an embodiment of the credit overdue risk prediction apparatus provided by the present application, the model training module 01 in the credit overdue risk prediction apparatus is further configured to perform the following:
Step 041: performing data dimension reduction and feature screening processing on the second data set to obtain credit feature vectors corresponding to the historical users and to the virtual users produced by the data balance processing.
From the above description, the credit overdue risk prediction apparatus provided in the embodiment of the present application can effectively improve reliability and efficiency of feature vector extraction, can effectively improve reliability and efficiency of a trained credit overdue risk prediction model, and can further effectively improve reliability and efficiency of credit overdue risk prediction.
In order to provide a concrete way of model training, in an embodiment of the credit overdue risk prediction apparatus provided by the present application, the model training module 01 in the credit overdue risk prediction apparatus is further configured to perform the following:
Step 051: applying a preset random forest algorithm and training based on the credit feature vectors to obtain an initial prediction model.
Step 052: evaluating the effect of the initial prediction model according to the evaluation indexes corresponding to the test set, and adjusting the initial prediction model based on the corresponding effect evaluation result to obtain the corresponding credit overdue risk prediction model.
From the above description, the credit overdue risk prediction apparatus provided in the embodiment of the application can effectively improve the accuracy, reliability and effectiveness of the trained credit overdue risk prediction model, and further can effectively improve the accuracy and effectiveness of the credit overdue risk prediction.
To further explain the scheme, the application also provides a specific application example in which the credit overdue risk prediction apparatus is applied to implement the credit overdue risk prediction method. The application example covers acquisition of a sample data set, preprocessing of the unbalanced data set, feature engineering, training and parameter tuning of the risk prediction model, and evaluation of model performance, thereby realizing intelligent credit risk assessment and prediction for credit applicants. It mainly addresses the low accuracy of prediction models caused by unevenly distributed data and improves the performance of the prediction model. The technical innovation is a fusion SMOTE algorithm, KF-SMOTE, which combines K-nearest-neighbor interpolation of the minority class with K-means clustering of the majority class. It can optimize a multidimensional, highly complex unbalanced credit history data set, ensure the performance of the credit risk assessment prediction model, and provide a more scientific reference for the construction of bank credit systems and for risk management and control.
Referring to fig. 9, an application example of the present application provides a credit overdue risk prediction method for unbalanced data sets, including: sample data acquisition, data preprocessing, resolving the data set imbalance with the KF-SMOTE algorithm, feature engineering, building a credit overdue risk assessment model based on a random forest algorithm, and model evaluation. The method specifically comprises the following steps:
Step 1): sample data acquisition: selecting the customer sample required for modeling analysis; obtaining customer credit investigation data, basic information, and loan history detail data that the customer has authorized for access; and parsing, converting, and organizing them into a data set.
Step 2): data preprocessing: consolidating and organizing the data acquired in step 1), including handling missing, abnormal, and invalid values.
Step 3): unbalanced data set processing: referring to fig. 10, first dividing the data into a training set and a test set, then applying the proposed KF-SMOTE algorithm to the training set, performing K-nearest-neighbor interpolation on the minority class data and K-means clustering on the majority class data in the preprocessed data set, thereby homogenizing the unbalanced data set and obtaining a training set with uniformly distributed data. That is, the KF-SMOTE algorithm clusters the majority class data and interpolates the minority class data, respectively.
Step 4): feature engineering: performing data dimensionality reduction and feature screening on the preprocessed data, finding important features by using expert experience, and constructing feature vectors strongly correlated with credit overdue prediction.
Step 5): risk prediction model construction: training on the training data set using a random forest algorithm; repeatedly adjusting the adjustable parameters of the model until the best-performing model is obtained; and finally obtaining the evaluation indexes of the prediction model using the test set.
Step 6): model evaluation: comprehensively evaluating the discrimination ability, predictive ability, and stability of the model through the model evaluation indexes and comparison with other algorithm models.
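The repeated parameter adjustment in step 5) can be sketched as an exhaustive grid search. The text does not say how parameters are adjusted, so the search strategy, the parameter names `n_trees` and `max_depth`, and the `train_and_score` callback are assumptions. The scoring function below is a stand-in that does not train any model; in practice it would train a random forest with the given parameters and return a validation score.

```python
from itertools import product


def grid_search(param_grid, train_and_score):
    """Try every parameter combination and return the best one with its score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def stand_in_score(params):
    """Hypothetical stand-in for 'train a model and score it'; peaks at 100 trees, depth 8."""
    return -abs(params["n_trees"] - 100) / 100 - abs(params["max_depth"] - 8) / 10


best_params, best_score = grid_search(
    {"n_trees": [50, 100, 200], "max_depth": [4, 8, 16]},
    stand_in_score,
)
```

Scoring candidate parameters on a held-out part of the training set, rather than on the test set, keeps the test set available for the final evaluation indexes mentioned in step 5).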
From the above description, to solve the model failure caused by data imbalance in conventional credit overdue risk prediction methods, the credit overdue risk prediction method provided by the application example of the present application preprocesses the unbalanced modeling sample data with the proposed KF-SMOTE algorithm, which improves the accuracy of the prediction model and provides a guarantee for establishing a reasonable and effective credit overdue risk prediction model. Its advantages are as follows:
1. The KF-SMOTE algorithm optimizes a multidimensional, highly complex unbalanced credit data set and ensures the reliability of the credit risk assessment prediction model;
2. The risk prediction model is built with the random forest ensemble algorithm, which gives the model strong generalization capability and fast training.
In terms of hardware, in order to solve the problem that the model training result accuracy is low due to unbalanced data, so that the credit overdue risk prediction result is inaccurate, the application provides an embodiment of an electronic device for implementing all or part of the contents of the credit overdue risk prediction method, where the electronic device specifically includes the following contents:
Fig. 11 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 11, the electronic device 9600 can include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, fig. 11 is exemplary; other types of structures may also be used in addition to or in place of this structure to implement telecommunications or other functions.
In one embodiment, the credit overdue risk prediction function may be integrated into the central processor. Wherein the central processor may be configured to control:
Step 100: generating a corresponding target credit feature vector based on the credit data of the target user.
In one or more embodiments of the present application, the credit data may specifically include client credit investigation data of a financial user of a financial institution, basic information, loan history detail data authorized to be obtained by the financial user, and the like, and may specifically be set according to an actual application situation.
In one or more embodiments of the present application, the target credit feature vector refers to a credit feature vector for a target user, wherein the credit feature vector refers to a feature vector constructed to have strong correlation with credit overdue prediction, and important features can be found by using expert experience.
Step 200: inputting the target credit feature vector into a preset credit overdue risk prediction model, and determining a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model; the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balance processing on pre-acquired unbalanced historical data.
In one or more embodiments of the present application, the credit overdue risk prediction model may be a machine learning model capable of data prediction, and preferably may be a Random Forest model (Random Forest), in which a Random Forest is a classifier comprising a plurality of decision trees and the output categories are determined by the mode of the categories output by the individual trees.
In step 200, the balance history credit data and the unbalance history data are relative, and whether the history credit data belongs to the unbalance history data or the balance history credit data can be judged according to the ratio or the difference between the majority class data and the minority class data in the history credit data. For example: if the data percentage of the current historical credit data without the risk of credit overrun is 79%, the data percentage of the current historical credit data without the risk of credit overrun is 21%, and the ratio of the data percentage of the current historical credit data without the risk of credit overrun is 79/21, because the ratio is greater than the judgment threshold value 1.5, the current historical credit data is indicated to be unbalanced historical credit data, and therefore the current historical credit data needs to be subjected to data balance processing to be converted into balanced historical credit data, for example, after the unbalanced historical credit data is subjected to data balance processing, the data percentage of the current historical credit data without the risk of credit overrun is 55%, the data percentage of the current historical credit data with the risk of credit overrun is 45%, the ratio of the data to the current historical credit data with the risk of credit overrun is 55/45, and the ratio is less than the judgment threshold value 1.
It is understood that the data balancing processing may use an existing data processing method capable of interpolating the minority class data and clustering or deleting the majority class data.
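As an illustration of minority-class interpolation, the following is a minimal SMOTE-style sketch using only the Python standard library. It is not the reference SMOTE implementation and not necessarily the method used in the application; the function names, the default `k=3`, and the sample points are assumptions.

```python
import math
import random


def k_nearest(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean)."""
    return sorted(others, key=lambda q: math.dist(point, q))[:k]


def interpolate_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples, each on the segment between a
    minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbor = rng.choice(k_nearest(base, others, min(k, len(others))))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical two-dimensional minority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = interpolate_minority(minority, n_new=5)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region the minority class already occupies, which is the main reason interpolation is often preferred over plain duplication.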
As can be seen from the above description, the electronic device provided in the embodiment of the present application performs data balancing processing on pre-acquired unbalanced historical data, which improves the distribution uniformity of the training data used to train the credit overdue risk prediction model and thereby the accuracy, reliability, and effectiveness of the trained model and of credit overdue risk prediction. This gives financial institutions such as banks a more accurate and reliable basis for early warning or risk control of credit overdue risk, thereby reducing the credit overdue risk of users and improving the user experience of the financial institution.
In another embodiment, the credit overdue risk prediction apparatus may be configured separately from the central processor 9100; for example, the credit overdue risk prediction apparatus may be configured as a chip connected to the central processor 9100, and the credit overdue risk prediction function may be implemented under the control of the central processor.
As shown in fig. 11, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 does not necessarily include all of the components shown in fig. 11; in addition, the electronic device 9600 may further include components not shown in fig. 11, for which reference may be made to the prior art.
As shown in fig. 11, a central processor 9100, sometimes referred to as a controller or operational control, can include a microprocessor or other processor device and/or logic device, which central processor 9100 receives input and controls the operation of the various components of the electronic device 9600.
The memory 9140 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or another suitable device. It may store the information relating to the failure, as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, or the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. Power supply 9170 is used to provide power to electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 9140 can be a solid state memory, e.g., Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information when power is off, can be selectively erased, and can be loaded with further data; an example of such a memory is sometimes called an EPROM. The memory 9140 could also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, which is used for storing application programs and function programs, or a flow for the central processor 9100 to execute operations of the electronic device 9600.
The memory 9140 can also include a data store 9143, the data store 9143 being used to store data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, contact book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100, thereby enabling recording locally through the microphone 9132 and enabling locally stored sounds to be played through the speaker 9131.
Embodiments of the present application also provide a computer-readable storage medium capable of implementing all the steps of the credit overdue risk prediction method in the above embodiments, where the execution subject is a server or a client. The computer-readable storage medium stores a computer program that, when executed by a processor, implements all the steps of that method; for example, the processor implements the following steps when executing the computer program:
Step 100: generating a corresponding target credit feature vector based on the credit data of the target user.
In one or more embodiments of the present application, the credit data may specifically include client credit investigation data of a financial user of a financial institution, basic information, loan history detail data authorized to be obtained by the financial user, and the like, and may specifically be set according to an actual application situation.
In one or more embodiments of the present application, the target credit feature vector refers to a credit feature vector for a target user, wherein the credit feature vector refers to a feature vector constructed to have strong correlation with credit overdue prediction, and important features can be found by using expert experience.
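A minimal sketch of step 100 follows, assuming the credit data arrives as a mapping from field names to numeric values. The field names in `FEATURE_ORDER` are purely hypothetical examples of expert-selected features, and filling missing fields with a default value is an assumption, not a requirement of the application.

```python
# Hypothetical expert-selected features, in a fixed order so that every
# user's vector has the same layout.
FEATURE_ORDER = ("age", "monthly_income", "loan_count", "past_overdue_count")


def to_feature_vector(credit_data, feature_order=FEATURE_ORDER, default=0.0):
    """Map a dict of credit data to a fixed-length list of floats.

    Fields missing from credit_data are filled with `default`.
    """
    return [float(credit_data.get(name, default)) for name in feature_order]


target_user = {"age": 35, "monthly_income": 8000, "loan_count": 2}
target_vector = to_feature_vector(target_user)
# [35.0, 8000.0, 2.0, 0.0]
```

Keeping the feature order fixed matters because the prediction model is trained on vectors with the same layout.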
Step 200: inputting the target credit feature vector into a preset credit overdue risk prediction model, and determining a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model; the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balancing processing on unbalanced historical data acquired in advance.
In one or more embodiments of the present application, the credit overdue risk prediction model may be a machine learning model capable of data prediction, and is preferably a random forest model. A random forest is a classifier comprising a plurality of decision trees, whose output category is the mode of the categories output by the individual trees.
In step 200, balanced historical credit data and unbalanced historical data are relative terms: whether a set of historical credit data is balanced or unbalanced can be judged from the ratio of, or the difference between, the majority class data and the minority class data in it. For example, if 79% of the current historical credit data carries no credit overdue risk and 21% carries credit overdue risk, the ratio of the two is 79/21; because this ratio is greater than the judgment threshold of 1.5, the current historical credit data is unbalanced and needs data balancing processing to be converted into balanced historical credit data. If, after data balancing processing, 55% of the data carries no credit overdue risk and 45% carries credit overdue risk, the ratio of the two is 55/45, which is less than the judgment threshold of 1.5, so the data is then regarded as balanced.
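The same judgment can also be written over label counts with an upper and a lower threshold, as recited in claim 4, so that it does not matter which label happens to be the majority. The sketch below is illustrative only: the function name and the default values 1.5 and 1/1.5 are assumptions, while the constraints on the two thresholds follow claim 4.

```python
def is_imbalanced_by_labels(first_label_count, second_label_count,
                            first_threshold=1.5, second_threshold=1 / 1.5):
    """Two-sided test on the ratio of first-label to second-label users.

    first_threshold must be greater than 1; second_threshold must lie
    strictly between 0 and 1.
    """
    if not (first_threshold > 1 and 0 < second_threshold < 1):
        raise ValueError("thresholds out of range")
    if second_label_count <= 0:
        raise ValueError("second_label_count must be positive")
    ratio = first_label_count / second_label_count
    return ratio > first_threshold or ratio < second_threshold


# First label: has credit overdue risk; second label: no such risk.
raw = is_imbalanced_by_labels(21, 79)       # ratio about 0.27, below 1/1.5
balanced = is_imbalanced_by_labels(45, 55)  # ratio about 0.82, within range
```

With these assumed thresholds, `raw` is `True` and `balanced` is `False`.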
It is understood that the data balancing processing may use an existing data processing method capable of interpolating the minority class data and clustering or deleting the majority class data.
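One way to shrink the majority class by clustering is to replace its samples with K-means centroids. The following is a minimal sketch of plain Lloyd-style K-means using only the Python standard library; the function name, iteration limit, seed, and sample points are assumptions, and the application does not state that it reduces the majority class in exactly this way.

```python
import math
import random


def kmeans_centroids(points, k, iterations=20, seed=0):
    """Return k centroids summarizing `points` (a list of tuples)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(tuple(
                    sum(m[d] for m in members) / len(members)
                    for d in range(dim)
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids


# Hypothetical majority samples forming two well-separated groups.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
reduced = kmeans_centroids(majority, k=2)
```

Here six majority samples are summarized by two centroids, one per group, so the majority class becomes smaller while still covering the regions it originally occupied.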
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application performs data balancing processing on pre-acquired unbalanced historical data, which improves the distribution uniformity of the training data used to train the credit overdue risk prediction model and thereby the accuracy, reliability, and effectiveness of the trained model and of credit overdue risk prediction. This gives financial institutions such as banks a more accurate and reliable basis for early warning or risk control of credit overdue risk, thereby reducing the credit overdue risk of users and improving the user experience of the financial institution.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the invention are explained by applying specific embodiments in the invention, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method for predicting overdue credit risk, comprising:
generating a corresponding target credit feature vector based on the credit data of the target user;
inputting the target credit feature vector into a preset credit overdue risk prediction model, and determining a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model;
the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balancing processing on unbalanced historical data acquired in advance.
2. The credit overdue risk prediction method of claim 1, further comprising, before the inputting the target credit feature vector into a preset credit overdue risk prediction model:
obtaining historical credit data and labels corresponding to a plurality of historical users, wherein the labels comprise: a first label indicating that the corresponding historical user has a credit overdue risk, and a second label indicating that the corresponding historical user does not have a credit overdue risk;
generating a corresponding first data set according to the historical credit data and the labels corresponding to the historical users, and dividing the first data set into a training set and a testing set;
if the data in the training set is unbalanced historical data, performing data balancing processing on the training set to convert the data in the training set into balanced historical credit data and form a second data set;
applying the second data set to obtain a corresponding credit feature vector;
and training to obtain the credit overdue risk prediction model based on the credit feature vector.
3. The credit overdue risk prediction method of claim 2, further comprising, prior to said dividing the first data set into a training set and a test set:
preprocessing the historical credit data corresponding to each historical user, wherein the preprocessing comprises at least one of: completion processing of missing data, and deletion processing of abnormal and invalid data.
4. The credit overdue risk prediction method of claim 2, further comprising, before the data balancing the training set to convert the data in the training set into balanced historical credit data and form a second data set if the data in the training set is unbalanced historical data:
and if the ratio of the total number of the historical users corresponding to the first label to the total number of the historical users corresponding to the second label in the training set is greater than a first threshold value or less than a second threshold value, determining that the data in the training set is unbalanced historical data, wherein the first threshold value is greater than 1, and the second threshold value is greater than 0 and less than 1.
5. The credit overdue risk prediction method of claim 2, wherein the performing data balancing on the training set to convert data in the training set to balanced historical credit data and form a second data set comprises:
determining minority class data and majority class data in the training set according to the total number of the historical users corresponding to the first label and the total number of the historical users corresponding to the second label in the training set;
performing K-nearest-neighbor interpolation processing on the minority class data and K-means clustering processing on the majority class data by using a preset data balancing mode, so as to convert the data in the training set into balanced historical credit data and form a corresponding second data set;
wherein the data balancing mode comprises: a preset SMOTE algorithm.
6. The credit overdue risk prediction method of claim 2, wherein the applying the second data set to obtain a corresponding credit feature vector comprises:
and performing data dimension reduction and feature screening processing on the second data set to obtain credit feature vectors corresponding to the historical users and the virtual users obtained through data balance processing.
7. The method of claim 2, wherein the training of the credit overdue risk prediction model based on the credit feature vector comprises:
applying a preset random forest algorithm, and training based on the credit feature vector to obtain an initial prediction model;
and performing effect evaluation on the initial prediction model according to the evaluation indexes corresponding to the test set, and adjusting the initial prediction model based on the corresponding effect evaluation result to obtain a corresponding credit overdue risk prediction model.
8. An overdue credit risk prediction apparatus, comprising:
the vector generation module is used for generating a corresponding target credit feature vector based on the credit data of the target user;
the model prediction module is used for inputting the target credit feature vector into a preset credit overdue risk prediction model and determining a credit overdue risk prediction result of the target user based on the output of the credit overdue risk prediction model;
the credit overdue risk prediction model is trained in advance on balanced historical credit data, and the balanced historical credit data is obtained by performing data balancing processing on unbalanced historical data acquired in advance.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of credit overdue risk prediction of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the credit overdue risk prediction method of any of claims 1 to 7.
CN202110187104.8A 2021-02-10 2021-02-10 Credit overdue risk prediction method and device Pending CN112785086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110187104.8A CN112785086A (en) 2021-02-10 2021-02-10 Credit overdue risk prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110187104.8A CN112785086A (en) 2021-02-10 2021-02-10 Credit overdue risk prediction method and device

Publications (1)

Publication Number Publication Date
CN112785086A true CN112785086A (en) 2021-05-11

Family

ID=75761544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110187104.8A Pending CN112785086A (en) 2021-02-10 2021-02-10 Credit overdue risk prediction method and device

Country Status (1)

Country Link
CN (1) CN112785086A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113256402A (en) * 2021-06-03 2021-08-13 上海冰鉴信息科技有限公司 Risk control rule determination method and device and electronic equipment
CN113379534A (en) * 2021-06-11 2021-09-10 重庆农村商业银行股份有限公司 Risk assessment method, device, equipment and storage medium
CN113628026A (en) * 2021-06-30 2021-11-09 重庆度小满优扬科技有限公司 Method and device for predicting overdue risk ranking
CN113870013A (en) * 2021-10-14 2021-12-31 浙江孚临科技有限公司 Credit default prediction method based on unbalanced data
CN113887821A (en) * 2021-10-20 2022-01-04 度小满科技(北京)有限公司 Method and device for risk prediction
CN114092216A (en) * 2021-09-22 2022-02-25 金蝶征信有限公司 Enterprise credit rating method, apparatus, computer device and storage medium
CN114841801A (en) * 2022-07-04 2022-08-02 天津金城银行股份有限公司 Credit wind control method and device based on user behavior characteristics
CN116205726A (en) * 2023-04-28 2023-06-02 成都新希望金融信息有限公司 Loan risk prediction method and device, electronic equipment and storage medium
CN116739742A (en) * 2023-06-02 2023-09-12 北京百度网讯科技有限公司 Monitoring method, device, equipment and storage medium of credit wind control model
WO2023231785A1 (en) * 2022-05-31 2023-12-07 支付宝(杭州)信息技术有限公司 Data processing method, apparatus, and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150127416A1 (en) * 2013-11-01 2015-05-07 Digital Risk Analytics, LLC Systems, methods and computer readable media for multi-dimensional risk assessment
WO2019080407A1 (en) * 2017-10-25 2019-05-02 深圳壹账通智能科技有限公司 Credit evaluation method, apparatus and device, and computer readable storage medium
CN111192131A (en) * 2019-12-12 2020-05-22 上海淇玥信息技术有限公司 Financial risk prediction method and device and electronic equipment
CN111222982A (en) * 2020-01-16 2020-06-02 随手(北京)信息技术有限公司 Internet credit overdue prediction method, device, server and storage medium

Similar Documents

Publication Publication Date Title
CN112785086A (en) Credit overdue risk prediction method and device
CN111861569B (en) Product information recommendation method and device
CN111476662A (en) Anti-money laundering identification method and device
CN111784502A (en) Abnormal transaction account group identification method and device
CN109886290B (en) User request detection method and device, computer equipment and storage medium
CN111932269B (en) Equipment information processing method and device
CN111932268B (en) Enterprise risk identification method and device
CN111582341B (en) User abnormal operation prediction method and device
CN111931189B (en) API interface reuse risk detection method, device and API service system
CN111932267A (en) Enterprise financial service risk prediction method and device
CN110796269B (en) Method and device for generating model, and method and device for processing information
CN112232947A (en) Loan risk prediction method and device
CN113011646A (en) Data processing method and device and readable storage medium
CN110930038A (en) Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium
CN111815169A (en) Business approval parameter configuration method and device
CN112767167A (en) Investment transaction risk trend prediction method and device based on ensemble learning
CN112766825A (en) Enterprise financial service risk prediction method and device
CN115409518A (en) User transaction risk early warning method and device
CN115114329A (en) Method and device for detecting data stream abnormity, electronic equipment and storage medium
CN117196630A (en) Transaction risk prediction method, device, terminal equipment and storage medium
CN115984853A (en) Character recognition method and device
CN112163861B (en) Transaction risk factor feature extraction method and device
CN114781368A (en) Business requirement safety processing method and device
CN114092226A (en) Method and device for recommending foreign exchange products of bank outlets
CN113393320A (en) Enterprise financial service risk prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination