CN112308294A - Default probability prediction method and device - Google Patents

Info

Publication number
CN112308294A
Authority
CN
China
Prior art keywords
time window
predicted
target
default
default probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011080647.1A
Other languages
Chinese (zh)
Other versions
CN112308294B (en)
Inventor
贺欧文
卜志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shell Time Network Technology Co ltd
Original Assignee
Beijing Shell Time Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shell Time Network Technology Co ltd
Priority to CN202011080647.1A
Publication of CN112308294A
Application granted
Publication of CN112308294B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Pure & Applied Mathematics (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Algebra (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention provide a default probability prediction method and device. The method comprises: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of that time window and the regression model corresponding to that time window; clustering the targets to be predicted according to the prediction results of the default probability of each time window of each target to be predicted and the weight corresponding to each time window; and acquiring, according to the clustering result, the default probability prediction result corresponding to each class, which serves as the default probability prediction result of the targets to be predicted in that class; wherein the durations of any two time windows are different. The default probability prediction method and device provided by the embodiments of the invention can balance the timeliness and accuracy of default probability prediction and can achieve finer-grained default probability prediction.

Description

Default probability prediction method and device
Technical Field
The invention relates to the technical field of computers, in particular to a default probability prediction method and device.
Background
When the default probability of an enterprise or individual is predicted, the prediction is typically based on data collected over a single fixed time window.
The behavior of an enterprise (or individual) can vary greatly from one point in time to another, for internal or external reasons. A single time window, especially a long one, smooths out the behavior changes of the enterprise (or individual) within the window and therefore cannot reflect real-time changes in its default likelihood. A short time window responds quickly, but its default probability prediction may change abruptly and fail to reflect the true default likelihood of the enterprise (or individual).
Disclosure of Invention
Embodiments of the invention provide a default probability prediction method and device, which overcome the difficulty in the prior art of balancing the timeliness and accuracy of default probability prediction and achieve finer-grained default probability prediction.
The embodiment of the invention provides a default probability prediction method, which comprises the following steps:
for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class;
wherein the time durations of any two of the time windows are different.
According to the default probability prediction method of one embodiment of the present invention, the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window includes:
acquiring probability prediction characteristics of each time window of the target to be predicted according to basic data of each time window of the target to be predicted;
inputting the probability prediction characteristics of each time window of the target to be predicted into a regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted;
and the regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data.
According to the default probability prediction method of one embodiment of the present invention, the specific steps of clustering each target to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window include:
acquiring a characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window;
and based on a clustering algorithm, obtaining the class of each target to be predicted according to the characteristic distance between every two targets to be predicted.
According to the default probability prediction method of an embodiment of the present invention, before obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, the method further includes:
and for each time window, performing logistic regression analysis according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data to obtain a regression model corresponding to each time window.
According to the default probability prediction method of an embodiment of the present invention, before performing, for each time window, the logistic regression analysis according to the probability prediction feature sample data of the time window and the non-default or default data corresponding to the sample data to obtain the regression model corresponding to the time window, the method further includes:
for each time window, determining the probability prediction features of the time window according to the sample basic data of the time window, based on a feature engineering method.
According to the default probability prediction method of one embodiment of the invention, the clustering algorithm is a K-means clustering algorithm.
According to the default probability prediction method of one embodiment of the invention, the basic data comprises at least one of personnel data, financial data, business data and industrial and commercial data.
An embodiment of the present invention further provides a default probability prediction apparatus, including:
the regression analysis module is used for acquiring a prediction result of the default probability of each time window of each target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
the weighted clustering module is used for clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring default probability prediction results corresponding to each class according to the clustering results to serve as default probability prediction results of the targets to be predicted in each class;
wherein the time durations of any two of the time windows are different.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any of the foregoing default probability prediction methods when executing the program.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the default probability prediction method according to any one of the above.
According to the default probability prediction method and device provided by the embodiments of the invention, the prediction results of the default probabilities of a plurality of time windows are obtained according to the basic data of those time windows of the target to be predicted; the targets are clustered according to these per-window prediction results and the weights corresponding to the time windows; and the default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets to be predicted in that class. In this way, the behavior changes of the target to be predicted over different time periods can be captured, the timeliness and accuracy of default probability prediction can be balanced, and finer-grained default probability prediction can be achieved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for default probability prediction according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a default probability prediction apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To overcome the above problems in the prior art, embodiments of the present invention provide a default probability prediction method and apparatus. The inventive concept is to divide data acquisition into a plurality of time windows, characterize the long-term, medium-term, and short-term behavior of the target to be predicted more accurately, and combine the default probability predictions of the individual time windows into a comprehensive prediction of the default probability of the target to be predicted.
FIG. 1 is a schematic flowchart of a default probability prediction method according to an embodiment of the present invention. The default probability prediction method according to the embodiment of the present invention is described below with reference to FIG. 1. As shown in FIG. 1, the method includes:
Step S101, for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window.
Wherein the time durations of any two time windows are different.
Specifically, the time window refers to a time period, which is generally a time period from a time point before the current time point to the current time point.
In order to more accurately characterize the behavior of the target to be predicted in different periods of time, such as long-term, medium-term, short-term, etc., a plurality of time windows with mutually different durations may be selected in advance.
For example, 3 time windows can be selected, with durations of 5 years, 2 years and 1 year, corresponding to a long-term time window, a medium-term time window and a short-term time window respectively. Alternatively, 5 time windows may be selected, with durations of 8 years, 5 years, 3 years, 2 years and 1 year, covering long-term, medium-short-term and short-term time windows.
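The window selection described above can be illustrated with a minimal sketch. It assumes each record of basic data carries a date, and that every window ends at the current date, as stated in the text. The names `WINDOW_YEARS`, `window_start` and `records_in_window` are hypothetical and do not come from the patent.

```python
from datetime import date

# Durations in years, taken from the 3-window example; the dict name is hypothetical.
WINDOW_YEARS = {"long": 5, "medium": 2, "short": 1}

def window_start(current: date, years: int) -> date:
    """Return the date `years` before `current` (Feb 29 falls back to Feb 28)."""
    try:
        return current.replace(year=current.year - years)
    except ValueError:
        return current.replace(year=current.year - years, day=28)

def records_in_window(records, current, years):
    """Keep (day, value) records dated inside the window that ends at `current`."""
    start = window_start(current, years)
    return [(d, v) for d, v in records if start < d <= current]

# Made-up records of one target to be predicted.
records = [
    (date(2016, 3, 1), 10.0),
    (date(2019, 1, 15), 12.5),
    (date(2020, 6, 30), 9.0),
]
today = date(2020, 10, 1)
by_window = {name: records_in_window(records, today, yrs)
             for name, yrs in WINDOW_YEARS.items()}
```

Because all windows end at the same point, a shorter window is always a subset of a longer one; the windows differ only in how far back they reach.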
The basic data refers to data of personnel, funds, transactions, intellectual property rights and the like which have certain relevance with the risk condition of the target to be predicted.
The target to be predicted can be an entity such as a business or an individual.
The basic data may include one or more items.
For each time window, the basic data of the time window may be used as the input of the regression model corresponding to the time window, or after the basic data of the time window is subjected to data processing, appropriate data may be obtained as the input of the regression model corresponding to the time window.
A regression model is a mathematical model that quantitatively describes a statistical relationship. Regression is a predictive modeling technique that studies the relationship between a dependent variable (the target) and independent variables (the predictors). This technique is commonly used for predictive analysis, time series modeling, and discovering causal relationships between variables.
For each time window, the regression model corresponding to the time window can output the prediction result of the default probability of the time window of the target to be predicted according to the input data.
Through the steps, the prediction result of the default probability of each target to be predicted in each time window can be obtained.
Step S102, clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering results to serve as the default probability prediction result of the targets to be predicted in each class.
Specifically, the weight corresponding to each time window may be determined according to the type and the prediction requirement of the target to be predicted.
For example: for a large enterprise, the weights w1, w2 and w3 corresponding to the long-term time window, the medium-term time window and the short-term time window are 0.5, 0.3 and 0.2 respectively; for small and medium enterprises, the weights w1, w2 and w3 corresponding to the long-term time window, the medium-term time window and the short-term time window are 0.15, 0.25 and 0.6 respectively.
Based on the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, the targets to be predicted are clustered using any unsupervised clustering algorithm: they are divided into a plurality of classes, and the class to which each target to be predicted belongs is determined, which gives the clustering result.
For each class obtained through clustering, a probability interval can be obtained by adopting methods such as mathematical statistics and the like based on the prediction result of the default probability of each time window of each target to be predicted belonging to the class and the weight corresponding to each time window, and the probability interval is used as the default probability prediction result corresponding to the class.
For example, for each time window, the upper and lower probability limits of the time window may be determined by mathematical statistics or a similar method from the prediction results of the default probability, in that time window, of the targets to be predicted belonging to the class. The maximum and minimum of these prediction results may be used as the upper and lower probability limits of the time window, or the limits may be obtained by adding to and subtracting from the mean of these prediction results a multiple of their standard deviation. The upper limit of the probability interval is then obtained from the weight and the upper probability limit corresponding to each time window, and the lower limit of the probability interval from the weight and the lower probability limit corresponding to each time window, which gives the probability interval.
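A minimal sketch of the interval computation follows. The text does not state how the weights and the per-window limits are combined, so the weighted sum used here is an assumption. Clamping the mean plus or minus k standard deviations to the range 0 to 1 is also an added safeguard. The function names and the multiplier `k` are hypothetical.

```python
from statistics import mean, pstdev

def window_bounds(probs, method="minmax", k=2.0):
    """Lower and upper probability limits for one time window within a class."""
    if method == "minmax":
        return min(probs), max(probs)
    m, s = mean(probs), pstdev(probs)
    # Clamp so the limits remain valid probabilities (an added safeguard).
    return max(0.0, m - k * s), min(1.0, m + k * s)

def class_interval(class_probs, weights, method="minmax", k=2.0):
    """Combine per-window limits into one interval with a weighted sum (assumed).

    class_probs[i][j] is the predicted default probability of target i in window j.
    """
    lower = upper = 0.0
    for j, w in enumerate(weights):
        lo, hi = window_bounds([row[j] for row in class_probs], method, k)
        lower += w * lo
        upper += w * hi
    return lower, upper

# Three targets in one class; windows ordered long, medium, short (made-up values).
probs = [[0.10, 0.20, 0.30],
         [0.20, 0.20, 0.50],
         [0.30, 0.40, 0.40]]
weights = [0.5, 0.3, 0.2]  # the large-enterprise weights from the earlier example
low, high = class_interval(probs, weights)
```

With the min/max variant, this example yields an interval of roughly 0.17 to 0.37. Because the weights sum to 1 and each per-window limit is a probability, the combined limits also stay between 0 and 1.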
For each class obtained by clustering, after the probability interval of each time window has been obtained through the above steps, a statistical index analysis can be performed on the key features, in that time window, of the targets to be predicted in the class, and the previously obtained probability interval of the time window can be corrected accordingly.
The statistical indexes of the key features of the time window for the targets to be predicted in the class are compared with the overall statistical indexes of the same key features for all targets to be predicted, to judge whether the statistical indexes of the class deviate significantly from the overall statistical indexes.
If they deviate, the targets to be predicted in the class are regarded as outliers relative to the overall distribution, and a correction is made; if not, no correction is made.
If the key feature is significantly higher, the previously obtained upper and lower limits of the probability interval of the time window are raised or lowered accordingly, depending on whether the key feature is a positive or a negative indicator;
if the key feature is significantly lower, the previously obtained upper and lower limits of the probability interval of the time window are lowered or raised accordingly, depending on whether the key feature is a positive or a negative indicator.
For example, when the target to be predicted is an enterprise, the key features can be main features such as the amount of registered capital, the length of time in operation, and the stability of business performance. The statistical index may be the mean, the median, or the like. The condition for a significant deviation may include whether the statistic of the class lies below or above the range of 3 standard deviations of the overall distribution.
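The deviation check and correction can be sketched as follows, under several stated assumptions: the statistical index is the mean; the size of the correction (`step`) is a hypothetical value, since the text gives none; and whether an indicator is positive or negative is expressed through a hypothetical `raises_risk` flag that says whether a higher feature value is taken to raise default risk.

```python
from statistics import mean, pstdev

def deviation_direction(class_values, all_values, n_std=3.0):
    """Return +1 / -1 if the class mean lies above / below the overall mean
    by more than n_std overall standard deviations, otherwise 0."""
    overall_mean = mean(all_values)
    overall_std = pstdev(all_values)
    class_stat = mean(class_values)
    if class_stat > overall_mean + n_std * overall_std:
        return 1
    if class_stat < overall_mean - n_std * overall_std:
        return -1
    return 0

def correct_interval(interval, direction, raises_risk, step=0.05):
    """Shift both limits by `step` (a hypothetical amount), clamped to [0, 1]."""
    if direction == 0:
        return interval
    sign = direction if raises_risk else -direction
    lower, upper = interval
    return (min(1.0, max(0.0, lower + sign * step)),
            min(1.0, max(0.0, upper + sign * step)))

# Registered capital (made-up values): 20 typical enterprises plus one outlier class.
all_capital = [1.0] * 20 + [100.0]
direction = deviation_direction([100.0], all_capital)
# Assumption: higher registered capital lowers default risk, so raises_risk=False.
corrected = correct_interval((0.17, 0.37), direction, raises_risk=False)
```

In this example the outlier class lies more than 3 standard deviations above the overall mean, so both limits are moved down by the assumed step.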
It should be noted that compared with the target to be predicted belonging to another class, the targets to be predicted belonging to the same class have higher similarity in behavior and default probability, so that the default probability prediction result corresponding to each class can be used as the default probability prediction result of the target to be predicted in each class.
According to the embodiment of the invention, the prediction results of the default probability of a plurality of time windows are obtained according to the basic data of those time windows of the target to be predicted; the targets are clustered according to these per-window prediction results and the weights corresponding to the time windows; and the default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets to be predicted in that class. In this way, the behavior changes of the target to be predicted over different time periods can be captured, the timeliness and accuracy of default probability prediction can be balanced, and finer-grained default probability prediction can be achieved.
Based on the content of the above embodiments, the specific steps of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window include: acquiring the probability prediction features of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted.
Specifically, the probability prediction features are a set of predetermined indexes, derived from the indexes in the basic data, that are suitable as inputs to the regression model.
Each predetermined index used as an input to the regression model may be one of the indexes in the basic data, or may be a linear or nonlinear combination of some of the indexes in the basic data.
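As an illustration of how such indexes might be derived, the sketch below builds one raw index, one linear combination, and two nonlinear ones from a window's basic data. All field names and the choice of combinations are hypothetical; the patent does not list specific features.

```python
import math

def build_features(basic):
    """Map one window's basic data (a dict of raw indexes) to model inputs."""
    return {
        "registered_capital": basic["registered_capital"],     # raw index, used as-is
        "net_income": basic["revenue"] - basic["cost"],        # linear combination
        "debt_ratio": basic["liabilities"] / basic["assets"],  # nonlinear combination
        "log_revenue": math.log1p(basic["revenue"]),           # nonlinear transform
    }

# Made-up basic data for one target in one time window.
sample = {"registered_capital": 500.0, "revenue": 120.0, "cost": 90.0,
          "liabilities": 40.0, "assets": 160.0}
features = build_features(sample)
```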
For each time window, according to the basic data of the time window of the target to be predicted, the probability prediction characteristic of the time window of the target to be predicted can be obtained.
And inputting the probability prediction characteristics of each time window of the target to be predicted into the regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted.
The regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the probability prediction feature sample data.
Specifically, for any target to be predicted, after the probability prediction features of each time window of the target to be predicted are obtained, the probability prediction features of each time window may be input into the regression model corresponding to the time window, and the prediction result of the default probability of each time window of the target to be predicted is output.
It can be understood that, before step S102, for each time window, the regression model corresponding to the time window may be obtained by training with a regression analysis method on the probability prediction feature sample data of the time window and the non-default or default data corresponding to the sample data.
The regression model corresponding to each time window can describe the causal relationship between the probability prediction characteristics of the time window of the target to be predicted and the default probability of the time window.
The probability prediction feature sample data can be obtained according to basic data of a certain time window of the sample enterprise in a corresponding historical time period.
The non-default or default data corresponding to the probability prediction feature sample data indicates whether the sample enterprise defaulted after the end of the historical time period. If it defaulted, the record is default data, which can be represented by 1; if not, the record is non-default data, which can be represented by 0.
According to the embodiment of the invention, the probability prediction characteristics of the time window of the target to be predicted are obtained according to the basic data of each time window of the target to be predicted, the prediction result of the default probability of the time window of the target to be predicted is obtained according to the probability prediction characteristics of the time window and the corresponding regression model, and the more accurate prediction result of the default probability of the time window can be obtained, so that the default probability prediction result of each target to be predicted can be obtained based on the prediction result of the default probability of each time window of each target to be predicted, the real-time performance and the accuracy of default probability prediction can be considered, and the finer default probability prediction can be realized.
Based on the content of the above embodiments, the specific steps of clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window include: acquiring the characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window.
Specifically, when clustering is performed on each target to be predicted, the target to be predicted is mapped into the feature space according to the prediction result of the default probability of each time window of each target to be predicted.
In the feature space, the feature distance between the two targets to be predicted is obtained according to the prediction result of the default probability of each time window of the two targets to be predicted and the weight corresponding to each time window.
The specific calculation formula of the characteristic distance between two targets to be predicted is as follows:
[Formula image BDA0002718522250000101 is not reproduced in this text.]
where X_i and X_j denote the two targets to be predicted; k denotes the k-th time window; X_ik and X_jk denote the predicted default probabilities of X_i and X_j in the k-th time window; w_k denotes the weight corresponding to the k-th time window; and n denotes the total number of time windows.
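The formula itself survives only as an image reference, so its exact form cannot be confirmed from this text. The sketch below assumes a weighted Euclidean distance, which is one common form consistent with the symbol definitions above; it should not be read as the patent's formula. The function name `feature_distance` is hypothetical.

```python
import math

def feature_distance(x_i, x_j, weights):
    """Weighted Euclidean distance between two targets (an assumed form).

    x_i[k], x_j[k]: predicted default probability in the k-th time window.
    weights[k]:     weight w_k of the k-th time window.
    """
    if not (len(x_i) == len(x_j) == len(weights)):
        raise ValueError("all inputs must have one entry per time window")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))

# Two targets that differ only in the short-term window (made-up values).
d = feature_distance([0.10, 0.20, 0.30], [0.10, 0.20, 0.50], [0.5, 0.3, 0.2])
```

Under this assumption, a difference in a heavily weighted window separates two targets more than the same difference in a lightly weighted window.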
And based on a clustering algorithm, obtaining the class of each target to be predicted according to the characteristic distance between every two targets to be predicted.
Specifically, based on the characteristic distance between every two targets to be predicted, the targets are clustered using any clustering algorithm: they are divided into a plurality of classes, and the class to which each target to be predicted belongs is determined.
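As one possible instance of this step, the sketch below runs a plain K-means loop in which assignment uses a weighted squared Euclidean distance. Both the distance form and the details of the loop (random initialisation from the data, a fixed iteration cap, the number of classes `k`) are assumptions for illustration, not the patent's specification.

```python
import random

def weighted_sq_dist(a, b, weights):
    """Weighted squared Euclidean distance (assumed form of the characteristic distance)."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def kmeans(points, weights, k, iters=100, seed=0):
    """Plain K-means; each point joins the center with the smallest weighted distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centers[c], weights))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Per-window predicted default probabilities (long, medium, short) of six made-up targets.
points = [[0.05, 0.06, 0.04], [0.06, 0.05, 0.05], [0.04, 0.05, 0.06],
          [0.80, 0.85, 0.90], [0.85, 0.80, 0.88], [0.90, 0.82, 0.85]]
weights = [0.15, 0.25, 0.6]  # the small-and-medium-enterprise weights from the earlier example
labels, centers = kmeans(points, weights, k=2)
```

Because the weights act on each dimension separately, the ordinary mean of the members still minimises the weighted squared distance, so the usual center update can be kept unchanged.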
The embodiment of the invention is based on a clustering algorithm, and the class to which each target to be predicted belongs is obtained according to the characteristic distance between every two targets to be predicted, so that a more accurate clustering result can be obtained, and thus, the default probability prediction result corresponding to each class can be obtained according to the clustering result and is used as the default probability prediction result of the target to be predicted in each class, the real-time performance and the accuracy of default probability prediction can be considered, and more precise default probability prediction can be realized.
Based on the content of the foregoing embodiments, before obtaining, for each target to be predicted, the prediction result of the default probability of each time window of the target according to the basic data of each time window of the target and the regression model corresponding to each time window, the method further includes: for each time window, performing logistic regression analysis on the probability prediction feature sample data of the time window and the non-default or default data corresponding to that sample data, to obtain the regression model corresponding to the time window.
Specifically, for each time window, a logistic regression analysis may be performed on the probability prediction feature sample data of the time window and non-default or default data corresponding to the probability prediction feature sample data to obtain a regression model corresponding to the time window. The regression model is a logistic regression model.
Logistic regression, also known as logistic regression analysis, is a generalized linear regression model.
A logistic regression model can be used to predict how likely an event is to occur for given values of the independent variables.
In this embodiment of the invention, logistic regression analysis is performed on the probability prediction feature sample data of each time window of each sample enterprise and the corresponding non-default or default data to obtain the regression model for that time window. The default probability of a target to be predicted can therefore be predicted more accurately for each time window, and a more accurate overall prediction can be obtained from the per-window predictions and the weight corresponding to each time window.
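The per-window training described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the gradient-descent fitting routine, the learning rate, the window names and the sample data are all assumptions chosen to keep the example self-contained.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (coefficients, intercept)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features: Sequence[Sequence[float]], labels: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Model:
    """Fit a logistic regression model by batch gradient descent."""
    n, d = len(features), len(features[0])
    coef, intercept = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * v for w, v in zip(coef, x)) + intercept) - y
            grad_b += err
            for k in range(d):
                grad_w[k] += err * x[k]
        coef = [w - lr * g / n for w, g in zip(coef, grad_w)]
        intercept -= lr * grad_b / n
    return coef, intercept


def predict_default_probability(model: Model, x: Sequence[float]) -> float:
    coef, intercept = model
    return sigmoid(sum(w * v for w, v in zip(coef, x)) + intercept)


# Hypothetical sample data: one feature per window; label 1 = default, 0 = non-default.
samples_by_window: Dict[str, Tuple[List[List[float]], List[int]]] = {
    "1_month": ([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1]),
    "6_months": ([[0.2], [0.3], [0.7], [0.6]], [0, 0, 1, 1]),
}
# One regression model per time window, as the embodiment describes.
models = {w: fit_logistic(X, y) for w, (X, y) in samples_by_window.items()}
p_low = predict_default_probability(models["1_month"], [0.1])
p_high = predict_default_probability(models["1_month"], [0.9])
```

Keeping a separate model per window means each window's prediction depends only on the sample data observed over that window's duration.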
Based on the content of the foregoing embodiments, before performing, for each time window, logistic regression analysis on the probability prediction feature sample data of the time window and the non-default or default data corresponding to that sample data to obtain the regression model corresponding to the time window, the method further includes: for each time window, determining the probability prediction features of the time window from the sample basic data of the time window based on a feature engineering method.
Specifically, the indicators in the basic data can be screened and/or combined by a feature selection method and/or a feature dimensionality reduction method from feature engineering, and a number of indicators suitable as inputs to the regression model are determined as the probability prediction features.
Feature selection can be implemented by one of, or a combination of, filter methods, wrapper methods, and embedded methods.
Feature dimensionality reduction can be performed by methods such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA).
In this embodiment of the invention, the probability prediction features are determined by a feature engineering method. This reduces the number of probability prediction features and the volume of data used for prediction while retaining, as far as possible, the features related to the likelihood of default, and thereby improves the accuracy and efficiency of the prediction.
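As one illustration of a filter method, the sketch below ranks indicators by the absolute Pearson correlation between each indicator and the default label and keeps the strongest ones. The choice of correlation as the filter criterion, the function names, and the indicator names and values are assumptions made for this example; the patent does not specify a particular criterion.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(indicators: Dict[str, List[float]], labels: Sequence[int],
                    top_k: int) -> List[str]:
    """Filter-style selection: keep the top_k indicators by |correlation| with the label."""
    ranked = sorted(indicators,
                    key=lambda name: abs(pearson(indicators[name], labels)),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical sample basic data for one time window (four sample enterprises).
indicators = {
    "overdue_repayments": [0, 1, 4, 5],
    "registered_capital": [100, 100, 100, 100],  # constant, carries no signal
    "employee_count": [12, 40, 11, 38],
}
labels = [0, 0, 1, 1]  # 1 = default, 0 = non-default
selected = select_features(indicators, labels, top_k=1)
```

A filter of this kind is computed independently of the regression model, which is what distinguishes it from wrapper and embedded methods.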
Based on the content of the above embodiments, the clustering algorithm is a K-means clustering algorithm.
Specifically, the K-means clustering algorithm may be used to cluster the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window.
Given a set of data points and a required number of clusters K, which is specified in advance, the K-means clustering algorithm iteratively partitions the data into K clusters according to a distance function, where K is a positive integer.
In this embodiment of the invention, the targets to be predicted are clustered by the K-means clustering algorithm and the class to which each target belongs is determined. The default probability prediction result corresponding to each class can then be obtained from the clustering result and used as the default probability prediction result of the targets in that class, which yields a more accurate default probability prediction result.
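A minimal sketch of K-means over per-window default probabilities is shown below. It assumes a weighted squared Euclidean distance as the distance function and a simple deterministic initialisation from the first K points; both are illustrative choices, and the names and sample values are hypothetical.

```python
from typing import List, Sequence


def weighted_sq_distance(a: Sequence[float], b: Sequence[float],
                         weights: Sequence[float]) -> float:
    # ASSUMPTION: weighted squared Euclidean distance over the time windows.
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def weighted_kmeans(points: List[List[float]], weights: Sequence[float],
                    k: int, max_iter: int = 100) -> List[int]:
    """Return a cluster label for each point (Lloyd's algorithm)."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialisation
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_distance(p, centroids[c], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical predicted default probabilities for four targets over three windows.
points = [
    [0.05, 0.10, 0.12],
    [0.80, 0.75, 0.90],
    [0.07, 0.08, 0.10],
    [0.85, 0.70, 0.95],
]
window_weights = [0.5, 0.3, 0.2]
cluster_labels = weighted_kmeans(points, window_weights, k=2)
```

With per-dimension weights, the arithmetic mean of the members still minimises the within-cluster weighted squared distance, so the usual centroid update can be kept unchanged.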
Based on the content of the above embodiments, the basic data includes at least one of personnel data, financial data, operating data, and industrial and commercial data.
Specifically, when the target to be predicted is an enterprise, the basic data may include at least one of personnel data, financial data, operating data, and industrial and commercial data.
The personnel data may include the number of employees, the age of the legal representative, the marital status of the legal representative, and the like.
The financial data may include loan application amounts, repayment records, default records, and the like.
The operating data may include business performance, market value, capital flows, and the like.
The industrial and commercial data may include registered capital, date of establishment, records of violations of laws and regulations, and the like.
In this embodiment of the invention, at least one of personnel data, financial data, operating data, and industrial and commercial data is selected as the basic data, so that a more accurate default probability prediction result can be obtained based on the basic data of each time window.
The default probability prediction device provided by the embodiment of the present invention is described below, and the default probability prediction device described below and the default probability prediction method described above may be referred to in correspondence with each other.
Fig. 2 is a schematic structural diagram of a default probability prediction apparatus according to an embodiment of the present invention. Based on the content of the above embodiments, as shown in fig. 2, the apparatus includes a regression analysis module 201 and a weighted clustering module 202, wherein:
the regression analysis module 201 is configured to, for each target to be predicted, obtain a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted, and the regression model corresponding to each time window;
the weighted clustering module 202 is configured to cluster the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and obtain the default probability prediction result corresponding to each class according to the clustering result, where the default probability prediction result is used as the default probability prediction result of the target to be predicted in each class;
wherein the time durations of any two time windows are different.
Specifically, the regression analysis module 201 is electrically connected to the weighted clustering module 202.
For each time window, the regression analysis module 201 may use the basic data of the time window as the input of the regression model corresponding to the time window, or may obtain suitable data after performing data processing on the basic data of the time window as the input of the regression model corresponding to the time window; the regression model corresponding to the time window can output the prediction result of the default probability of the time window of the target to be predicted according to the input data.
The weighted clustering module 202 clusters the targets to be predicted by adopting any clustering algorithm based on the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, divides each target to be predicted into a plurality of classes, determines the class to which each target to be predicted belongs, and obtains a clustering result.
For each class obtained by clustering, the weighted clustering module 202 may obtain a probability interval as a default probability prediction result corresponding to the class by using methods such as mathematical statistics based on a prediction result of the default probability of each time window of each target to be predicted belonging to the class and a weight corresponding to each time window.
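The text leaves the statistical method open. One possible reading, sketched below, summarises each target by the weighted average of its per-window probabilities and takes the minimum and maximum of those averages within a class as the class's probability interval. Both choices are assumptions made for illustration, as are the function name and the sample values.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def class_probability_intervals(
    window_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    labels: Sequence[int],
) -> Dict[int, Tuple[float, float]]:
    """Map each class to a (low, high) default-probability interval.

    ASSUMPTION: each target is summarised by the weighted average of its
    per-window probabilities, and the interval is the min-max range of those
    averages within the class. The patent only says "mathematical statistics".
    """
    total_weight = sum(weights)
    scores_by_class: Dict[int, List[float]] = defaultdict(list)
    for probs, label in zip(window_probs, labels):
        score = sum(w * p for w, p in zip(weights, probs)) / total_weight
        scores_by_class[label].append(score)
    return {label: (min(s), max(s)) for label, s in scores_by_class.items()}


# Hypothetical inputs: four targets, two time windows, two classes.
window_probs = [[0.1, 0.1], [0.8, 0.6], [0.2, 0.2], [0.9, 0.9]]
window_weights = [0.5, 0.5]
class_labels = [0, 1, 0, 1]
intervals = class_probability_intervals(window_probs, window_weights, class_labels)
```

Other summaries, such as a mean with a spread or a pair of quantiles, would fit the same description; the min-max range is used here only because it is the simplest to check by hand.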
The default probability prediction apparatus provided in the embodiments of the present invention is configured to execute the default probability prediction method provided in each of the embodiments of the present invention, and specific methods and processes for implementing corresponding functions by each module included in the default probability prediction apparatus are detailed in the embodiments of the default probability prediction method, and are not described herein again.
The default probability prediction device is used in the default probability prediction methods of the foregoing embodiments. Therefore, the description and definition in the default probability prediction method in the foregoing embodiments can be used for understanding the execution modules in the embodiments of the present invention.
In this embodiment of the invention, the prediction results of the default probability for multiple time windows are obtained from the basic data of those time windows of the target to be predicted, and the targets are clustered according to these prediction results and the weights corresponding to the time windows. The default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets in that class. In this way, changes in the behavior of the target to be predicted over different time periods can be captured, both the timeliness and the accuracy of default probability prediction are taken into account, and a finer-grained prediction is achieved.
Fig. 3 illustrates a physical structure diagram of an electronic device. As shown in fig. 3, the electronic device may include a processor 301, a memory 302, and a bus 303, wherein the processor 301 and the memory 302 communicate with each other through the bus 303. The processor 301 is configured to invoke computer program instructions stored in the memory 302 and executable on the processor 301 to perform the default probability prediction method provided by the foregoing method embodiments, the method comprising: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.
Furthermore, the logic instructions in the memory 302 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the method for predicting probability of default provided by the foregoing method embodiments, where the method includes: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the default probability prediction method provided in the foregoing embodiments, and the method includes: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A default probability prediction method, comprising:
for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class;
wherein the time durations of any two of the time windows are different.
2. The default probability prediction method according to claim 1, wherein the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window comprises:
acquiring probability prediction characteristics of each time window of the target to be predicted according to basic data of each time window of the target to be predicted;
inputting the probability prediction characteristics of each time window of the target to be predicted into a regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted;
and the regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data.
3. The default probability prediction method according to claim 1, wherein the specific step of clustering each target to be predicted according to the default probability prediction result of each time window of each target to be predicted and the weight corresponding to each time window comprises:
acquiring a characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window;
and based on a clustering algorithm, obtaining the class of each target to be predicted according to the characteristic distance between every two targets to be predicted.
4. The default probability prediction method according to claim 2, wherein before obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, for each target to be predicted, the method further comprises:
and for each time window, performing logistic regression analysis according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data to obtain a regression model corresponding to each time window.
5. The default probability prediction method of claim 2, wherein before performing, for each time window, logistic regression analysis according to the probability prediction feature sample data of each time window and the non-default or default data corresponding to the sample data to obtain the regression model corresponding to each time window, the method further comprises:
and for each time window, determining the probability prediction characteristics of each time window according to the sample basic data of each time window based on a characteristic engineering method.
6. The default probability prediction method of claim 3, wherein the clustering algorithm is a K-means clustering algorithm.
7. The default probability prediction method of any one of claims 1 to 6, wherein the basic data includes at least one of personnel data, financial data, operating data, and industrial and commercial data.
8. A default probability prediction apparatus, comprising:
the regression analysis module is used for acquiring a prediction result of the default probability of each time window of each target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
the weighted clustering module is used for clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring default probability prediction results corresponding to each class according to the clustering results to serve as default probability prediction results of the targets to be predicted in each class;
wherein the time durations of any two of the time windows are different.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the default probability prediction method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the steps of the default probability prediction method according to any one of claims 1 to 7.
CN202011080647.1A 2020-10-10 2020-10-10 Method and device for predicting default probability Active CN112308294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011080647.1A CN112308294B (en) 2020-10-10 2020-10-10 Method and device for predicting default probability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011080647.1A CN112308294B (en) 2020-10-10 2020-10-10 Method and device for predicting default probability

Publications (2)

Publication Number Publication Date
CN112308294A true CN112308294A (en) 2021-02-02
CN112308294B CN112308294B (en) 2024-06-14

Family

ID=74488319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011080647.1A Active CN112308294B (en) 2020-10-10 2020-10-10 Method and device for predicting default probability

Country Status (1)

Country Link
CN (1) CN112308294B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255231A (en) * 2021-06-18 2021-08-13 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971338A (en) * 2017-04-26 2017-07-21 北京趣拿软件科技有限公司 The method and apparatus of data assessment
CN108492001A (en) * 2018-02-13 2018-09-04 天津大学 A method of being used for guaranteed loan network risk management
CN109255506A (en) * 2018-11-22 2019-01-22 重庆邮电大学 A kind of internet finance user's overdue loan prediction technique based on big data
CN109636016A (en) * 2018-11-29 2019-04-16 深圳昆腾信息科技有限公司 A kind of Forecasting of Stock Prices method, apparatus, medium and equipment
CN109657837A (en) * 2018-11-19 2019-04-19 平安科技(深圳)有限公司 Default Probability prediction technique, device, computer equipment and storage medium
CN110058989A (en) * 2019-03-08 2019-07-26 阿里巴巴集团控股有限公司 User behavior Intention Anticipation method and apparatus
CN110147940A (en) * 2019-04-26 2019-08-20 阿里巴巴集团控股有限公司 A kind of risk control processing method, equipment, medium and device
CN110246031A (en) * 2019-06-21 2019-09-17 深圳前海微众银行股份有限公司 Appraisal procedure, system, equipment and the storage medium of business standing
WO2020088007A1 (en) * 2018-10-30 2020-05-07 阿里巴巴集团控股有限公司 Method and device for determining consumer financial default risk
CN111191825A (en) * 2019-12-20 2020-05-22 北京淇瑀信息科技有限公司 User default prediction method and device and electronic equipment
CN111192140A (en) * 2020-01-02 2020-05-22 北京明略软件***有限公司 Method and device for predicting customer default probability
CN111324862A (en) * 2020-02-10 2020-06-23 深圳华策辉弘科技有限公司 Method and system for monitoring behavior in loan

Also Published As

Publication number Publication date
CN112308294B (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant