CN112308294A - Default probability prediction method and device - Google Patents
- Publication number
- CN112308294A CN202011080647.1A
- Authority
- CN
- China
- Prior art keywords
- time window
- predicted
- target
- default
- default probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The embodiment of the invention provides a default probability prediction method and a device, wherein the method comprises the following steps: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different. The default probability prediction method and the device provided by the embodiment of the invention can give consideration to both real-time performance and accuracy of default probability prediction and can realize more detailed default probability prediction.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a default probability prediction method and device.
Background
In predicting the default probability of an enterprise or individual, the prediction is typically based on data collected over a single fixed time window.
The behavior of an enterprise (or individual) varies greatly over time, for internal or external reasons. A single time window smooths out the behavior changes of the enterprise (or individual) within that window, especially when the window is long, and so cannot reflect real-time changes in the likelihood of default. A short time window is more responsive, but its default probability prediction may change abruptly and fail to reflect the true likelihood of default of the enterprise (or individual).
Disclosure of Invention
The embodiments of the invention provide a default probability prediction method and device, which overcome the defect in the prior art that the real-time performance and the accuracy of default probability prediction are difficult to achieve together, and which realize finer-grained default probability prediction.
The embodiment of the invention provides a default probability prediction method, which comprises the following steps:
for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class;
wherein the time durations of any two of the time windows are different.
According to the default probability prediction method of one embodiment of the present invention, the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window includes:
acquiring probability prediction characteristics of each time window of the target to be predicted according to basic data of each time window of the target to be predicted;
inputting the probability prediction characteristics of each time window of the target to be predicted into a regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted;
and the regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data.
According to the default probability prediction method of one embodiment of the present invention, the specific steps of clustering each target to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window include:
acquiring a characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window;
and based on a clustering algorithm, obtaining the class of each target to be predicted according to the characteristic distance between every two targets to be predicted.
According to the default probability prediction method of an embodiment of the present invention, before obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, the method further includes:
and for each time window, performing logistic regression analysis according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data to obtain a regression model corresponding to each time window.
According to the default probability prediction method of an embodiment of the present invention, before performing, for each time window, the logistic regression analysis according to the probability prediction feature sample data of each time window and the non-default or default data corresponding to the sample data to obtain the regression model corresponding to each time window, the method further includes:
and for each time window, determining the probability prediction characteristics of each time window according to the sample basic data of each time window based on a characteristic engineering method.
According to the default probability prediction method of one embodiment of the invention, the clustering algorithm is a K-means clustering algorithm.
According to the default probability prediction method of one embodiment of the invention, the basic data comprises at least one of personnel data, financial data, business data and industrial and commercial data.
An embodiment of the present invention further provides a default probability prediction apparatus, including:
the regression analysis module is used for acquiring a prediction result of the default probability of each time window of each target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
the weighted clustering module is used for clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring default probability prediction results corresponding to each class according to the clustering results to serve as default probability prediction results of the targets to be predicted in each class;
wherein the time durations of any two of the time windows are different.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any of the foregoing default probability prediction methods when executing the program.
Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the default probability prediction method according to any one of the above.
According to the default probability prediction method and device provided by the embodiments of the invention, prediction results of the default probabilities of a plurality of time windows are obtained from the basic data of the plurality of time windows of each target to be predicted; the targets are clustered according to these prediction results and the weights corresponding to the time windows; and the default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets to be predicted in that class. In this way, the behavior changes of the target to be predicted over different time periods can be captured, both the real-time performance and the accuracy of default probability prediction can be taken into account, and finer-grained default probability prediction can be realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart illustrating a method for default probability prediction according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a default probability prediction apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to overcome the above problems in the prior art, embodiments of the present invention provide a default probability prediction method and apparatus. The inventive concept is to divide data acquisition into a plurality of time windows, depict the behavior of the target to be predicted over the long term, the medium term, and the short term more accurately, and combine the default probability predictions of the individual time windows into a comprehensive prediction of the default probability of the target to be predicted.
Fig. 1 is a schematic flowchart of a default probability prediction method according to an embodiment of the present invention. The default probability prediction method according to the embodiment of the present invention is described below with reference to fig. 1. As shown in fig. 1, the method includes: step S101, for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window.
Wherein the time durations of any two time windows are different.
Specifically, the time window refers to a time period, which is generally a time period from a time point before the current time point to the current time point.
In order to more accurately characterize the behavior of the target to be predicted in different periods of time, such as long-term, medium-term, short-term, etc., a plurality of time windows with mutually different durations may be selected in advance.
For example, 3 time windows can be selected with durations of 5 years, 2 years and 1 year, corresponding to a long-term, a medium-term and a short-term time window respectively. Alternatively, 5 time windows may be selected with durations of 8 years, 5 years, 3 years, 2 years and 1 year, spanning from a long-term time window through a medium-short-term time window to a short-term time window.
The basic data refers to data of personnel, funds, transactions, intellectual property rights and the like which have certain relevance with the risk condition of the target to be predicted.
The target to be predicted can be an entity such as a business or an individual.
The basic data may include one or more items.
For each time window, the basic data of the time window may be used as the input of the regression model corresponding to the time window, or after the basic data of the time window is subjected to data processing, appropriate data may be obtained as the input of the regression model corresponding to the time window.
A regression model is a mathematical model that quantitatively describes statistical relationships. Regression is a predictive modeling technique that studies the relationship between a dependent variable (the target) and independent variables (the predictors). It is commonly used for predictive analysis, time series modeling, and discovering causal relationships between variables.
For each time window, the regression model corresponding to the time window can output the prediction result of the default probability of the time window of the target to be predicted according to the input data.
Through the steps, the prediction result of the default probability of each target to be predicted in each time window can be obtained.
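As an illustration only, step S101 can be sketched as follows, assuming that each time window's regression model is a logistic model with already known coefficients. The window names, coefficient values, input features, and the helper `predict_window_probabilities` are hypothetical and do not come from the embodiment.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-window models: an intercept plus one coefficient per feature.
WINDOW_MODELS = {
    "5y": {"intercept": -2.0, "coefs": [0.8, -0.5]},
    "2y": {"intercept": -1.5, "coefs": [1.0, -0.3]},
    "1y": {"intercept": -1.0, "coefs": [1.2, -0.1]},
}

def predict_window_probabilities(features_by_window):
    """Return one default probability per time window for a single target."""
    result = {}
    for window, features in features_by_window.items():
        model = WINDOW_MODELS[window]
        score = model["intercept"] + sum(
            c * x for c, x in zip(model["coefs"], features)
        )
        result[window] = sigmoid(score)
    return result

# Hypothetical model inputs of one target, one feature vector per time window.
target = {"5y": [0.2, 1.0], "2y": [0.5, 0.8], "1y": [1.5, 0.4]}
probs = predict_window_probabilities(target)
```

Each target thus ends up with a vector of per-window default probabilities, which is the input to the clustering of step S102.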
Step S102, clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering results to serve as the default probability prediction result of the targets to be predicted in each class.
Specifically, the weight corresponding to each time window may be determined according to the type and the prediction requirement of the target to be predicted.
For example: for a large enterprise, the weights w1, w2 and w3 corresponding to the long-term time window, the medium-term time window and the short-term time window are 0.5, 0.3 and 0.2 respectively; for small and medium enterprises, the weights w1, w2 and w3 corresponding to the long-term time window, the medium-term time window and the short-term time window are 0.15, 0.25 and 0.6 respectively.
Based on the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, clustering the targets to be predicted by adopting any unsupervised clustering algorithm, dividing the targets to be predicted into a plurality of classes, determining the class to which each target to be predicted belongs, and obtaining a clustering result.
For each class obtained through clustering, a probability interval can be obtained by adopting methods such as mathematical statistics and the like based on the prediction result of the default probability of each time window of each target to be predicted belonging to the class and the weight corresponding to each time window, and the probability interval is used as the default probability prediction result corresponding to the class.
For example, for each time window, the upper and lower probability limits of the time window may be determined by mathematical statistics or similar methods from the predicted default probabilities of that time window for the targets to be predicted belonging to the class. The maximum and minimum of these predicted default probabilities may be taken as the upper and lower probability limits of the time window, or the limits may be obtained by adding to and subtracting from their mean a multiple of their standard deviation. The upper limit of the probability interval is then obtained from the weight and the upper probability limit of each time window, and the lower limit of the probability interval from the weight and the lower probability limit of each time window, which yields the probability interval.
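A minimal sketch of this interval calculation is given below. It assumes that the per-window limits are combined by a weighted sum, that the population standard deviation is used, and that limits are clipped to [0, 1]; the text does not fix any of these choices, and the function names and sample values are hypothetical.

```python
import statistics

def window_bounds(probs, k=None):
    """Lower and upper limit of one window's predictions within a class.

    k=None uses the minimum and maximum; otherwise the mean minus/plus
    k standard deviations, clipped to [0, 1] (clipping is an assumption).
    """
    if k is None:
        return min(probs), max(probs)
    mean = statistics.fmean(probs)
    std = statistics.pstdev(probs)
    return max(0.0, mean - k * std), min(1.0, mean + k * std)

def class_interval(class_probs, weights, k=None):
    """Weighted sum of per-window limits (assumed way of combining them).

    class_probs holds one list per target, with one probability per window.
    """
    lower = upper = 0.0
    for j, w in enumerate(weights):
        lo, hi = window_bounds([p[j] for p in class_probs], k)
        lower += w * lo
        upper += w * hi
    return lower, upper

# Three targets in one class; windows ordered long, medium, short term.
members = [[0.10, 0.20, 0.30], [0.12, 0.18, 0.40], [0.08, 0.25, 0.35]]
weights = [0.5, 0.3, 0.2]  # the large-enterprise weights from the example
interval = class_interval(members, weights)            # min/max variant
interval_std = class_interval(members, weights, k=2)   # mean +/- 2 std variant
```

With the minimum/maximum variant, the per-window limits here are (0.08, 0.12), (0.18, 0.25) and (0.30, 0.40), so the class interval is approximately (0.154, 0.215).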
For each class obtained by clustering, after the probability interval of each time window is obtained through the steps, statistical index analysis can be carried out according to the key characteristics of each target to be predicted of the class and the time window, and the probability interval of the time window obtained before can be corrected.
The statistical index of a key characteristic of the time window for the targets to be predicted in the class is compared with the overall statistical index of that key characteristic for all targets to be predicted, to judge whether the class's statistical index deviates significantly from the overall statistical index.
If it does, the targets to be predicted of the class are regarded as outliers relative to the overall distribution, and a correction is made; if not, no correction is made.
If the statistical index of the key characteristic is significantly higher than the overall statistical index, the previously obtained upper and lower limits of the probability interval of the time window are adjusted up or down accordingly, depending on whether the key characteristic is a positive or negative indicator;
if the statistical index of the key characteristic is significantly lower than the overall statistical index, the previously obtained upper and lower limits of the probability interval of the time window are adjusted down or up accordingly, depending on whether the key characteristic is a positive or negative indicator.
For example, when the target to be predicted is an enterprise, the key characteristics may be main characteristics such as the amount of registered capital, the duration of business operation, and the stability of business performance. The statistical index may be a mean, a median, or the like. The condition for judging a significant deviation may be whether the class's distribution falls below or above the range of 3 standard deviations of the overall distribution.
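The correction can be sketched as follows, under several assumptions that the text leaves open: the statistical index is the mean, the interval is shifted by a fixed `step`, and the flag `raises_default` states whether larger values of the key characteristic go with a higher default risk. All names and numbers are hypothetical.

```python
import statistics

def corrected_interval(interval, class_values, all_values,
                       raises_default, step=0.05, n_std=3.0):
    """Shift a class's probability interval when its key characteristic is an outlier.

    raises_default: True if larger values go with higher default risk (assumed
    reading of "positive or negative indicator").
    step: hypothetical shift size; the text does not specify a magnitude.
    """
    lower, upper = interval
    overall_mean = statistics.fmean(all_values)
    overall_std = statistics.pstdev(all_values)
    class_mean = statistics.fmean(class_values)

    if class_mean > overall_mean + n_std * overall_std:
        direction = 1 if raises_default else -1
    elif class_mean < overall_mean - n_std * overall_std:
        direction = -1 if raises_default else 1
    else:
        return lower, upper  # no significant deviation, so no correction

    shift = direction * step
    return max(0.0, lower + shift), min(1.0, upper + shift)

# Hypothetical registered capital of 100 enterprises; two of them form a class
# whose mean lies far above the overall mean plus 3 standard deviations.
all_capital = [10.0] * 98 + [100.0] * 2
adjusted = corrected_interval((0.154, 0.215), [100.0, 100.0], all_capital,
                              raises_default=False)
unchanged = corrected_interval((0.154, 0.215), [10.0, 10.0], all_capital,
                               raises_default=False)
```

Here the overall mean is 11.8 and the overall standard deviation is 12.6, so a class mean of 100 exceeds the 3-standard-deviation range and the interval is shifted down by `step`, while a class mean of 10 leaves the interval untouched.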
It should be noted that compared with the target to be predicted belonging to another class, the targets to be predicted belonging to the same class have higher similarity in behavior and default probability, so that the default probability prediction result corresponding to each class can be used as the default probability prediction result of the target to be predicted in each class.
According to the embodiment of the invention, prediction results of the default probabilities of a plurality of time windows are obtained from the basic data of the plurality of time windows of each target to be predicted; the targets are clustered according to these prediction results and the weights corresponding to the time windows; and the default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets to be predicted in that class. In this way, the behavior changes of the target to be predicted over different time periods can be captured, both the real-time performance and the accuracy of default probability prediction can be taken into account, and finer-grained default probability prediction can be realized.
Based on the content of the above embodiments, the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window includes: and acquiring the probability prediction characteristics of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted.
Specifically, the probability prediction features are a plurality of indexes, determined in advance from the indexes in the basic data, that are suitable as inputs to the regression model.
Each predetermined index used for inputting the regression model may be one of the indexes in the basic data, or may be a linear or nonlinear combination of some of the indexes in the basic data.
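To make the three cases concrete, the sketch below derives model inputs from one time window's basic data: one index taken directly, one linear combination, and two nonlinear combinations. The field names and the particular combinations are hypothetical and serve only as an illustration.

```python
import math

def build_features(basic):
    """Derive model inputs from one window's basic data (field names are hypothetical)."""
    return {
        # an index taken directly from the basic data
        "employee_count": basic["employee_count"],
        # a linear combination of two indexes
        "net_cash_flow": basic["cash_in"] - basic["cash_out"],
        # nonlinear combinations of indexes
        "debt_ratio": basic["total_debt"] / basic["total_assets"],
        "log_revenue": math.log1p(basic["revenue"]),
    }

features = build_features({
    "employee_count": 40,
    "cash_in": 120.0,
    "cash_out": 95.0,
    "total_debt": 30.0,
    "total_assets": 150.0,
    "revenue": 0.0,
})
```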
For each time window, according to the basic data of the time window of the target to be predicted, the probability prediction characteristic of the time window of the target to be predicted can be obtained.
And inputting the probability prediction characteristics of each time window of the target to be predicted into the regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted.
The regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the probability prediction feature sample data.
Specifically, for any target to be predicted, after the probability prediction features of each time window of the target to be predicted are obtained, the probability prediction features of each time window may be input into the regression model corresponding to the time window, and the prediction result of the default probability of each time window of the target to be predicted is output.
It can be understood that, before step S102, for each time window, the regression model corresponding to the time window may be obtained by training with a regression analysis method on the probability prediction feature sample data of the time window and the non-default or default data corresponding to the sample data.
The regression model corresponding to each time window can describe the causal relationship between the probability prediction characteristics of the time window of the target to be predicted and the default probability of the time window.
The probability prediction feature sample data can be obtained according to basic data of a certain time window of the sample enterprise in a corresponding historical time period.
The non-default or default data corresponding to the probability prediction feature sample data indicates whether the sample enterprise defaulted after the end of the historical time period. If it defaulted, the data is default data, which can be represented by 1; if not, it is non-default data, which can be represented by 0.
According to the embodiment of the invention, the probability prediction characteristics of the time window of the target to be predicted are obtained according to the basic data of each time window of the target to be predicted, the prediction result of the default probability of the time window of the target to be predicted is obtained according to the probability prediction characteristics of the time window and the corresponding regression model, and the more accurate prediction result of the default probability of the time window can be obtained, so that the default probability prediction result of each target to be predicted can be obtained based on the prediction result of the default probability of each time window of each target to be predicted, the real-time performance and the accuracy of default probability prediction can be considered, and the finer default probability prediction can be realized.
Based on the content of each embodiment, the specific steps of clustering each target to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window include: and acquiring the characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window.
Specifically, when clustering is performed on each target to be predicted, the target to be predicted is mapped into the feature space according to the prediction result of the default probability of each time window of each target to be predicted.
In the feature space, the feature distance between the two targets to be predicted is obtained according to the prediction result of the default probability of each time window of the two targets to be predicted and the weight corresponding to each time window.
The specific calculation formula of the characteristic distance between two targets to be predicted is as follows:
where X_i and X_j respectively represent two targets to be predicted; k denotes the k-th time window; X_ik and X_jk respectively represent the prediction results of the default probability of the k-th time window for X_i and X_j; w_k represents the weight corresponding to the k-th time window; and n represents the total number of time windows.
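The formula itself is not reproduced in this text. One form that is consistent with the symbol definitions above, offered here purely as an assumption and not as the formula of the embodiment, is a weighted Euclidean distance:

```latex
d(X_i, X_j) = \sqrt{\sum_{k=1}^{n} w_k \left( X_{ik} - X_{jk} \right)^2}
```

Under this assumed form, a time window with a larger weight w_k contributes more to the distance, so two targets that differ mainly in a heavily weighted window are placed further apart.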
Based on a clustering algorithm, the class of each target to be predicted is obtained from the feature distance between every two targets to be predicted.
Specifically, using the feature distance between every two targets to be predicted, any clustering algorithm may be applied to divide the targets into a plurality of classes and determine the class to which each target belongs.
In this embodiment, the class to which each target to be predicted belongs is obtained by a clustering algorithm from the feature distance between every two targets, which yields a more accurate clustering result. The default probability prediction result corresponding to each class can then be obtained from the clustering result and used as the default probability prediction result of the targets in that class, balancing timeliness and accuracy and enabling finer-grained default probability prediction.
Based on the content of the foregoing embodiments, before obtaining, for each target to be predicted, the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, the method further includes: for each time window, performing logistic regression analysis according to the probability prediction feature sample data of the time window and the non-default or default data corresponding to that sample data, to obtain the regression model corresponding to the time window.
Specifically, for each time window, a logistic regression analysis may be performed on the probability prediction feature sample data of the time window and non-default or default data corresponding to the probability prediction feature sample data to obtain a regression model corresponding to the time window. The regression model is a logistic regression model.
Logistic regression, also known as logistic regression analysis, is a generalized linear regression analysis model.
A logistic regression model can be used to estimate the probability that a given event occurs for different values of the independent variables.
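The per-window training step can be sketched as follows. This is an illustrative, dependency-free implementation that fits one logistic regression model per time window by batch gradient descent; the window names, the single feature, the sample values, and the hyperparameters are all hypothetical and are not taken from the source.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (feature weights, intercept)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 0.1, epochs: int = 2000,
) -> Model:
    """Fit a logistic regression model by batch gradient descent."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_default_probability(model: Model, x: Sequence[float]) -> float:
    """Return the predicted default probability for one feature vector."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical sample data: label 1 = default, 0 = non-default.
samples: Dict[str, Tuple[List[List[float]], List[int]]] = {
    "1_month": ([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1]),
    "12_months": ([[0.2], [0.3], [0.7], [0.95]], [0, 0, 1, 1]),
}
# One regression model per time window.
models = {name: fit_logistic(X, y) for name, (X, y) in samples.items()}
for name, model in models.items():
    print(name, predict_default_probability(model, [0.9]))
```

Keeping a separate model per window lets each window learn its own coefficients; in practice a tested library implementation would normally replace the hand-written optimizer.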
In this embodiment, logistic regression analysis is performed on the probability prediction feature sample data of each time window of each sample enterprise and the corresponding non-default or default data to obtain the regression model for that time window. The default probability of a target to be predicted can therefore be predicted more accurately, and a more accurate default probability prediction result for each target can be obtained from its per-window predictions and the weight corresponding to each time window.
Based on the content of the foregoing embodiments, before performing, for each time window, logistic regression analysis according to the probability prediction feature sample data of the time window and the corresponding non-default or default data to obtain the regression model corresponding to the time window, the method further includes: for each time window, determining the probability prediction features of the time window from the sample basic data of the time window based on a feature engineering method.
Specifically, the indicators in the basic data can be screened and/or combined by a feature selection method and/or a feature dimensionality reduction method from feature engineering, and a number of indicators suitable as input to the regression model are determined as the probability prediction features.
Feature selection can be implemented by one of, or a combination of, filter methods, wrapper methods, and embedded methods.
The feature dimensionality reduction can be performed by methods such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA).
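As one illustration of a filter-type feature selection method, the sketch below ranks candidate indicators by the absolute Pearson correlation between each indicator and the default label and keeps the top k. The indicator names and values are hypothetical, and correlation ranking is only one of many possible filter criteria; the source does not prescribe a specific one.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_top_k(
    rows: Sequence[Sequence[float]], labels: Sequence[int],
    names: Sequence[str], k: int,
) -> List[str]:
    """Keep the k indicators most correlated (in absolute value) with the label."""
    scores = {
        name: abs(pearson([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    }
    return sorted(names, key=lambda nm: scores[nm], reverse=True)[:k]


# Hypothetical sample basic data for one time window.
names = ["overdue_count", "registered_capital", "constant_flag"]
rows = [
    [0, 5.0, 1],
    [1, 3.0, 1],
    [3, 4.0, 1],
    [4, 4.5, 1],
]
labels = [0, 0, 1, 1]  # 1 = default, 0 = non-default
selected = select_top_k(rows, labels, names, k=2)
print(selected)
```

The constant indicator carries no information about default and is scored 0, so it is dropped first.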
By determining the probability prediction features through feature engineering, this embodiment reduces the number of probability prediction features and the volume of data used for prediction while retaining, as far as possible, the features related to the likelihood of default, thereby improving the accuracy and efficiency of the probability prediction.
Based on the content of the above embodiments, the clustering algorithm is a K-means clustering algorithm.
Specifically, a K-means clustering algorithm may be adopted to cluster the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window.
Given a set of data points and the required number of clusters K, which is specified in advance, the K-means clustering algorithm iteratively partitions the data into K clusters according to a distance function. K is a positive integer.
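A minimal sketch of K-means over per-window default probabilities follows. It assumes a per-window weighted squared Euclidean distance, which is an assumption and not a form confirmed by this text. Under that assumption the centroid update is still the plain per-dimension mean. The sample points, weights, and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def weighted_sq_dist(
    a: Sequence[float], b: Sequence[float], weights: Sequence[float]
) -> float:
    """Weighted squared Euclidean distance (assumed distance function)."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def kmeans(
    points: Sequence[Sequence[float]], weights: Sequence[float],
    k: int, iterations: int = 100, seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k classes; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centroids[c], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical targets, each described by two per-window default probabilities.
points = [[0.05, 0.10], [0.08, 0.12], [0.70, 0.80], [0.75, 0.85]]
weights = [0.6, 0.4]
labels, centroids = kmeans(points, weights, k=2)
print(labels)
```

The numeric label assigned to each class depends on the random initialization, so only the grouping of targets is meaningful, not the label values.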
According to the embodiment of the invention, the targets to be predicted are clustered through the K-means clustering algorithm, and the class to which each target to be predicted belongs is determined, so that the default probability prediction result corresponding to each class can be obtained according to the clustering result, and can be used as the default probability prediction result of the target to be predicted in each class, and a more accurate default probability prediction result can be obtained.
Based on the content of the above embodiments, the basic data includes at least one of personnel data, financial data, operating data, and industrial and commercial data.
Specifically, when the target to be predicted is an enterprise, the basic data may include at least one of personnel data, financial data, operating data, and industrial and commercial data.
Personnel data may include the number of employees, the age of the legal representative, the marital status of the legal representative, and the like.
Financial data may include loan application amounts, repayment records, default records, and the like.
Operating data may include business performance, market value, capital movements, and the like.
Industrial and commercial data may include registered capital, date of establishment, records of violations of laws and regulations, and the like.
In this embodiment, at least one of personnel data, financial data, operating data, and industrial and commercial data is selected as the basic data, so that a more accurate default probability prediction result can be obtained based on the basic data of each time window.
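One possible way to organize the basic data of a single time window is sketched below. The four groups follow the categories named above (personnel, financial, operating, and industrial and commercial data); the class name, field names, and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class WindowBasicData:
    """Basic data of one target for one time window (illustrative layout)."""

    window_days: int
    personnel: Dict[str, float] = field(default_factory=dict)
    financial: Dict[str, float] = field(default_factory=dict)
    operating: Dict[str, float] = field(default_factory=dict)
    industrial_commercial: Dict[str, float] = field(default_factory=dict)

    def to_feature_dict(self) -> Dict[str, float]:
        """Flatten all groups into one name -> value mapping."""
        flat: Dict[str, float] = {}
        for group in ("personnel", "financial", "operating", "industrial_commercial"):
            for key, value in getattr(self, group).items():
                flat[f"{group}.{key}"] = float(value)
        return flat


# Hypothetical record for a 30-day window.
record = WindowBasicData(
    window_days=30,
    personnel={"employee_count": 42},
    financial={"loan_amount": 500000, "overdue_count": 2},
    operating={"market_value": 1.2e7},
    industrial_commercial={"registered_capital": 1.0e6},
)
features = record.to_feature_dict()
print(sorted(features))
```

Any group may be left empty, which matches the requirement that the basic data include at least one of the four categories.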
The default probability prediction device provided by the embodiment of the present invention is described below, and the default probability prediction device described below and the default probability prediction method described above may be referred to in correspondence with each other.
Fig. 2 is a schematic structural diagram of a default probability prediction apparatus according to an embodiment of the present invention. Based on the content of the above embodiments, as shown in fig. 2, the apparatus includes a regression analysis module 201 and a weighted clustering module 202, wherein:
the regression analysis module 201 is configured to, for each target to be predicted, obtain a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted, and the regression model corresponding to each time window;
the weighted clustering module 202 is configured to cluster the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and obtain the default probability prediction result corresponding to each class according to the clustering result, where the default probability prediction result is used as the default probability prediction result of the target to be predicted in each class;
wherein the time durations of any two time windows are different.
Specifically, the regression analysis module 201 is electrically connected to the weighted clustering module 202.
For each time window, the regression analysis module 201 may use the basic data of the time window as the input of the regression model corresponding to the time window, or may obtain suitable data after performing data processing on the basic data of the time window as the input of the regression model corresponding to the time window; the regression model corresponding to the time window can output the prediction result of the default probability of the time window of the target to be predicted according to the input data.
The weighted clustering module 202 clusters the targets to be predicted by adopting any clustering algorithm based on the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, divides each target to be predicted into a plurality of classes, determines the class to which each target to be predicted belongs, and obtains a clustering result.
For each class obtained by clustering, the weighted clustering module 202 may use statistical methods to derive a probability interval from the prediction result of the default probability of each time window of each target to be predicted belonging to the class and the weight corresponding to each time window, and use that interval as the default probability prediction result corresponding to the class.
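The text does not specify which statistic produces the probability interval, so the following is only one possible sketch: each target's per-window probabilities are combined into a weighted average, and the interval of a class is taken as the minimum and maximum of those averages over its members. The function names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def weighted_score(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of one target's per-window default probabilities."""
    return sum(w * p for p, w in zip(probs, weights)) / sum(weights)


def class_intervals(
    window_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> Dict[int, Tuple[float, float]]:
    """Map each class to a (low, high) default probability interval.

    ASSUMPTION: min/max of the members' weighted averages is used as the
    interval; the source only says the interval is obtained statistically.
    """
    scores: Dict[int, List[float]] = defaultdict(list)
    for probs, label in zip(window_probs, labels):
        scores[label].append(weighted_score(probs, weights))
    return {label: (min(v), max(v)) for label, v in scores.items()}


# Hypothetical per-window predictions and clustering result.
window_probs = [[0.05, 0.10], [0.08, 0.12], [0.70, 0.80], [0.75, 0.85]]
cluster_labels = [0, 0, 1, 1]
window_weights = [0.6, 0.4]
intervals = class_intervals(window_probs, cluster_labels, window_weights)
print(intervals)
```

Other summaries, such as quantiles or a mean with a standard deviation band, would fit the same description equally well.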
The default probability prediction apparatus provided in the embodiments of the present invention is configured to execute the default probability prediction method provided in each of the embodiments of the present invention, and specific methods and processes for implementing corresponding functions by each module included in the default probability prediction apparatus are detailed in the embodiments of the default probability prediction method, and are not described herein again.
The default probability prediction device is used in the default probability prediction methods of the foregoing embodiments. Therefore, the description and definition in the default probability prediction method in the foregoing embodiments can be used for understanding the execution modules in the embodiments of the present invention.
According to the embodiment of the invention, the prediction results of the default probability of the multiple time windows are obtained according to the basic data of the multiple time windows of the target to be predicted, clustering is carried out according to the prediction results of the default probability of the multiple time windows and the weight corresponding to the time windows, the default probability prediction result corresponding to each class is obtained and is used as the default probability prediction result of the target to be predicted in each class, the behavior change conditions of the target to be predicted in different time periods can be extracted, the real-time performance and the accuracy of default probability prediction can be considered, and more precise default probability prediction can be realized.
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 301, a memory 302, and a bus 303, wherein the processor 301 and the memory 302 communicate with each other through the bus 303. The processor 301 is configured to invoke computer program instructions stored in the memory 302 and executable on the processor 301 to perform the default probability prediction method provided by the method embodiments described above, the method comprising: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.
Furthermore, the logic instructions in the memory 302 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the method for predicting probability of default provided by the foregoing method embodiments, where the method includes: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the default probability prediction method provided in the foregoing embodiments, the method including: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A default probability prediction method, comprising:
for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class;
wherein the time durations of any two of the time windows are different.
2. The default probability prediction method according to claim 1, wherein the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window comprises:
acquiring probability prediction characteristics of each time window of the target to be predicted according to basic data of each time window of the target to be predicted;
inputting the probability prediction characteristics of each time window of the target to be predicted into a regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted;
and the regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data.
3. The default probability prediction method according to claim 1, wherein the specific step of clustering each target to be predicted according to the default probability prediction result of each time window of each target to be predicted and the weight corresponding to each time window comprises:
acquiring a characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window;
and based on a clustering algorithm, obtaining the class of each target to be predicted according to the characteristic distance between every two targets to be predicted.
4. The default probability prediction method according to claim 2, wherein before obtaining, for each target to be predicted, the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, the method further comprises:
and for each time window, performing logistic regression analysis according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data to obtain a regression model corresponding to each time window.
5. The default probability prediction method of claim 2, wherein before performing, for each time window, logistic regression analysis according to the probability prediction feature sample data of each time window and the non-default or default data corresponding to the sample data to obtain the regression model corresponding to each time window, the method further comprises:
and for each time window, determining the probability prediction characteristics of each time window according to the sample basic data of each time window based on a characteristic engineering method.
6. The default probability prediction method of claim 3, wherein the clustering algorithm is a K-means clustering algorithm.
7. The default probability prediction method according to any one of claims 1 to 6, wherein the basic data includes at least one of personnel data, financial data, operating data, and industrial and commercial data.
8. A default probability prediction apparatus, comprising:
the regression analysis module is used for acquiring a prediction result of the default probability of each time window of each target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;
the weighted clustering module is used for clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring default probability prediction results corresponding to each class according to the clustering results to serve as default probability prediction results of the targets to be predicted in each class;
wherein the time durations of any two of the time windows are different.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the default probability prediction method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the steps of the default probability prediction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011080647.1A CN112308294B (en) | 2020-10-10 | 2020-10-10 | Method and device for predicting default probability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112308294A (en) | 2021-02-02 |
CN112308294B (en) | 2024-06-14 |
Family
ID=74488319
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011080647.1A Active CN112308294B (en) | 2020-10-10 | 2020-10-10 | Method and device for predicting default probability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112308294B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255231A (en) * | 2021-06-18 | 2021-08-13 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106971338A (en) * | 2017-04-26 | 2017-07-21 | 北京趣拿软件科技有限公司 | The method and apparatus of data assessment |
CN108492001A (en) * | 2018-02-13 | 2018-09-04 | 天津大学 | A method of being used for guaranteed loan network risk management |
CN109255506A (en) * | 2018-11-22 | 2019-01-22 | 重庆邮电大学 | A kind of internet finance user's overdue loan prediction technique based on big data |
CN109636016A (en) * | 2018-11-29 | 2019-04-16 | 深圳昆腾信息科技有限公司 | A kind of Forecasting of Stock Prices method, apparatus, medium and equipment |
CN109657837A (en) * | 2018-11-19 | 2019-04-19 | 平安科技(深圳)有限公司 | Default Probability prediction technique, device, computer equipment and storage medium |
CN110058989A (en) * | 2019-03-08 | 2019-07-26 | 阿里巴巴集团控股有限公司 | User behavior Intention Anticipation method and apparatus |
CN110147940A (en) * | 2019-04-26 | 2019-08-20 | 阿里巴巴集团控股有限公司 | A kind of risk control processing method, equipment, medium and device |
CN110246031A (en) * | 2019-06-21 | 2019-09-17 | 深圳前海微众银行股份有限公司 | Appraisal procedure, system, equipment and the storage medium of business standing |
WO2020088007A1 (en) * | 2018-10-30 | 2020-05-07 | 阿里巴巴集团控股有限公司 | Method and device for determining consumer financial default risk |
CN111191825A (en) * | 2019-12-20 | 2020-05-22 | 北京淇瑀信息科技有限公司 | User default prediction method and device and electronic equipment |
CN111192140A (en) * | 2020-01-02 | 2020-05-22 | 北京明略软件***有限公司 | Method and device for predicting customer default probability |
CN111324862A (en) * | 2020-02-10 | 2020-06-23 | 深圳华策辉弘科技有限公司 | Method and system for monitoring behavior in loan |
Also Published As
Publication number | Publication date |
---|---|
CN112308294B (en) | 2024-06-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||