CN112308294A

CN112308294A - Default probability prediction method and device

Info

Publication number: CN112308294A
Application number: CN202011080647.1A
Authority: CN
Inventors: 贺欧文; 卜志成
Original assignee: Beijing Shell Time Network Technology Co ltd
Current assignee: Beijing Shell Time Network Technology Co ltd
Priority date: 2020-10-10
Filing date: 2020-10-10
Publication date: 2021-02-02
Anticipated expiration: 2040-10-10
Also published as: CN112308294B

Abstract

The embodiment of the invention provides a default probability prediction method and a device, wherein the method comprises the following steps: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different. The default probability prediction method and the device provided by the embodiment of the invention can give consideration to both real-time performance and accuracy of default probability prediction and can realize more detailed default probability prediction.

Description

Default probability prediction method and device

Technical Field

The invention relates to the technical field of computers, in particular to a default probability prediction method and device.

Background

The default probability of an enterprise or individual is typically predicted from data collected over a single fixed time window.

The behavior of an enterprise (or individual) varies greatly over time, for internal or external reasons. A single time window, especially a long one, smooths out the behavior changes within it and therefore cannot reflect real-time changes in the default likelihood of the enterprise (or individual). A short time window responds quickly, but its default probability prediction may fluctuate abruptly and fail to reflect the true default likelihood of the enterprise (or individual).

Disclosure of Invention

The embodiment of the invention provides a default probability prediction method and device, which are used to overcome the difficulty in the prior art of balancing the real-time performance and the accuracy of default probability prediction, and to achieve finer-grained default probability prediction.

The embodiment of the invention provides a default probability prediction method, which comprises the following steps:

for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;

clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class;

wherein the time durations of any two of the time windows are different.

According to the default probability prediction method of one embodiment of the present invention, the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window includes:

acquiring probability prediction characteristics of each time window of the target to be predicted according to basic data of each time window of the target to be predicted;

inputting the probability prediction characteristics of each time window of the target to be predicted into a regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted;

and the regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data.

According to the default probability prediction method of one embodiment of the present invention, the specific steps of clustering each target to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window include:

acquiring a characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window;

and based on a clustering algorithm, obtaining the class of each target to be predicted according to the characteristic distance between every two targets to be predicted.

According to the default probability prediction method of an embodiment of the present invention, before obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, the method further includes:

and for each time window, performing logistic regression analysis according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data to obtain a regression model corresponding to each time window.

According to the default probability prediction method of an embodiment of the present invention, before the predicting the feature sample data according to the probability of each time window and the non-default or default data corresponding to the sample data for each time window to perform the logistic regression analysis, and obtaining the regression model corresponding to each time window, the method further includes:

and for each time window, determining the probability prediction characteristics of each time window according to the sample basic data of each time window based on a characteristic engineering method.

According to the default probability prediction method of one embodiment of the invention, the clustering algorithm is a K-means clustering algorithm.

According to the default probability prediction method of one embodiment of the invention, the basic data comprises at least one of personnel data, financial data, business data and industrial and commercial data.

An embodiment of the present invention further provides a default probability prediction apparatus, including:

the regression analysis module is used for acquiring a prediction result of the default probability of each time window of each target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window;

the weighted clustering module is used for clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring default probability prediction results corresponding to each class according to the clustering results to serve as default probability prediction results of the targets to be predicted in each class;

wherein the time durations of any two of the time windows are different.

An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any of the foregoing default probability prediction methods when executing the program.

Embodiments of the present invention also provide a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the default probability prediction method according to any one of the above.

According to the default probability prediction method and device provided by the embodiment of the invention, prediction results of the default probability for a plurality of time windows are obtained from the basic data of those time windows of the target to be predicted. Clustering is then carried out according to these prediction results and the weights corresponding to the time windows, and the default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets to be predicted in that class. In this way, the behavior changes of the target to be predicted over different time periods can be captured, the real-time performance and the accuracy of default probability prediction can be balanced, and finer-grained default probability prediction can be achieved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart illustrating a method for default probability prediction according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a default probability prediction apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to overcome the above problems in the prior art, embodiments of the present invention provide a default probability prediction method and apparatus. The inventive concept is to divide data acquisition into a plurality of time windows, so as to depict the long-term, medium-term, and short-term behavior of the target to be predicted more accurately, and to combine the default probability predictions of the time windows into a comprehensive prediction of the default probability of the target to be predicted.

Fig. 1 is a schematic flowchart of a default probability prediction method according to an embodiment of the present invention. The default probability prediction method according to the embodiment of the present invention is described below with reference to fig. 1. As shown in fig. 1, the method includes: step S101, for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window.

Wherein the time durations of any two time windows are different.

Specifically, the time window refers to a time period, which is generally a time period from a time point before the current time point to the current time point.

In order to more accurately characterize the behavior of the target to be predicted in different periods of time, such as long-term, medium-term, short-term, etc., a plurality of time windows with mutually different durations may be selected in advance.

For example, three time windows with durations of 5 years, 2 years, and 1 year may be selected, corresponding to a long-term, a medium-term, and a short-term time window respectively. Five time windows may also be selected, with durations of 8 years, 5 years, 3 years, 2 years, and 1 year, spanning the range from a long-term time window through a medium-short-term time window to a short-term time window.
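As a minimal sketch of the three-window example, the following Python groups dated records by the windows that contain them. The names `WINDOW_YEARS`, `window_start`, and `split_by_window`, and the `(date, value)` record layout, are assumptions made for illustration and do not come from the embodiment.

```python
from datetime import date

# Hypothetical window lengths in years, mirroring the 5/2/1-year example.
WINDOW_YEARS = {"long": 5, "medium": 2, "short": 1}


def window_start(today, years):
    """Return the date `years` before `today` (Feb 29 falls back to Feb 28)."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def split_by_window(records, today):
    """Map each window name to the (date, value) records that fall inside it."""
    result = {}
    for name, years in WINDOW_YEARS.items():
        start = window_start(today, years)
        result[name] = [r for r in records if start <= r[0] <= today]
    return result
```

Because every window ends at the current time point, a recent record appears in all windows, while an older record appears only in the longer ones.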

The basic data refers to data of personnel, funds, transactions, intellectual property rights and the like which have certain relevance with the risk condition of the target to be predicted.

The target to be predicted can be an entity such as a business or an individual.

The basic data may include one or more items.

For each time window, the basic data of the time window may be used as the input of the regression model corresponding to the time window, or after the basic data of the time window is subjected to data processing, appropriate data may be obtained as the input of the regression model corresponding to the time window.

A regression model is a mathematical model that quantitatively describes statistical relationships. Regression is a predictive modeling technique that studies the relationship between a dependent variable (the target) and independent variables (the predictors). This technique is commonly used for predictive analysis, time series modeling, and discovering causal relationships between variables.

For each time window, the regression model corresponding to the time window can output the prediction result of the default probability of the time window of the target to be predicted according to the input data.

Through the steps, the prediction result of the default probability of each target to be predicted in each time window can be obtained.

Step S102, clustering the targets to be predicted according to the prediction results of the default probability of each time window of the targets to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering results to serve as the default probability prediction result of the targets to be predicted in each class.

Specifically, the weight corresponding to each time window may be determined according to the type and the prediction requirement of the target to be predicted.

For example: for a large enterprise, the weights w1, w2 and w3 corresponding to the long-term time window, the medium-term time window and the short-term time window are 0.5, 0.3 and 0.2 respectively; for small and medium enterprises, the weights w1, w2 and w3 corresponding to the long-term time window, the medium-term time window and the short-term time window are 0.15, 0.25 and 0.6 respectively.

Based on the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, clustering the targets to be predicted by adopting any unsupervised clustering algorithm, dividing the targets to be predicted into a plurality of classes, determining the class to which each target to be predicted belongs, and obtaining a clustering result.

For each class obtained through clustering, a probability interval can be obtained by adopting methods such as mathematical statistics and the like based on the prediction result of the default probability of each time window of each target to be predicted belonging to the class and the weight corresponding to each time window, and the probability interval is used as the default probability prediction result corresponding to the class.

For example, for each time window, the upper and lower probability limits of the time window may be determined by methods such as mathematical statistics from the predicted default probabilities, for that time window, of the targets to be predicted belonging to the class. The maximum and minimum of these predicted default probabilities may be taken as the upper and lower probability limits respectively, or the limits may be obtained by adding to and subtracting from their mean a multiple of their standard deviation. The upper limit of the probability interval is then obtained from the weights and the upper probability limits of the time windows, and the lower limit of the probability interval from the weights and the lower probability limits, which yields the probability interval.
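The interval construction can be sketched as follows. The text does not say how the weights and the per-window limits are combined, so the weighted sum used here is an assumption, as are the function names, the `n_std` default, and the clamping to [0, 1].

```python
from statistics import mean, pstdev


def window_bounds(probs, method="minmax", n_std=2.0):
    """Lower and upper probability limits for one time window of one class."""
    if method == "minmax":
        return min(probs), max(probs)
    mu, sigma = mean(probs), pstdev(probs)
    # Clamping to [0, 1] is an assumption; the text does not mention it.
    return max(0.0, mu - n_std * sigma), min(1.0, mu + n_std * sigma)


def class_interval(class_probs, weights, method="minmax"):
    """class_probs[k] holds the window-k predictions of all targets in the class.

    Assumption: the interval limits are the weighted sums of the window limits.
    """
    bounds = [window_bounds(p, method) for p in class_probs]
    lower = sum(w * lo for w, (lo, _) in zip(weights, bounds))
    upper = sum(w * hi for w, (_, hi) in zip(weights, bounds))
    return lower, upper
```

With the large-enterprise weights 0.5, 0.3, and 0.2 and a class of two targets whose window predictions are (0.10, 0.20), (0.20, 0.40), and (0.30, 0.50), this sketch gives an interval of roughly 0.17 to 0.32.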

For each class obtained by clustering, after the probability interval of each time window is obtained through the steps, statistical index analysis can be carried out according to the key characteristics of each target to be predicted of the class and the time window, and the probability interval of the time window obtained before can be corrected.

The statistical indexes of the key characteristics of the targets to be predicted in the class, for the time window, are compared with the overall statistical indexes of the same key characteristics across all targets to be predicted, to judge whether the class deviates significantly from the overall statistical indexes.

If a significant deviation exists, the targets to be predicted in the class are regarded as outliers relative to the overall distribution, and the correction is carried out; if not, no correction is made.

If the key characteristic is significantly higher than the overall level, the previously obtained upper and lower limits of the probability interval of the time window are adjusted up or down accordingly, depending on whether the key characteristic is a positive or a negative indicator;

if the key characteristic is significantly lower than the overall level, the previously obtained upper and lower limits of the probability interval of the time window are adjusted down or up accordingly, depending on whether the key characteristic is a positive or a negative indicator.

For example, when the target to be predicted is an enterprise, the key characteristics may be main characteristics such as the amount of registered capital, the length of time in business, and the stability of business performance. The statistical index may be a mean, a median, or the like. The condition for a significant deviation may include whether the statistic of the class falls outside the range of 3 standard deviations of the overall distribution.
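A hedged sketch of this correction step, using the mean as the statistical index and the 3-standard-deviation rule, is given below. The text does not specify the size of the adjustment or define "positive" and "negative" indicator precisely, so the fixed `step`, the `raises_risk` flag (true when larger values of the key characteristic indicate higher default risk), and the function name are all assumptions.

```python
from statistics import mean, pstdev


def correct_interval(interval, class_values, all_values, raises_risk,
                     step=0.05, n_std=3.0):
    """Shift a window's probability interval when the class is an outlier.

    class_values: key-characteristic values of the targets in the class.
    all_values:   the same key characteristic over all targets.
    raises_risk:  hypothetical flag, True if larger values mean higher risk.
    step:         hypothetical adjustment size, not given in the text.
    """
    lower, upper = interval
    mu, sigma = mean(all_values), pstdev(all_values)
    class_stat = mean(class_values)
    if class_stat > mu + n_std * sigma:
        direction = 1 if raises_risk else -1
    elif class_stat < mu - n_std * sigma:
        direction = -1 if raises_risk else 1
    else:
        return interval  # no significant deviation, so no correction
    shift = direction * step
    # Keep the corrected limits inside [0, 1].
    return (min(1.0, max(0.0, lower + shift)),
            min(1.0, max(0.0, upper + shift)))
```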

It should be noted that compared with the target to be predicted belonging to another class, the targets to be predicted belonging to the same class have higher similarity in behavior and default probability, so that the default probability prediction result corresponding to each class can be used as the default probability prediction result of the target to be predicted in each class.

According to the embodiment of the invention, prediction results of the default probability for a plurality of time windows are obtained from the basic data of those time windows of the target to be predicted. Clustering is then carried out according to these prediction results and the weights corresponding to the time windows, and the default probability prediction result corresponding to each class is obtained and used as the default probability prediction result of the targets to be predicted in that class. In this way, the behavior changes of the target to be predicted over different time periods can be captured, the real-time performance and the accuracy of default probability prediction can be balanced, and finer-grained default probability prediction can be achieved.

Based on the content of the above embodiments, the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window includes: and acquiring the probability prediction characteristics of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted.

Specifically, the probability prediction features are a plurality of predetermined indexes, derived from the indexes in the basic data, that are suitable as inputs to the regression model.

Each predetermined index used for inputting the regression model may be one of the indexes in the basic data, or may be a linear or nonlinear combination of some of the indexes in the basic data.

For each time window, according to the basic data of the time window of the target to be predicted, the probability prediction characteristic of the time window of the target to be predicted can be obtained.

And inputting the probability prediction characteristics of each time window of the target to be predicted into the regression model corresponding to each time window, and outputting the prediction result of the default probability of each time window of the target to be predicted.

The regression model corresponding to each time window is obtained after training according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the probability prediction feature sample data.

Specifically, for any target to be predicted, after the probability prediction features of each time window of the target to be predicted are obtained, the probability prediction features of each time window may be input into the regression model corresponding to the time window, and the prediction result of the default probability of each time window of the target to be predicted is output.
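The per-window prediction can be sketched as below, assuming each window's regression model is a logistic model described by a coefficient list and an intercept. The function names and the dictionary layout are hypothetical, and any coefficient values passed in would come from training rather than from this text.

```python
import math


def predict_default_probability(features, coefficients, intercept):
    """Output of a logistic model for one time window."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_all_windows(features_by_window, models):
    """Apply each window's own model to that window's features.

    models maps a window name to a (coefficients, intercept) pair.
    """
    return {
        name: predict_default_probability(features_by_window[name], coef, bias)
        for name, (coef, bias) in models.items()
    }
```

Each target to be predicted thus yields one predicted default probability per time window, which is the input to the later clustering step.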

It can be understood that, before step S101, a regression model may be trained for each time window by a regression analysis method, using the probability prediction feature sample data of that time window and the non-default or default data corresponding to the sample data.

The regression model corresponding to each time window can describe the causal relationship between the probability prediction characteristics of the time window of the target to be predicted and the default probability of the time window.

The probability prediction feature sample data can be obtained according to basic data of a certain time window of the sample enterprise in a corresponding historical time period.

The non-default or default data corresponding to the probability prediction feature sample data indicates whether the sample enterprise defaulted after the end of the historical time period. If it defaulted, the record is default data and can be represented by 1; if not, the record is non-default data and can be represented by 0.

According to the embodiment of the invention, the probability prediction characteristics of the time window of the target to be predicted are obtained according to the basic data of each time window of the target to be predicted, the prediction result of the default probability of the time window of the target to be predicted is obtained according to the probability prediction characteristics of the time window and the corresponding regression model, and the more accurate prediction result of the default probability of the time window can be obtained, so that the default probability prediction result of each target to be predicted can be obtained based on the prediction result of the default probability of each time window of each target to be predicted, the real-time performance and the accuracy of default probability prediction can be considered, and the finer default probability prediction can be realized.

Based on the content of each embodiment, the specific steps of clustering each target to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window include: and acquiring the characteristic distance between every two targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window.

Specifically, when clustering is performed on each target to be predicted, the target to be predicted is mapped into the feature space according to the prediction result of the default probability of each time window of each target to be predicted.

In the feature space, the feature distance between the two targets to be predicted is obtained according to the prediction result of the default probability of each time window of the two targets to be predicted and the weight corresponding to each time window.

The specific calculation formula of the characteristic distance between two targets to be predicted is as follows:

where X_i and X_j denote the two targets to be predicted; k denotes the k-th time window; X_ik and X_jk denote the predicted default probabilities of X_i and X_j for the k-th time window; w_k denotes the weight corresponding to the k-th time window; and n denotes the total number of time windows.
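The formula itself does not survive in this text. Purely as an assumption for illustration, one form consistent with the symbol definitions above is a weighted Euclidean distance; the original formula may differ:

```latex
d(X_i, X_j) = \sqrt{\sum_{k=1}^{n} w_k \left( X_{ik} - X_{jk} \right)^{2}}
```

Under this assumed form, a time window with a larger weight w_k contributes more to the distance, so targets that differ mainly in a heavily weighted window are pushed further apart.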

Specifically, based on the characteristic distance between every two targets to be predicted, any clustering algorithm may be used to divide the targets to be predicted into a plurality of classes and to determine the class to which each target to be predicted belongs.
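A minimal sketch of the pairwise distance computation follows. It assumes a weighted Euclidean distance over the per-window predictions, which is one plausible reading of the symbol definitions rather than a confirmed formula; the function names are hypothetical.

```python
import math


def feature_distance(x_i, x_j, weights):
    """Assumed weighted Euclidean distance between two targets.

    x_i, x_j: predicted default probabilities, one entry per time window.
    weights:  the weight of each time window, in the same order.
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


def pairwise_distances(targets, weights):
    """Symmetric matrix of distances between every two targets."""
    n = len(targets)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = feature_distance(targets[i], targets[j], weights)
            matrix[i][j] = matrix[j][i] = d
    return matrix
```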

The embodiment of the invention is based on a clustering algorithm, and the class to which each target to be predicted belongs is obtained according to the characteristic distance between every two targets to be predicted, so that a more accurate clustering result can be obtained, and thus, the default probability prediction result corresponding to each class can be obtained according to the clustering result and is used as the default probability prediction result of the target to be predicted in each class, the real-time performance and the accuracy of default probability prediction can be considered, and more precise default probability prediction can be realized.

Based on the content of the foregoing embodiments, before obtaining, for each target to be predicted, the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, the method further includes: and for each time window, performing logistic regression analysis according to the probability prediction characteristic sample data of each time window and non-default or default data corresponding to the probability prediction characteristic sample data to obtain a regression model corresponding to each time window.

Specifically, for each time window, a logistic regression analysis may be performed on the probability prediction feature sample data of the time window and non-default or default data corresponding to the probability prediction feature sample data to obtain a regression model corresponding to the time window. The regression model is a logistic regression model.

Logistic regression, also known as logistic regression analysis, is a generalized linear model.

A logistic regression model can be used to predict how likely a certain outcome is to occur for given values of the independent variables.
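As an illustrative sketch only, a logistic regression model for one time window could be fitted by batch gradient descent on the 0/1 default labels. The function names, learning rate, and epoch count are assumptions; the embodiment does not prescribe a fitting procedure, and in practice a statistics library would normally be used instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(samples, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss.

    samples: list of feature vectors for one time window.
    labels:  1 for default, 0 for non-default.
    """
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = pred - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias
```

One model would be fitted per time window, each on the feature sample data and labels of that window.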

According to the embodiment of the invention, the logistic regression analysis is carried out according to the probability prediction characteristic sample data of each time window of each sample enterprise and the non-default or default data corresponding to the probability prediction characteristic sample data to obtain the regression model corresponding to the time window, so that the default probability of the target to be predicted can be predicted more accurately, and the more accurate default probability prediction result of each target to be predicted can be obtained based on the default probability prediction result of each time window of each target to be predicted and the weight corresponding to each time window.

Based on the content of the foregoing embodiments, for each time window, performing logistic regression analysis according to the probability prediction feature sample data of each time window and non-default or default data corresponding to the sample data, and before obtaining the regression model corresponding to each time window, the method further includes: and for each time window, determining the probability prediction characteristics of each time window according to the sample basic data of each time window based on a characteristic engineering method.

Specifically, the indexes in the basic data can be screened and/or combined by a feature selection method and/or a feature dimensionality reduction method from feature engineering, so as to determine a plurality of indexes suitable as inputs to the regression model, which serve as the probability prediction features.

Feature selection can be realized by one or a combination of filter methods, wrapper methods, and embedded methods.

The feature dimensionality reduction can be performed by methods such as Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA).
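As one hedged example of a filter method, the sketch below ranks candidate indexes by the absolute Pearson correlation between each index and the 0/1 default label and keeps the top few. The choice of Pearson correlation, the function names, and the column names used in the usage note are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(columns, labels, top_m):
    """Filter-style selection: keep the top_m indexes most correlated with the label.

    columns maps an index name to its list of sample values.
    """
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], labels)),
        reverse=True,
    )
    return ranked[:top_m]
```

For instance, with hypothetical columns `overdue_count`, `employee_count`, and a constant column, the constant column scores zero and is ranked last.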

The embodiment of the invention determines the probability prediction characteristics based on the characteristic engineering method, can reduce the number of the probability prediction characteristics, can reduce the data volume of the probability prediction and can improve the accuracy and efficiency of the probability prediction on the premise of furthest reserving the characteristics related to the default possibility.

Based on the content of the above embodiments, the clustering algorithm is a K-means clustering algorithm.

Specifically, the K-means clustering algorithm may be adopted to cluster the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window.

Given a set of data points and the required number of clusters K, which is a positive integer specified in advance, the K-means clustering algorithm iteratively partitions the data into K clusters according to a chosen distance function.
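A compact sketch of K-means over the per-window predictions is shown below. It assumes a weighted Euclidean distance as the distance function and a seeded random choice of initial centroids; both choices, along with the function names and defaults, are assumptions rather than details from the embodiment.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Assumed weighted Euclidean distance between two prediction vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def weighted_kmeans(points, weights, k, max_iter=100, seed=0):
    """Cluster targets; each point lists one predicted probability per window."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The per-coordinate mean is the minimizer of the summed weighted squared distance, so the usual centroid update still applies under this assumed distance.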

According to the embodiment of the invention, the targets to be predicted are clustered through the K-means clustering algorithm, and the class to which each target to be predicted belongs is determined, so that the default probability prediction result corresponding to each class can be obtained according to the clustering result, and can be used as the default probability prediction result of the target to be predicted in each class, and a more accurate default probability prediction result can be obtained.
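The clustering step described above can be sketched as follows. Each target to be predicted is represented by the vector of its per-window default probabilities, and the window weights enter through a weighted squared Euclidean distance. How the weights are applied is not fixed by this passage, so the distance function, the deterministic initialization from the first K points, and the sample values are all assumptions of this example.

```python
def weighted_sq_dist(p, q, weights):
    """Weighted squared Euclidean distance between two probability vectors."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(p, q, weights))


def kmeans(points, weights, k, max_iter=100):
    """Plain K-means with a weighted distance; returns (assignment, centroids)."""
    # Deterministic initialization (assumed for reproducibility): first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: weighted_sq_dist(p, centroids[c], weights))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical per-window default probabilities for four targets
# (three time windows of different durations) and assumed window weights.
points = [
    [0.05, 0.10, 0.08],
    [0.80, 0.75, 0.90],
    [0.07, 0.12, 0.05],
    [0.85, 0.70, 0.88],
]
weights = [0.5, 0.3, 0.2]

assignment, centroids = kmeans(points, weights, k=2)
print(assignment)
```

With these inputs the two low-probability targets fall into one class and the two high-probability targets into the other.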

Based on the content of the above embodiments, the basic data includes at least one of personnel data, financial data, operating data, and industrial and commercial data.

Specifically, when the target to be predicted is an enterprise, the basic data may include at least one of personnel data, financial data, operating data, and industrial and commercial data.

The personnel data may include the number of employees, the age of the legal representative, the marital status of the legal representative, and the like.

The financial data may include loan application amounts, repayment records, default records, and the like.

The operating data may include business performance, market value, capital flows, and the like.

The industrial and commercial data may include registered capital, date of establishment, records of violations of laws and regulations, and the like.

According to the embodiment of the invention, at least one of the personnel data, the financial data, the operating data, and the industrial and commercial data is selected as the basic data, so that a more accurate default probability prediction result can be obtained based on the basic data of each time window.

The default probability prediction device provided by the embodiment of the present invention is described below, and the default probability prediction device described below and the default probability prediction method described above may be referred to in correspondence with each other.

Fig. 2 is a schematic structural diagram of a default probability prediction apparatus according to an embodiment of the present invention. Based on the content of the above embodiments, as shown in fig. 2, the apparatus includes a regression analysis module 201 and a weighted clustering module 202, wherein:

the regression analysis module 201 is configured to, for each target to be predicted, obtain a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted, and the regression model corresponding to each time window;

the weighted clustering module 202 is configured to cluster the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and obtain the default probability prediction result corresponding to each class according to the clustering result, where the default probability prediction result is used as the default probability prediction result of the target to be predicted in each class;

wherein the time durations of any two time windows are different.

Specifically, the regression analysis module 201 is electrically connected to the weighted clustering module 202.

For each time window, the regression analysis module 201 may use the basic data of the time window as the input of the regression model corresponding to the time window, or may obtain suitable data after performing data processing on the basic data of the time window as the input of the regression model corresponding to the time window; the regression model corresponding to the time window can output the prediction result of the default probability of the time window of the target to be predicted according to the input data.
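The per-window prediction performed by the regression analysis module 201 can be sketched as below, assuming (as in the method embodiments) that each regression model is a logistic regression model. The window names, coefficients, intercepts, and feature values are hypothetical placeholders rather than fitted values.

```python
import math


def predict_default_probability(features, coefficients, intercept):
    """Logistic regression output: sigmoid of the linear combination of features."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# One hypothetical model per time window; the windows have different durations.
models = {
    "30_days": {"coefficients": [0.8, -0.3], "intercept": -1.0},
    "90_days": {"coefficients": [0.5, -0.2], "intercept": -0.5},
    "365_days": {"coefficients": [0.3, -0.1], "intercept": -0.2},
}

# Hypothetical processed basic data of one target, one feature vector per window.
basic_data = {
    "30_days": [1.0, 2.0],
    "90_days": [2.0, 1.5],
    "365_days": [4.0, 3.0],
}

# One default probability per time window for this target.
window_probabilities = {
    window: predict_default_probability(basic_data[window], **model)
    for window, model in models.items()
}
print(window_probabilities)
```

The resulting dictionary holds one probability per time window, which is the form of input the weighted clustering module 202 operates on.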

The weighted clustering module 202 clusters the targets to be predicted by adopting any clustering algorithm based on the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, divides each target to be predicted into a plurality of classes, determines the class to which each target to be predicted belongs, and obtains a clustering result.

For each class obtained by clustering, the weighted clustering module 202 may obtain a probability interval as a default probability prediction result corresponding to the class by using methods such as mathematical statistics based on a prediction result of the default probability of each time window of each target to be predicted belonging to the class and a weight corresponding to each time window.
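One simple way to derive such a probability interval is sketched below. It assumes that each target is summarized by the weighted average of its per-window default probabilities and that the interval of a class is the minimum and maximum of these summaries over the class members. Both choices, as well as the sample values and the cluster labels, are assumptions for illustration; the embodiment only states that mathematical statistics or similar methods may be used.

```python
def weighted_score(probabilities, weights):
    """Weighted average of the per-window default probabilities of one target."""
    return sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)


def class_intervals(points, assignment, weights):
    """Map each class label to the (min, max) weighted score of its members."""
    intervals = {}
    for probabilities, label in zip(points, assignment):
        score = weighted_score(probabilities, weights)
        low, high = intervals.get(label, (score, score))
        intervals[label] = (min(low, score), max(high, score))
    return intervals


# Hypothetical per-window probabilities, window weights, and clustering result.
points = [
    [0.05, 0.10, 0.08],
    [0.80, 0.75, 0.90],
    [0.07, 0.12, 0.05],
    [0.85, 0.70, 0.88],
]
weights = [0.5, 0.3, 0.2]
assignment = [0, 1, 0, 1]

intervals = class_intervals(points, assignment, weights)
for label, (low, high) in sorted(intervals.items()):
    print(f"class {label}: [{low:.3f}, {high:.3f}]")
```

Every target in a class is then assigned that class's interval as its default probability prediction result.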

The default probability prediction apparatus provided in the embodiments of the present invention is configured to execute the default probability prediction method provided in each of the embodiments of the present invention, and specific methods and processes for implementing corresponding functions by each module included in the default probability prediction apparatus are detailed in the embodiments of the default probability prediction method, and are not described herein again.

The default probability prediction device is used to carry out the default probability prediction methods of the foregoing embodiments. Therefore, the descriptions and definitions given for the default probability prediction methods in the foregoing embodiments can be used to understand the execution modules in the embodiments of the present invention.

Fig. 3 illustrates the physical structure of an electronic device. As shown in fig. 3, the electronic device may include a processor 301, a memory 302, and a bus 303, wherein the processor 301 and the memory 302 communicate with each other through the bus 303. The processor 301 is configured to invoke computer program instructions stored in the memory 302 and executable on the processor 301 to perform the default probability prediction method provided by the method embodiments described above, the method comprising: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.

Furthermore, the logic instructions in the memory 302 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the method for predicting probability of default provided by the foregoing method embodiments, where the method includes: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.

In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the default probability prediction method provided in the foregoing embodiments, the method including: for each target to be predicted, obtaining a prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window; clustering the targets to be predicted according to the prediction result of the default probability of each time window of each target to be predicted and the weight corresponding to each time window, and acquiring the default probability prediction result corresponding to each class according to the clustering result to serve as the default probability prediction result of the target to be predicted in each class; wherein the time durations of any two time windows are different.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A default probability prediction method, comprising:

wherein the time durations of any two of the time windows are different.

2. The default probability prediction method according to claim 1, wherein the specific step of obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window comprises:

3. The default probability prediction method according to claim 1, wherein the specific step of clustering each target to be predicted according to the default probability prediction result of each time window of each target to be predicted and the weight corresponding to each time window comprises:

4. The default probability prediction method according to claim 2, wherein before obtaining the prediction result of the default probability of each time window of the target to be predicted according to the basic data of each time window of the target to be predicted and the regression model corresponding to each time window, for each target to be predicted, the method further comprises:

5. The default probability prediction method of claim 2, wherein before performing, for each time window, logistic regression analysis according to the probability prediction feature sample data of the time window and the default or non-default data corresponding to the sample data to obtain the regression model corresponding to the time window, the method further comprises:

6. The default probability prediction method of claim 3, wherein the clustering algorithm is a K-means clustering algorithm.

7. The default probability prediction method of any one of claims 1 to 6, wherein the basic data includes at least one of personnel data, financial data, operating data, and industrial and commercial data.

8. A default probability prediction apparatus, comprising:

wherein the time durations of any two of the time windows are different.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the default probability prediction method according to any one of claims 1 to 7.

10. A non-transitory computer readable storage medium, having stored thereon a computer program, which, when being executed by a processor, carries out the steps of the default probability prediction method according to any one of claims 1 to 7.