WO2009048843A1 - Methods and systems of predicting mortgage payment risk - Google Patents
- Publication number
- WO2009048843A1 (PCT/US2008/078987)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- mortgage
- risk
- models
- historical
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/02—Banking, e.g. interest calculation or account maintenance
Definitions
- the embodiments described herein relate to determining or predicting the likelihood of payment defaults in financial transactions.
- a mortgage fraud detection system can be configured to analyze loan application data to identify applications that are being obtained using fraudulent application data.
- EPD: early payment default
- a method for detecting a risk of payment default comprises receiving mortgage data associated with a mortgage application, the mortgage data associated with an applicant, determining a first score for the mortgage data based at least partly on one or more models that are based on data from a plurality of historical mortgage transactions and based on historical credit information related to the applicant, and generating data indicative of a risk of payment default based at least partly on the first score.
- a method for detecting a risk of payment default comprises receiving mortgage data associated with a plurality of mortgage applications, each mortgage application associated with an applicant; determining a plurality of first scores for the mortgage data for each of the plurality of mortgage applications based at least partly on one or more models that are based on data from a plurality of historical mortgage transactions and based on historical credit information related to the applicant; generating data indicative of a risk of payment default for each of the plurality of mortgage applications; prioritizing the mortgage applications based on the plurality of data generated; determining a plurality of second scores for the mortgage data for each of the plurality of mortgage applications based at least partly on one or more models that are based on data from a plurality of historical mortgage transactions; and generating data indicative of a risk of fraud for each of the prioritized mortgage applications based at least partly on the second score.
- a system for detecting a risk of payment default comprises a storage configured to receive mortgage data associated with a mortgage application, the mortgage application associated with an applicant; and a processor coupled with the storage, the processor configured to determine a first score for the mortgage data based at least partly on one or more models that are based on data from a plurality of historical mortgage transactions and based on historical credit information related to the applicant, and generate data indicative of a risk of payment default based at least partly on the first score.
- a system for detecting a risk of payment default comprises a storage configured to receive mortgage data associated with a plurality of mortgage applications, each of the plurality of mortgage applications associated with an applicant; and a processor coupled with the storage, the processor configured to determine a plurality of first scores for the mortgage data for each of the plurality of mortgage applications based at least partly on one or more models that are based on data from a plurality of historical mortgage transactions and based on historical credit information related to the applicant, generate data indicative of a risk of payment default for each of the plurality of mortgage applications based at least partly on the first score, prioritize the mortgage applications based on the plurality of data generated, determine a plurality of second scores for the mortgage data for each of the plurality of mortgage applications based at least partly on one or more models that are based on data from a plurality of historical mortgage transactions, and generate data indicative of a risk of fraud for each of the prioritized mortgage applications based at least partly on the second score.
- Figure 1 is a functional block diagram illustrating a fraud detection system such as for use with a mortgage origination system in accordance with one embodiment
- Figure 2 is a functional block diagram illustrating an example of the fraud detection system of Figure 1 in more detail in accordance with one embodiment
- Figure 3 is a functional block diagram illustrating an example of loan models that can be included in the fraud detection system of Figure 2;
- Figure 4 is a functional block diagram illustrating examples of entity models that can be included in the fraud detection system of Figure 2;
- Figure 5 is a flowchart illustrating model generation and use in the fraud detection system of Figure 2;
- Figure 6 is a flowchart illustrating an example of using models in the fraud detection system of Figure 2;
- Figure 7 is a flowchart illustrating an example of generating a supervisory model in the fraud detection system of Figure 2;
- Figure 8 is a flowchart illustrating an example of generating entity models in the fraud detection system of Figure 2;
- Figure 9 is a functional block diagram illustrating a payment default detection system such as for use with a mortgage origination system.
- Figure 10 is a functional block diagram illustrating an example of a payment default detection system in more detail.
- Existing fraud detection systems can use transaction data in addition to data related to the transacting entities to identify fraud. Such systems can operate in either batch mode (processing transactions as a group of files at periodic times during the day) or real time mode (processing transactions one at a time, as they enter the system). However, the fraud detection capabilities of existing systems have not kept pace with either the types of fraudulent activity that have evolved or the increasing processing and storage capabilities of computing systems.
- For example, it has been found that, as discussed with reference to some embodiments, fraud detection can be improved by using stored past transaction data in place of, or in addition to, summarized forms of past transaction data. In addition, in one embodiment, fraud detection can be improved by using statistical information that is stored according to groups of individuals that form clusters.
- fraud can be identified with reference to deviation from identified clusters.
- embodiments of mortgage fraud detection systems can use data that is stored in association with one or more entities associated with the processing of the mortgage transaction such as brokers, appraisers, or other parties to mortgage transactions.
- the entities can be real persons or can refer to business associations, e.g., a particular appraiser, or an appraisal firm.
- Fraud generally refers to any material misrepresentation associated with a loan application and can include any misrepresentation which leads to a higher probability for the resulting loan to default or become un-sellable or require discount in the secondary market.
- Mortgages can include residential, commercial, or industrial mortgages.
- mortgages can include first, second, home equity, or any other loan associated with a real property.
- it is to be recognized that other embodiments can also include fraud detection in other types of loans or financial transactions.
- Exemplary applications of fraud detection relate to credit cards, debit cards, and mortgages. Furthermore, various patterns can be detected from external sources, such as data available from a credit bureau or other data aggregator.
- FIG. 1 is a functional block diagram illustrating a fraud detection system 100 such as for use with a mortgage origination system 106.
- the system 100 can be used by an investment bank to evaluate applications and/or funded loans, or as part of due diligence of a loan portfolio.
- the fraud detection system 100 can receive and store data in a storage 104.
- the storage 104 can comprise one or more database servers and any suitable configuration of volatile and persistent memory.
- the fraud detection system 100 can be configured to receive mortgage application data from the mortgage origination system 106 and provide data indicative of fraud back to the mortgage origination system 106.
- the fraud detection system 100 uses one or more models to generate the data indicative of fraud.
- data indicative of fraud can also be provided to a risk manager system 108 for further processing and for analysis by a human operator.
- the risk manager system 108 can be provided in conjunction with the fraud detection system 100 or in conjunction with the mortgage origination system 106.
- a model generator 110 can provide models to the fraud detection system 100.
- the model generator 110 can provide the models periodically to the system 100, such as when new versions of the system 100 are released to a production environment.
- at least a portion of the model generator 110 can be included in the system 100 and configured to automatically update at least a portion of the models in the system 100.
- Figure 2 is a functional block diagram further illustrating an example of the fraud detection system 100.
- the system 100 can include an origination system interface 122 providing mortgage application data to a data preprocessing module 124.
- the origination system interface 122 can receive data from the mortgage origination system 106 of Figure 1.
- the origination system interface 122 can be configured to receive data associated with funded mortgages and can be configured to interface with suitable systems other than, or in addition to, mortgage origination systems.
- the system interface 122 can be configured to receive "bid tapes" or other collections of data associated with funded mortgages for use in evaluating fraud associated with a portfolio of funded loans.
- the origination system interface 122 comprises a computer network that communicates with the origination system 106 to receive applications in real time or in batches.
- the origination system interface 122 receives batches of applications via a data storage medium.
- the origination system interface 122 provides application data to the data preprocessing module 124 which formats application data into data formats used internally in the system 100.
- the origination system interface 122 can also provide data from additional sources such as credit bureaus that can be in different formats for conversion by the data preprocessing module 124 into the internal data formats of the system 100.
- the origination system interface 122 and preprocessing module 124 can also allow at least portions of a particular embodiment of the system 100 to be used to detect fraud in different types of credit applications and for different loan originators that have varying data and data formats.
- Table 1 lists examples of mortgage application data that can be used in various embodiments.
- the preprocessing module 124 can be configured to identify missing data values and provide data for those missing values to improve further processing. For example, the preprocessing module 124 can generate application data to fill missing data fields using one or more rules. Different rules can be used depending on the loan data supplier, on the particular data field, and/or on the distribution of data for a particular field. For example, for categorical fields, the most frequent value found in historical applications can be used. For numerical fields, the mean or median value of historical applications can be used. In addition, other values can be selected such as a value that is associated with the highest risk of fraud (e.g., assume the worst) or a value that is associated with the lowest risk of fraud (e.g., assume the best).
- a sentinel value, e.g., a specific value that is indicative of a missing value to one or more fraud models, can be used (allowing the fact that particular data is missing to be associated with fraud).
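The fill rules described above (most frequent value for categorical fields, mean or median for numerical fields, or a sentinel value) can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names, the `SENTINEL` value, and both helper functions are hypothetical.

```python
from statistics import median, mode

SENTINEL = -999  # hypothetical sentinel that signals "missing" to downstream models


def build_fill_values(history, categorical, numerical):
    """Derive per-field fill values from historical applications."""
    fill = {}
    for field in categorical:
        seen = [row[field] for row in history if row.get(field) is not None]
        fill[field] = mode(seen)  # most frequent historical value
    for field in numerical:
        seen = [row[field] for row in history if row.get(field) is not None]
        fill[field] = median(seen)  # median of historical values
    return fill


def impute(application, fill, sentinel_fields=()):
    """Return a copy of the application with missing fields filled in."""
    out = dict(application)
    for field, value in fill.items():
        if out.get(field) is None:
            # Sentinel fields keep the "missing" signal instead of a typical value.
            out[field] = SENTINEL if field in sentinel_fields else value
    return out


history = [
    {"occupancy": "owner", "income": 50000},
    {"occupancy": "owner", "income": 70000},
    {"occupancy": "investor", "income": 90000},
]
fill = build_fill_values(history, categorical=["occupancy"], numerical=["income"])
filled = impute({"occupancy": None, "income": None}, fill, sentinel_fields={"income"})
```

Choosing per-field rules (for example, a sentinel for income but the mode for occupancy) mirrors the statement that different rules can be used depending on the data field.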
- the preprocessing module 124 can also be configured to identify erroneous data or missing data. In one embodiment, the preprocessing module 124 extrapolates missing data based on data from similar applications, similar applicants, or using default data values.
- the preprocessing module 124 can perform data quality analysis such as one or more of critical error detection, anomaly detection, and data entry error detection. In one embodiment, applications failing one or more of these quality analyses can be logged to a data error log database 126.
- the preprocessing module 124 identifies applications that are missing data that the absence of which is likely to confound further processing.
- missing data can include, for example, appraisal value, borrower credit score, or loan amount.
- no further processing is performed and a log or error entry is stored to the database 126 and/or provided to the loan origination system 106.
- the preprocessing module 124 identifies continuous application data values that can be indicative of data entry error or of material misrepresentations. For example, high loan or appraisal amounts (e.g., above a threshold value) can be indicative of data entry error or fraud.
- Other anomalous data can include income or age data that is outside selected ranges.
- such anomalous data can be logged and the log provided to the origination system 106.
- the fraud detection system 100 continues to process applications with anomalous data. The presence of anomalous data can be logged to the database 126 and/or included in a score output or report for the corresponding application.
- the preprocessing module 124 can be configured to identify non-continuous data such as categories or coded data that appear to have data entry errors. For example, telephone numbers or zip codes that have too many or too few digits, incomplete social security numbers, toll free numbers as home or work numbers, or other category data that fails to conform to input specifications can be logged. The presence of anomalous data can be logged to the database 126 and/or included in a score output or report for the corresponding application.
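The three kinds of data quality analysis described above (critical error detection, anomaly detection, and data entry error detection) might be sketched as below. The thresholds, field names, and the US-style zip code and telephone formats are assumptions chosen for illustration only.

```python
import re

CRITICAL_FIELDS = ("appraisal_value", "credit_score", "loan_amount")
MAX_LOAN_AMOUNT = 5_000_000  # hypothetical anomaly threshold
AGE_RANGE = (18, 100)        # hypothetical acceptable range
TOLL_FREE = {"800", "833", "844", "855", "866", "877", "888"}


def quality_check(app):
    """Return (critical_errors, anomalies, entry_errors) for one application."""
    # Critical errors: fields whose absence would confound further processing.
    critical = [f for f in CRITICAL_FIELDS if app.get(f) is None]

    # Anomalies: continuous values outside the selected ranges.
    anomalies = []
    loan = app.get("loan_amount")
    if loan is not None and loan > MAX_LOAN_AMOUNT:
        anomalies.append("loan_amount above threshold")
    age = app.get("age")
    if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
        anomalies.append("age outside range")

    # Data entry errors: coded fields that fail their input specification.
    entry_errors = []
    zip_code = app.get("zip_code")
    if zip_code is not None and not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
        entry_errors.append("malformed zip code")
    phone = re.sub(r"\D", "", app.get("home_phone") or "")
    if phone and len(phone) != 10:
        entry_errors.append("phone has wrong digit count")
    elif phone[:3] in TOLL_FREE:
        entry_errors.append("toll-free home phone")
    return critical, anomalies, entry_errors


result = quality_check({
    "appraisal_value": 300000, "credit_score": None, "loan_amount": 9_000_000,
    "age": 17, "zip_code": "9410", "home_phone": "800-555-0100",
})
```

An application with a non-empty first list would be logged and dropped in the embodiment that stops on critical errors, while the other two lists would travel with the application into the score output or report.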
- the preprocessing module 124 can be configured to query an input history database 128 to determine if the application data is indicative of a duplicate application.
- a duplicate can indicate resubmission of the same application, whether fraudulently or erroneously. Duplicates can be logged. In one embodiment, no further processing of duplicates is performed. In other embodiments, processing of duplicates continues and can be noted in the final report or score. If no duplicate is found, the application data is stored to the input history database 128 to identify future duplicates.
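One way to picture the duplicate check against the input history is a lookup keyed on a hash of normalized fields. This is only a sketch: the key fields, the normalization, and the in-memory `set` standing in for the input history database are all assumptions.

```python
import hashlib


def application_key(app, fields=("ssn", "property_address", "loan_amount")):
    """Hash a normalized subset of fields (a hypothetical choice) into a lookup key."""
    parts = [str(app.get(f, "")).strip().lower() for f in fields]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def check_duplicate(app, input_history):
    """Return True if the application was seen before; otherwise record it."""
    key = application_key(app)
    if key in input_history:
        return True
    input_history.add(key)  # stored so that future duplicates can be identified
    return False


input_history = set()
first = check_duplicate(
    {"ssn": "123-45-6789", "property_address": "1 Main St", "loan_amount": 250000},
    input_history,
)
second = check_duplicate(
    {"ssn": "123-45-6789", "property_address": " 1 MAIN ST ", "loan_amount": 250000},
    input_history,
)
```

Here `first` is `False` and `second` is `True`, because case and surrounding whitespace are normalized away before hashing.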
- the data preprocessing module 124 provides application data to one or more models for fraud scoring and processing.
- application data is provided to one or more loan models 132 that generate data indicative of fraud based on application and applicant data.
- the data indicative of fraud generated by the loan models 132 can be provided to an integrator 136 that combines scores from one or more models into a final score.
- the data preprocessing module 124 can also provide application data to one or more entity models 140 that are configured to identify fraud based on data associated with entities involved in the processing of the application.
- Entity models can include models of data associated with loan brokers, loan officers or other entities involved in a loan application.
- Each of the entity models can output data to an entity scoring module 150 that is configured to provide a score and/or one or more risk indicators associated with the application data.
- the entity scoring module 150 can provide scores associated with one or more risk indicators associated with the particular entity or application.
- risk indicator refers to data values identified with respect to one or more data fields that can be indicative of fraud.
- appraisal value in combination with zip code can be a risk indicator associated with an appraiser model.
- the entity scoring module 150 provides scores and indicators to the integrator 136 to generate a combined fraud score and/or set of risk indicators.
- the selection of risk indicators is based on criteria such as domain knowledge and/or correlation coefficients between entity scores and fraud rate, if the entity fraud rate is available. The correlation coefficient between the entity score s for risk indicator j and the entity fraud rate f is defined as
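A minimal sketch of correlation-based selection follows, assuming the coefficient meant here is the standard Pearson sample correlation computed across entities. The selection threshold and the function names are hypothetical and are not taken from the source.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (spread_x * spread_y)


def select_indicators(scores_by_indicator, fraud_rates, threshold=0.5):
    """Keep indicators whose entity scores correlate with entity fraud rates."""
    return [
        name
        for name, scores in scores_by_indicator.items()
        if pearson(scores, fraud_rates) > threshold  # hypothetical cutoff
    ]


fraud_rates = [0.01, 0.02, 0.03, 0.04]  # one value per entity
scores_by_indicator = {
    "loan_value": [1, 2, 3, 4],
    "noise": [2, 1, 2, 1],
}
selected = select_indicators(scores_by_indicator, fraud_rates)
```

With these toy inputs only `loan_value` is kept, since its scores rise with the fraud rate while `noise` does not.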
- the entity scoring model 150 combines each of the risk
- the combined score for a particular entity can be determined
- the fraud/EPD rate can be incorporated with entity committee
- the entity score S E can be calculated using one of
- the preprocessing module 124 can also provide application data to a risky file processing module 156.
- the risky file processing module 156 is configured to receive files from a risky files database 154. "Risky" files include portions of applications that are known to be fraudulent. It has been found that fraudulent applications are often resubmitted with only insubstantial changes in application data.
- the risky file processing module 156 compares each application to the risky files database 154 and flags applications that appear to be resubmissions of fraudulent applications.
- risky file data is provided to the integrator 136 for integration into a combined fraud score or report.
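Because fraudulent applications are often resubmitted with only insubstantial changes, the comparison against risky files can be pictured as a field-by-field similarity test. The similarity measure, the compared fields, and the cutoff below are assumptions; the source does not specify how the comparison is performed.

```python
def field_similarity(app, risky, fields):
    """Fraction of compared fields whose normalized values match."""
    same = sum(
        str(app.get(f, "")).strip().lower() == str(risky.get(f, "")).strip().lower()
        for f in fields
    )
    return same / len(fields)


def flag_resubmission(app, risky_files, fields, min_similarity=0.75):
    """Return the best-matching risky file if it meets the cutoff, else None."""
    best = max(risky_files, key=lambda r: field_similarity(app, r, fields), default=None)
    if best is not None and field_similarity(app, best, fields) >= min_similarity:
        return best
    return None


FIELDS = ("ssn", "address", "employer", "loan_amount")
risky_files = [
    {"id": 1, "ssn": "111", "address": "1 Main St", "employer": "Acme", "loan_amount": 200000},
]
resubmitted = {"ssn": "111", "address": "1 Main St", "employer": "Acme", "loan_amount": 210000}
unrelated = {"ssn": "222", "address": "9 Oak Ave", "employer": "Other", "loan_amount": 90000}
match = flag_resubmission(resubmitted, risky_files, FIELDS)
no_match = flag_resubmission(unrelated, risky_files, FIELDS)
```

The resubmitted application differs only in loan amount, so three of four fields agree and it is flagged.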
- the integrator 136 can be configured to apply weights and/or processing rules to generate one or more scores and risk indicators based on the data indicative of fraud provided by one or more of the loan models 132, the entity models 140 and entity scoring module 150, and the risky file processing module 156.
- the integrator 136 generates a single score indicative of fraud along with one or more risk indicators relevant for the particular application. Additional scores can also be provided with reference to each of the risk indicators.
- the integrator 136 can provide this data to a scores and risk indicators module 160 that logs the scores to an output history database 160.
- the scores and risk indicators module 160 identifies applications for further review by the risk manager 108 of Figure 1. Scores can be real or integer values.
- scores are numbers in the range of 1-999.
- thresholds are applied to one or more categories to segment scores into high and low risk categories.
- thresholds are applied to identify applications for review by the risk manager 108.
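The use of thresholds on a 1-999 score can be sketched as below. The two cutoff values and the category names are hypothetical; the source states only that thresholds segment scores into risk categories and select applications for review.

```python
def segment(score, review_threshold=800, high_risk_threshold=600):
    """Map a 1-999 fraud score to a category using hypothetical cutoffs."""
    if not 1 <= score <= 999:
        raise ValueError("score must be in the range 1-999")
    if score >= review_threshold:
        return "review"  # routed to the risk manager
    if score >= high_risk_threshold:
        return "high"
    return "low"


categories = [segment(s) for s in (120, 650, 910)]
```

For the three sample scores this yields `low`, `high`, and `review` respectively.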
- risk indicators are represented as codes that are indicative of certain data fields or certain values for data fields. Risk indicators can provide information on the types of fraud and recommended actions. For example, risk indicators might include a credit score inconsistent with income, high risk geographic area, etc. Risk indicators can also be indicative of entity historical transactions, e.g., a broker trend that is indicative of fraud.
- a score review report module 162 can generate a report in one or more formats based on scores and risk indicators provided by the scores and risk indicators module 160.
- the score review report module 162 identifies loan applications for review by the risk manager 108 of Figure 1.
- One embodiment desirably improves the efficiency of the risk manager 108 by identifying applications with the highest fraud scores or with particular risk indicators for review thereby reducing the number of applications that need to be reviewed.
- a billing process 166 can be configured to generate billing information based on the results in the output history.
- the model generator 110 receives application data, entity data, and data on fraudulent and non-fraudulent applications and generates and updates models such as the entity models 140 either periodically or as new data is received.
- Figure 3 is a functional block diagram illustrating an example of the loan models 132.
- the loan models 132 can include one or more supervised models 170 and high risk rules models 172.
- Supervised models 170 are models that are generated based on training or data analysis that is based on historical transactions or applications that have been identified as fraudulent or non-fraudulent. Examples of implementations of supervised models 170 include scorecards, naive Bayesian, decision trees, logistic regression, and neural networks. Particular embodiments can include one or more such supervised models 170.
- the high risk rules models 172 can include expert systems, decision trees, and/or classification and regression tree (CART) models.
- the high risk rules models 172 can include rules or trees that identify particular data patterns that are indicative of fraud.
- the high risk rules model 172 is used to generate scores and/or risk indicators.
- the rules, including selected data fields and condition parameters, are developed using the historical data used to develop the loan model 170.
- a set of high risk rule models 172 can be selected to include rules that have low firing rate and high hit rate.
- when a rule i is fired, it outputs a score $s_i^{\mathrm{rule}}$.
- the score represents the fraud risk associated with the rule.
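Selecting rules with a low firing rate and a high hit rate might look like the following sketch. It assumes that firing rate means the fraction of historical applications on which a rule fires and that hit rate means the fraction of those firings that were fraudulent; using the hit rate as the rule score is likewise an assumption, as are the cutoffs and rule names.

```python
def rule_stats(rule, history):
    """Return (firing_rate, hit_rate) for a rule over labeled history.

    history is a list of (application, is_fraud) pairs.
    """
    fired = [is_fraud for app, is_fraud in history if rule(app)]
    firing_rate = len(fired) / len(history)
    hit_rate = sum(fired) / len(fired) if fired else 0.0
    return firing_rate, hit_rate


def select_rules(rules, history, max_firing_rate=0.25, min_hit_rate=0.5):
    """Keep rules that fire rarely but are usually right; map name -> rule score."""
    selected = {}
    for name, rule in rules.items():
        firing_rate, hit_rate = rule_stats(rule, history)
        if 0 < firing_rate <= max_firing_rate and hit_rate >= min_hit_rate:
            selected[name] = hit_rate  # assumption: score equals historical hit rate
    return selected


history = (
    [({"ltv": 1.2}, True), ({"ltv": 1.1}, True)]
    + [({"ltv": 0.8}, False)] * 8
)
rules = {
    "ltv_over_100pct": lambda app: app["ltv"] > 1.0,
    "always_fires": lambda app: True,
}
selected = select_rules(rules, history)
```

The rule that always fires is rejected because its firing rate is 1.0, while the loan-to-value rule fires on 20% of the history and is right every time.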
- the loan models 170 and 172 are updated when new versions of the system 100 are released into operation.
- the supervised models 170 and the high risk rules models 172 are updated automatically.
- the supervised models 170 and the high risk rules models 172 can also be updated such as when new or modified data features or other model parameters are received.
- Figure 4 is a functional block diagram illustrating examples of the entity models 140.
- fraud detection performance can be increased by including models that operate on entities associated with a mortgage transaction that are in addition to the mortgage applicant. Scores for a number of different types of entities are calculated based on historical transaction data.
- the entity models can include one or more of an account executive model 142, a broker model 144, a loan officer model 146, and an appraiser (or appraisal) model 148.
- Embodiments can also include other entities associated with a transaction such as the lender.
- an unsupervised model e.g., a clustering model such as k-means, is applied to risk indicators for historical transactions for each entity.
- a score for each risk indicator, for each entity is calculated based on the relation of the particular entity to the clusters across the data set for the particular risk indicator.
- for example, for a risk indicator that is a single value, e.g., loan value for a broker, the difference between the loan value of each loan of the broker and the mean (assuming a simple Gaussian distribution of loan values), divided by the standard deviation of the loan values over the entire set of historical loans for all brokers, might be used as a risk indicator score.
- Embodiments that include more sophisticated clustering algorithms such as k-means can be used along with multidimensional risk indicators to provide for more powerful entity scores.
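For the single-value case, the score is a standardized distance from the population mean. The sketch below assumes the per-loan distances of a broker are combined with an equally weighted average and uses the population standard deviation; both choices, and the function name, are illustrative assumptions.

```python
from statistics import mean, pstdev


def indicator_scores(loans_by_broker):
    """Score each broker by the average standardized distance of its loan values."""
    all_values = [v for values in loans_by_broker.values() for v in values]
    mu = mean(all_values)        # mean over all brokers' historical loans
    sigma = pstdev(all_values)   # standard deviation over the same set
    if sigma == 0:
        return {broker: 0.0 for broker in loans_by_broker}
    return {
        broker: mean(abs(v - mu) / sigma for v in values)
        for broker, values in loans_by_broker.items()
    }


scores = indicator_scores({"broker_a": [100, 100], "broker_b": [100, 300]})
```

Broker B receives the higher score because one of its loans lies far from the mean of all loans.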
- FIG. 5 is a flowchart illustrating a method 300 of operation of the fraud detection system 100.
- the method 300 begins at a block 302 in which the supervised model is generated.
- the supervised models 170 are generated based on training or data analysis that is based on historical transactions or applications that have been identified as fraudulent or non-fraudulent. Further details of generating supervised models are discussed with reference to Figure 7.
- the system 100 generates one or more unsupervised entity models such as the account executive model 142, the broker model 144, the loan officer model 146, or the appraiser (or appraisal) model 148. Further details of generating unsupervised models are discussed with reference to Figure 8. Proceeding to a block 306, the system 100 applies application data to models such as the loan models 132 and the entity models 140. The functions of block 306 can be repeated for each loan application that is to be processed. Further detail of applying data to the models is described with reference to Figure 6.
- In one embodiment, the model generator 110 generates and/or updates models as new data is received or at specified intervals such as nightly or weekly.
- FIG. 6 is a flowchart illustrating an example of a method of performing the functions of the block 306 of Figure 5 of using models in the fraud detection system 100 to process a loan application.
- the function 306 begins at a block 322 in which the origination system interface 122 receives loan application data.
- at a block 324, the data preprocessing module 124 preprocesses the application as discussed with reference to Figure 2.
- the application data is applied to the supervised loan models 170 which provide a score indicative of the relative likelihood or probability of fraud to the integrator 136.
- the supervised loan models 170 can also provide risk indicators.
- the high risk rules model 172 is applied to the application to generate one or more risk indicators, and/or additional scores indicative of fraud.
- the application data is applied to one or more of the entity models 140 to generate additional scores and risk indicators associated with the corresponding entities of the models 140 associated with the transaction.
- the integrator 136 calculates a weighted score and risk indicators based on scores and risk indicators from the supervised loan model 170, the high risk rules model 172, and scores of entity models 140.
- the integrator 136 includes an additional model, e.g., a trained supervised model that combines the various scores, weights, and risk factors provided by the models 170, 172, and 140.
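The simpler, weighted form of integration can be sketched as a normalized weighted average clipped to the 1-999 score range. The weights and model names here are hypothetical; an embodiment that uses a trained supervised model as the integrator would learn the combination instead.

```python
def integrate(scores, weights):
    """Combine per-model scores into one score in the range 1-999."""
    total_weight = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total_weight
    return max(1, min(999, round(combined)))


model_scores = {"supervised": 800, "high_risk_rules": 600, "entity": 400}
model_weights = {"supervised": 5, "high_risk_rules": 3, "entity": 2}  # hypothetical
final_score = integrate(model_scores, model_weights)
```

With these inputs the combined score is (800*5 + 600*3 + 400*2) / 10 = 660.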
- the scores and risk indicators module 160 and the score review report module 162 generate a report providing a weighted score along with one or more selected risk indicators.
- the selected risk indicators can include explanations of potential types of frauds and recommendations for action.
- FIG. 7 is a flowchart illustrating an example of a method of performing the block 302 of Figure 5 of generating the loan models 132 in the fraud detection system 100.
- Supervised learning algorithms identify a relationship between input features and target variables based on training data.
- the target variables comprise the probability of fraud.
- the models used can depend on the size of the data and how complex a problem is. For example, if the fraudulent exemplars in historical data are less than about 5000 in number, smaller and simpler models can be used, so robust model parameter estimation can be supported by the data size.
- the method 302 begins at a block 340 in which the model generator 110 receives historical mortgage data.
- the model generator 110 can extract and convert client historical data according to internal development data specifications, perform data analysis to determine data quality and availability, and rectify anomalies, such as missing data, invalid data, or possible data entry errors similar to that described above with reference to preprocessing module 124 of Figure 2.
- the model generator 110 can perform feature extraction including identifying predictive input variables for fraud detection models.
- the model generator 110 can use domain knowledge and mathematical equations applied to single or combined raw input data fields to identify predictive features.
- Raw data fields can be combined and transformed into discriminative features.
- Feature extraction can be performed based on the types of models for which the features are to be used. For example, linear models, such as logistic regression and linear regression, work best when the relationships between input features and the target are linear.
- the model generator 110 selects features from a library of features for use in particular models. The selection of features can be determined by availability of data fields, and the usefulness of a feature for the particular data set and problem. Embodiments can use techniques such as filter and wrapper approaches, including information theory, stepwise regression, sensitivity analysis, data mining, or other data driven techniques for feature selection.
- the model generator 110 can segment the data into subsets to better model input data. For example, if subsets of a data set are identified with significantly distinct behavior, special models designed especially for these subsets normally outperform a general fit-all model.
- a priori knowledge of the data can be used to segment the data for generation of models.
- data is segregated geographically so that, for example, regional differences in home prices and lending practices do not confound fraud detection.
- data driven techniques e.g., unsupervised techniques such as clustering, are used to identify data segments that can benefit from a separate supervised model.
- the model generator 110 identifies a portion of the applications in the received application data (or segment of that data) that were fraudulent. In one embodiment, the origination system interface 122 provides this labeling. Moving to a block 344, the model generator 110 identifies a portion of the applications that were non-fraudulent. Next at a block 346, the model generator 110 uses a supervised learning algorithm to generate a model, such as the supervised model 170, that distinguishes the fraudulent from the non-fraudulent transactions. In one embodiment, CART or other suitable model generation algorithms are applied to at least a portion of the data to generate the high risk rules models 172.
- In one embodiment, historical data is split into multiple non-overlapped data sets.
- the data can be split into three sets, training set 1, training set 2, and validation.
- the training set 1 is used to train the neural network.
- the training set 2 is used during training to ensure that the learning converges properly and to reduce over-fitting to the training set 1.
- the validation set is used to evaluate the trained model performance.
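The three-way, non-overlapping split can be sketched as below. The 60/20/20 proportions, the shuffling, and the fixed seed are assumptions; the source states only that the sets do not overlap and what each is used for.

```python
import random


def split_three(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and split records into training set 1, training set 2, and validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n1 = int(len(shuffled) * fractions[0])
    n2 = n1 + int(len(shuffled) * fractions[1])
    # Whatever remains after the first two slices becomes the validation set.
    return shuffled[:n1], shuffled[n1:n2], shuffled[n2:]


train_1, train_2, validation = split_three(range(100))
```

Training set 2 plays the role often called a hold-out set for early stopping: it is consulted during training but never used to fit parameters.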
- Supervised models can include one or more of scorecards, naive Bayesian, decision trees, logistic regression, and neural networks.
- FIG. 8 is a flowchart illustrating an example of a method of performing the block 304 of Figure 5 of generating entity models 140 in the fraud detection system 100.
- the method 304 begins at a block 360 in which the model generator 110 receives historical mortgage applications.
- the model generator 110 can perform various processing functions such as described above with reference to the block 340 of Figure 7.
- the model generator 110 receives data related to mortgage processing related entities such as an account executive, a broker, a loan officer, or an appraiser.
- the model generator 110 selects risk indicators comprising one or more of the input data fields. In one embodiment, expert input is used to select the risk indicators for each type of entity to be modeled.
- the model generator 110 performs an unsupervised clustering algorithm such as k-means for each risk indicator for each type of entity.
- the model generator 110 calculates scores for risk indicators for each received historical loan based on the data distance from data clusters identified by the clustering algorithm. For example, in a simple one cluster model where the data is distributed in a normal or Gaussian distribution, the distance can be a distance from the mean value. The distance/score can be adjusted based on the distribution of data for the risk indicator, e.g., based on the standard deviation in a simple normal distribution.
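For the simple one-cluster case mentioned above, the distance-based score can be sketched as the distance from the mean expressed in standard deviations. The function name `distance_scores` and the sample appraisal-to-price ratios are hypothetical, and a multi-cluster model would instead measure distance to the nearest cluster center.

```python
from statistics import mean, pstdev


def distance_scores(values):
    """Score each value by its absolute distance from the mean,
    scaled by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing stands out.
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]


# Hypothetical risk-indicator values for one entity type.
appraisal_to_price_ratios = [1.00, 1.02, 0.98, 1.01, 0.99, 1.60]
scores = distance_scores(appraisal_to_price_ratios)
```

The last value lies far from the rest of the data, so it receives the largest score.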
- EPD Early Payment Default
- a weighted average of each of the applications associated with each entity can be used.
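A weighted average over the applications associated with each entity might look like the sketch below. The field names `entity_id`, `score`, and `weight` are assumptions; the source does not say what the weights are based on.

```python
from collections import defaultdict


def entity_scores(applications):
    """Return {entity_id: weighted average of its application scores}."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for app in applications:
        weighted_sum[app["entity_id"]] += app["score"] * app["weight"]
        weight_total[app["entity_id"]] += app["weight"]
    # Skip entities whose weights sum to zero to avoid dividing by zero.
    return {
        entity: weighted_sum[entity] / weight_total[entity]
        for entity in weighted_sum
        if weight_total[entity] > 0
    }


# Hypothetical per-application risk scores for two brokers.
applications = [
    {"entity_id": "broker_A", "score": 2.0, "weight": 1.0},
    {"entity_id": "broker_A", "score": 4.0, "weight": 3.0},
    {"entity_id": "broker_B", "score": 1.0, "weight": 2.0},
]
result = entity_scores(applications)
```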
- Other embodiments can use other models.
- certain embodiments described herein are directed to detecting EPD instead of, or in addition to, fraud.
- various embodiments of an early payment default (EPD) alert system are described herein.
- such a system can employ statistical pattern recognition to generate a score designed to assess the risk of early payment default in mortgage applications and loans.
- such a system can use advanced analytic scoring technology that enables mortgage lenders, investment banks, and servicers to score and identify each loan's early payment default risk in real-time during the underwriting process, before a new loan is funded, before it is purchased on the secondary market, and during the loan servicing cycle.
- Such an EPD alert can provide a sophisticated, analytics-based solution to help curtail the growing problem of early mortgage defaults.
- an EPD alert system as described herein uses pattern recognition technology to find early payment default risk based on historical patterns of both performing and non-performing mortgage loans from a database of historical loans. These analytic models predict the likelihood of a loan defaulting early, resulting in financial loss to the lender or investor.
- the systems and methods related to EPD alert systems can be described in a similar fashion as the embodiments for detecting mortgage related fraud described above. For example, a process similar to that of figure 6 can be employed in an EPD alert system, wherein steps 326, 328 and 330 would be customized and directed to detecting early payment default.
- an EPD alert system can be a complementary system used with systems and methods for detecting fraud, credit, and compliance risk, such as those described above, used during the loan underwriting, loan trading due diligence, and servicing processes to specifically identify early default risk.
- One embodiment of an EPD alert system can include a model configured to detect early payment default and improve the identification of EPD risk over traditional credit scores.
- Lenders can score new loan applications and select the highest scoring, e.g., within a specified cutoff, of applications for a further targeted review. For example, when used by investment banks in evaluating loan pools for purchase on the secondary market, one embodiment of an EPD alert system uses the limited bid tape data to identify high risk loans, which are then selected for further due diligence review.
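Selecting the highest-scoring loans for targeted review can be sketched as a simple ranking. The function name `select_for_review` and the `review_capacity` parameter are assumptions; a lender could equally express the cutoff as a score value or a percentage of the pool.

```python
def select_for_review(scored_loans, review_capacity):
    """Return the loans with the highest EPD scores, up to review_capacity.

    scored_loans is a list of (loan_id, score) pairs.
    """
    ranked = sorted(scored_loans, key=lambda pair: pair[1], reverse=True)
    return ranked[:review_capacity]


# Hypothetical bid-tape loans with EPD scores in the 1-999 range.
pool = [("L1", 120), ("L2", 910), ("L3", 455), ("L4", 780), ("L5", 300)]
selected = select_for_review(pool, review_capacity=2)
```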
- An EPD alert system as described herein can provide both a risk score and specific risk indicators that help guide and expedite the investigative process.
- The EPD alert system can be used independently or in conjunction with a score indicative of the likelihood of fraud to identify both mortgage fraud and EPD risk prior to funding or loan purchase, or during servicing of the loan.
- the EPD alert system is part of a suite of fraud detection and risk management software designed to provide analytic solutions to the mortgage industry.
- One such system is described in co-pending U.S. Patent Application No. 11/526208, filed September 26, 2006, which is hereby incorporated by reference.
- an EPD model can be generated using a supervised learning model (step 302) that uses examples of loans with and without early payment default to effectively learn how to generate a score that represents the likelihood of a loan defaulting during a particular portion of the life of the loan, e.g., the first, second or third payment, without anticipated cure.
- the score can be a value in a particular range, e.g., 1-999 with high scores indicating highest risk of payment default. It should be noted that the payment default can be early payment default, or a default that occurs over a longer period.
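One way to place a model output on the 1-999 range is a linear rescaling of an estimated default probability. The source does not state how its scores are scaled, so the linear mapping and the name `probability_to_score` are assumptions made only for illustration.

```python
def probability_to_score(p_default, low=1, high=999):
    """Map a default probability in [0, 1] onto an integer score in
    [low, high], with higher scores meaning higher risk."""
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be between 0 and 1")
    return low + round(p_default * (high - low))


lowest = probability_to_score(0.0)
middle = probability_to_score(0.5)
highest = probability_to_score(1.0)
```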
- One embodiment provides an operational workflow that uses both scores in a cascading risk management process for a more comprehensive assessment of fraud and early payment default risk. Studies with lenders determined that additional savings result from the combined model approach. For example, with reference to figure 3, such a cascaded approach can involve receiving mortgage data (step 322) and preprocessing an application (step 324).
- loan data can be processed using EPD loan models 932 (see figure 10) instead of supervised loan models 132 in order to detect the potential for EPD.
- High risk rules can be matched to the application in step 328 based on the modeling of step 326 and a score can be determined for entity models in step 330, if entity models are being used.
- a calculated weighted score can be determined for EPD and a report can be generated in step 334.
- the fraud analysis process described above can be performed, either after or in parallel with the process for EPD detection.
- the results of the EPD process can be used to prioritize applications for the fraud analysis.
- the codes in table 2 can be used, for example, to identify applications that are a high risk for EPD that are also a high risk for fraud. These applications can then be prioritized for the fraud process.
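The prioritization step in this cascaded workflow can be sketched as a tiered sort. The cutoff values, field names, and tier ordering below are assumptions, and the alert codes of table 2 are not reproduced here.

```python
def prioritize_for_fraud_review(loans, epd_cutoff=700, fraud_cutoff=700):
    """Order loans so that those high on both scores come first,
    then those high on one score, then the rest."""

    def tier(loan):
        high_epd = loan["epd_score"] >= epd_cutoff
        high_fraud = loan["fraud_score"] >= fraud_cutoff
        if high_epd and high_fraud:
            return 0
        if high_epd or high_fraud:
            return 1
        return 2

    # Within a tier, review the highest EPD scores first.
    return sorted(loans, key=lambda loan: (tier(loan), -loan["epd_score"]))


# Hypothetical scored applications.
loans = [
    {"id": "A", "epd_score": 300, "fraud_score": 200},
    {"id": "B", "epd_score": 850, "fraud_score": 900},
    {"id": "C", "epd_score": 720, "fraud_score": 100},
    {"id": "D", "epd_score": 950, "fraud_score": 710},
]
ordered_ids = [loan["id"] for loan in prioritize_for_fraud_review(loans)]
```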
- Fraud misrepresentation of a loan is typically detected through a review of the loan file, with occasional use of external data. Risk of early payment default, on the other hand, requires research into the financial viability of the applicant. Thus, the income stability and accuracy/existence of assets should be confirmed. Total debt should be reviewed to see if there are obvious expenses missing, or indications that debt is rising. Analysis of the credit report can provide insight into the trend of the debt-to-income ratio, and provide an indication that the applicant's financial viability might be worsening.
- risk factors can be included in the supervisory models 170 used for EPD detection native to fraud detection. Those factors can broadly be defined as: borrower's risk, geographic risk, borrower's affordability, and property valuation risk.
- Borrower's risk can include information such as a credit score, payment history, employment information, tenure in current employment position, debt, income, occupancy, etc.
- This information can be used to evaluate the risk factors associated with the borrower. For example, if the buyer has a risky credit score or employment, then they may be a higher risk for EPD, and the EPD models 932 can take this into account, as can the weighting factors applied by, e.g., integrator 936.
- Property appraisal information and the geographic location of the property can also be used to determine the EPD risk.
- the property may be overvalued relative to other properties in the area and/or the area may have a high rate of defaults.
- such information can be used in models 932 to determine a geographic risk factor and/or a property valuation risk factor.
- These factors can be associated with alerts that can be output by the scores and risk indicators block 960. Table 2 lists example alert codes some of which can be associated with these and other risk factors.
- An EPD alert score can be used alone or in combination with a mortgage fraud score to identify loans (or applications) for further review.
- the EPD alert score suggests areas in which to begin a user's loan investigation.
- risk managers are provided a way to use a fraud checklist and verifications to confirm if fraud exists. Presence of fraud provides assurance in the decision to not fund or purchase the loan due to the immediate confirmation of the problem.
- the remaining loans can then be researched if they contain high scoring EPD alerts, and a determination made about whether the applicant shows indications they will not be able to make their payments.
- An EPD alert model can have some common variables with the mortgage fraud detection system, but each model should have specific variables to predict the targeted outcome. While fraud and EPD can both occur on the same loan, it has been found that other fraud behavior and credit risk-specific EPD mean that the performance is not the same. Each model can contribute uniquely to the total risk assessment.
- Embodiments can use a timeframe for the target definition of EPD that is selected to most closely match what lenders typically measure, and that is associated with the most issues for EPD whether the loan is held or sold.
- the model targets detecting early payment default in the first few months after funding. But it will be understood that the probability for payment default can be detected for any time period after loan funding or adjustments.
- the EPD model is based on data combining multiple sources of information, e.g., which contain loan application and performance data from lenders that is broader than the mortgage loan data typically available through the credit bureau.
- An EPD model as described herein can target detection of early payment default in mortgage loans, in contrast to credit models that can have broader targets such as delinquency in 24 months within all credit relationships, or bankruptcy. Given the high impact of early payment defaults on lenders, such an EPD model is better suited to mortgage use.
- an EPD alert system as described herein can be designed to prevent early payment defaults that can result in severe loss scenarios such as foreclosure or repurchase.
- the models used target mortgage-related behavior, rather than a broad credit risk model that detects problems such as bankruptcy or charge-off.
- Feature Extraction is the process of designing predictive input variables for fraud and EPD detection models. Feature extraction can be performed using other models, alone or with input from human analysts. These predictive features are derived from the raw data fields in the loan application data and are calculated on each loan during model processing. The quality of the predictive features is measured by their ability to separate good from bad loans. The final models make use of such identified features to improve predictive performance.
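As an illustration of deriving a feature from raw fields and measuring how well it separates good from bad loans, the sketch below computes a debt-to-income ratio and a Kolmogorov-Smirnov style separation statistic. The field names, the sample data, and the choice of this statistic are assumptions; the source does not name a specific quality measure.

```python
def debt_to_income(loan):
    """Derived feature: monthly debt divided by monthly income."""
    return loan["monthly_debt"] / loan["monthly_income"]


def ks_separation(good_values, bad_values):
    """Largest gap between the empirical CDFs of the two groups
    (0 means no separation, 1 means complete separation)."""
    best = 0.0
    for t in sorted(set(good_values) | set(bad_values)):
        cdf_good = sum(v <= t for v in good_values) / len(good_values)
        cdf_bad = sum(v <= t for v in bad_values) / len(bad_values)
        best = max(best, abs(cdf_good - cdf_bad))
    return best


# Hypothetical historical loans labeled with early payment default.
history = [
    {"monthly_debt": 1000, "monthly_income": 5000, "epd": False},
    {"monthly_debt": 1500, "monthly_income": 6000, "epd": False},
    {"monthly_debt": 1200, "monthly_income": 4000, "epd": False},
    {"monthly_debt": 2700, "monthly_income": 6000, "epd": True},
    {"monthly_debt": 2500, "monthly_income": 5000, "epd": True},
]
good = [debt_to_income(loan) for loan in history if not loan["epd"]]
bad = [debt_to_income(loan) for loan in history if loan["epd"]]
separation = ks_separation(good, bad)
```

A feature whose statistic is close to zero adds little to the model, while one close to one separates the two groups almost completely on this sample.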
- The EPD model described herein incorporates variables related to the presence of a co-borrower on the loan.
- the system outputs risk indicators, which can be used for further analysis of the loan. The risk indicators represent the factors that contribute the most to the level of early payment default risk for each loan. In one embodiment, the risk indicators are statistically derived.
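One way to derive such indicators statistically, assuming a linear scoring model, is to rank each input by its contribution relative to a baseline applicant. The weights, baseline values, and the function name `top_risk_indicators` are hypothetical, and the source does not specify this method.

```python
def top_risk_indicators(weights, features, baseline, k=2):
    """Return up to k feature names that add the most risk to the score,
    measured as weight * (value - baseline value)."""
    contributions = {
        name: weights[name] * (features[name] - baseline[name])
        for name in weights
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    # Only factors that increase risk are reported as indicators.
    return [name for name, value in ranked[:k] if value > 0]


# Hypothetical linear-model weights and a single loan's feature values.
weights = {"dti": 4.0, "cltv": 2.0, "credit_score": -0.01}
baseline = {"dti": 0.35, "cltv": 0.80, "credit_score": 700}
loan = {"dti": 0.55, "cltv": 0.95, "credit_score": 720}
indicators = top_risk_indicators(weights, loan, baseline)
```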
- cut-offs can be established for the models.
- the system can also perform a historical evaluation to help establish the appropriate strategy.
- a cascading risk management approach can assist in the operational efficiency of implementing the fraud + EPD risk assessment.
- the EPD alert system can produce a result in real time or in a report that can be accessed in a batch mode.
- the model output file can contain a combination of the results of applying loan data to both the fraud detection system and the EPD alert system in a single file.
- the scores enable a focused investigation of the risk.
- the indicators and suggested actions help tailor the loan review for efficiency, based on the factors contributing to the risk. Lenders can use cascading risk decision criteria to streamline the review for fraud and EPD risk assessment provided by the models.
- Lenders can use the EPD score to determine the loans with highest risk of early payment default. Based on the results of an investigation, lender policies and fair lending guidelines, a loan application could have additional conditions placed on it, or be declined.
- an EPD alert system such as system 901 (see figure 9) can process mortgage data and provide an EPD risk alert and likely reason for the alert.
- the alert can, in certain embodiments, also be used in the loan processing or to prioritize certain applications for fraud analysis.
- Example risk alerts can include whether the applicant's credit score falls within a range that correlates with a high level of defaults, whether the income level is in a range that correlates with a high level of defaults, whether the property zip code is in an area with high defaults, etc.
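Range-based alerts of this kind can be sketched as simple rule checks. Every range, ZIP code, field name, and alert string below is a made-up placeholder; in practice such ranges would be derived from historical default data.

```python
# Hypothetical ranges and ZIP codes, for illustration only.
HIGH_DEFAULT_CREDIT_RANGE = (300, 620)   # inclusive lower, exclusive upper
HIGH_DEFAULT_INCOME_RANGE = (0, 2500)    # monthly income
HIGH_DEFAULT_ZIP_CODES = {"00001", "00002"}


def in_range(value, bounds):
    low, high = bounds
    return low <= value < high


def epd_risk_alerts(application):
    """Return the list of alert descriptions triggered by an application."""
    alerts = []
    if in_range(application["credit_score"], HIGH_DEFAULT_CREDIT_RANGE):
        alerts.append("credit score in high-default range")
    if in_range(application["monthly_income"], HIGH_DEFAULT_INCOME_RANGE):
        alerts.append("income in high-default range")
    if application["property_zip"] in HIGH_DEFAULT_ZIP_CODES:
        alerts.append("property ZIP code in high-default area")
    return alerts


example = {"credit_score": 590, "monthly_income": 4000, "property_zip": "00001"}
alerts = epd_risk_alerts(example)
```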
- FIG. 9 is a functional block diagram illustrating a system 900 for EPD detection such as for use with a mortgage origination system 906.
- the system 900 can be used to analyze applications for use in evaluating applications and/or funded loans by an investment bank or servicer, or as part of due diligence of a loan portfolio.
- System 900 can comprise a mortgage origination system configured to provide mortgage data related to mortgage applications; a risk detection system 902, which can be configured to detect fraud risk associated with the mortgage applications as described with respect to figures 1 and 2; an EPD alert system 901, which can be configured to assess the EPD risk for the mortgage applications; and a model generator 910 that can generate models for use by risk detection system 902 and EPD alert system 901.
- the EPD alert system 901 can receive and store data in a storage 904.
- the storage 904 can comprise one or more database servers and any suitable configuration of volatile and persistent memory.
- the system 901 can be configured to receive mortgage application data from the mortgage origination system 906 and provide data indicative of fraud and EPD alerts back to the mortgage origination system 906.
- the system 901 uses one or more models to generate the data indicative of fraud and EPD risk.
- data indicative of fraud and EPD risk can also be provided to a risk manager system 908 for further processing and/or analysis by a human operator.
- the analysis system 908 can be provided in conjunction with the system 901 or in conjunction with the mortgage origination system 906.
- a fraud or other risk detection system 902 can be used in conjunction with, or share databases and other system components with, the EPD alert system 901.
- Figure 10 is a functional block diagram further illustrating an example of the EPD alert system 901.
- system 901 is similar to system 100, with EPD models 932 replacing the loan models 132 and the introduction of credit data 925.
- the system 901 can include an origination system interface 922 providing mortgage application data to a data preprocessing module 924.
- a credit data system 925 can be configured to receive applicant credit data from one or more credit bureaus or from the lender such as via the loan origination system interface 922 to store and provide that data to the EPD alert system 901.
- the origination system interface 922 can receive data from the mortgage origination system 906 of Figure 9.
- the origination system interface 922 can be configured to receive data associated with funded mortgages and can be configured to interface with suitable systems other than, or in addition to, mortgage origination systems.
- the system interface 922 can be configured to receive "bid tapes" or other collections of data associated with funded mortgages for use in evaluating EPD risk associated with a portfolio of funded loans.
- the origination system interface 922 can comprise a computer network that communicates with the origination system 906 to receive applications in real time or in batches.
- the origination system interface 922 receives batches of applications via a data storage medium.
- the origination system interface 922 can provide application data to the data preprocessing module 924, which formats application data into data formats used internally in the system 901.
- the origination system interface 922 can also provide data from additional sources such as the lender or directly from credit bureaus that can be in different formats for conversion by the data preprocessing module 924 into the internal data formats of the system 901.
- the origination system interface 922 and preprocessing module 924 also allow at least portions of a particular embodiment of the system 901 to be used to score EPD risk in different types of mortgage applications and for different loan originators that have varying data and data formats. Table 1 lists examples of mortgage application data that can be used in various embodiments.
- the preprocessing module 924 can be configured to identify missing data values and provide data for those missing values to improve further processing. For example, the preprocessing module 924 can generate application data to fill missing data fields using one or more rules. Different rules can be used depending on the loan data supplier, on the particular data field, and/or on the distribution of data for a particular field. For example, for categorical fields, the most frequent value found in historical applications can be used. For numerical fields, the mean or median value of historical applications can be used.
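The fill rules described above (most frequent value for categorical fields, median for numerical fields) can be sketched as follows. The function names and field names are assumptions, and missing values are represented here as `None`.

```python
from collections import Counter
from statistics import median


def build_fill_values(historical, categorical_fields, numerical_fields):
    """Compute one fill value per field from historical applications."""
    fill = {}
    for field in categorical_fields:
        observed = [row[field] for row in historical if row.get(field) is not None]
        fill[field] = Counter(observed).most_common(1)[0][0]  # most frequent value
    for field in numerical_fields:
        observed = [row[field] for row in historical if row.get(field) is not None]
        fill[field] = median(observed)
    return fill


def fill_missing(application, fill_values):
    """Return a copy of the application with missing fields filled in."""
    filled = dict(application)
    for field, value in fill_values.items():
        if filled.get(field) is None:
            filled[field] = value
    return filled


# Hypothetical historical applications.
historical = [
    {"occupancy": "owner", "income": 5000},
    {"occupancy": "owner", "income": 7000},
    {"occupancy": "investor", "income": 9000},
]
fill_values = build_fill_values(historical, ["occupancy"], ["income"])
completed = fill_missing({"occupancy": None, "income": 6500}, fill_values)
```

Different rule sets per loan data supplier could be handled by building one `fill_values` mapping per supplier.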
- the preprocessing module 924 can also be configured to identify erroneous data or missing data. In one embodiment, the preprocessing module 924 extrapolates missing data based on data from similar applications, or using default data values. The preprocessing module 924 can perform data quality analysis such as one or more of critical error detection, anomaly detection, and data entry error detection.
- the data preprocessing module 924 can provide application data to one or more models for EPD risk scoring and processing.
- application data is provided to one or more EPD models 932 that generate data indicative of EPD risk based on application and applicant data.
- the data indicative of EPD risk generated by the EPD models 932 can be provided to an integrator 936 that combines scores from one or more models into a final score.
- the data preprocessing module 924 can also provide application data to one or more entity models 940 that are configured to identify EPD risk based on data associated with entities involved in the processing of the application.
- Entity models can include models of data associated with loan brokers, loan officers or other entities involved in a loan application. More examples of such entity models 940 are illustrated with reference to Figure 4.
- Each of the entity models can output data to an entity scoring module 950 that is configured to provide a score and/or one or more risk indicators associated with the application data.
- risk indicator refers to data values identified with respect to one or more data fields that can be indicative of EPD risk.
- the entity scoring module 950 can provide scores associated with one or more risk indicators associated with the particular entity or application.
- appraisal value in combination with zip code can be a risk indicator associated with an EPD model.
- the entity scoring module 950 provides scores and indicators to the integrator 936 to generate a combined EPD risk score and/or set of risk indicators.
- the integrator 936 can be configured to apply weights and/or processing rules to generate one or more scores and risk indicators based on the data indicative of EPD risk provided by one or more of the EPD models 932, the entity models 940 and the entity scoring module 950.
- the integrator 936 can generate a single score indicative of EPD risk along with one or more risk indicators relevant for the particular application.
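Combining several model scores into a single score with weights can be sketched as a normalized weighted sum. The model names, weights, and clamping to the 1-999 range are assumptions; the source says only that weights and/or processing rules are applied.

```python
def integrate_scores(model_scores, weights, low=1, high=999):
    """Combine per-model scores into one integer score using a
    weighted average, clamped to the range [low, high]."""
    total_weight = sum(weights[name] for name in model_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = sum(
        model_scores[name] * weights[name] for name in model_scores
    ) / total_weight
    return max(low, min(high, round(combined)))


# Hypothetical outputs of an EPD model and an entity scoring module.
model_scores = {"epd_model": 800, "entity_model": 400}
weights = {"epd_model": 0.75, "entity_model": 0.25}
final_score = integrate_scores(model_scores, weights)
```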
- Additional scores can also be provided with reference to each of the risk indicators.
- the integrator 936 can provide this data to a scores and risk indicators module 960 that logs the scores to an output history database 960.
- the scores and risk indicators module 960 can identify applications for further review by the risk manager 908 of Figure 9. Scores can be real or integer values.
- scores are numbers in the range of 1-999.
- thresholds are applied to one or more categories to segment scores into high and low risk categories.
- thresholds are applied to identify applications for review by the risk manager 908.
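Applying thresholds to scores can be sketched as below. The threshold values of 700 and 850 and both function names are hypothetical; a lender would set its own cut-offs.

```python
def risk_category(score, high_risk_threshold=700):
    """Segment a 1-999 score into a high or low risk category."""
    return "high" if score >= high_risk_threshold else "low"


def flag_for_review(scored_applications, review_threshold=850):
    """Return the IDs of applications whose score meets the review threshold."""
    return [
        app_id
        for app_id, score in scored_applications.items()
        if score >= review_threshold
    ]


# Hypothetical scored applications.
scored = {"app-1": 912, "app-2": 430, "app-3": 850, "app-4": 701}
categories = {app_id: risk_category(score) for app_id, score in scored.items()}
to_review = flag_for_review(scored)
```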
- risk indicators are represented as codes that are indicative of certain data fields or certain values for data fields. Risk indicators can provide information on the types of EPD risk and recommended actions. For example, risk indicators might include a credit score that falls within high % of default ranges, a high risk of default geographic area, etc. Risk indicators can also be indicative of entity historical transactions, e.g., a CLTV percentage that is indicative of EPD risk.
- a score review report module 962 can generate a report in one or more formats based on scores and risk indicators provided by the scores and risk indicators module 960.
- the score review report module 962 identifies loan applications for review by the risk manager 908 of Figure 9.
- One embodiment desirably improves the efficiency of the risk manager 908 by identifying applications with the highest EPD risk scores or with particular risk indicators for review thereby reducing the number of applications that need to be reviewed.
- a billing process 966 can be configured to generate billing information based on the results in the output history.
- Score review report module 962 can output a score report in several formats.
- the report can include information related to the fraud score as well as the EPD alert score. In other embodiments, only information related to the EPD alert score is output. In either case, and depending on the embodiment, only the score results, e.g., including risk codes, likely reason codes, suggested action codes, etc., can be output, while in other embodiments this information can be combined with the input information, e.g., from table 1, as well.
- the model generator 910 receives application data, entity data, and data on EPD and non-EPD applications and generates and updates models such as the entity models 940 either periodically or as new data is received.
- embodiments can combine the functions identified with various blocks of Figures 9 and 10 with those of a mortgage fraud detection system 100.
- the score review report generator 962 can output reports that include both EPD risk information and data indicative of fraud.
- DSP digital signal processor
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- a general purpose processor can be a microprocessor, but in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine.
- a processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
- a software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
- An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
- the storage medium can be integral to the processor.
- the processor and the storage medium can reside in an ASIC.
- the ASIC can reside in a user terminal.
- the processor and the storage medium can reside as discrete components in a user terminal.
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Marketing (AREA)
- Finance (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Technology Law (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1006199A GB2467665A (en) | 2007-10-05 | 2008-10-06 | Methods and systems of predicting mortgage payment risk |
AU2008311048A AU2008311048A1 (en) | 2007-10-05 | 2008-10-06 | Methods and systems of predicting mortgage payment risk |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US97803307P | 2007-10-05 | 2007-10-05 | |
US60/978,033 | 2007-10-05 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009048843A1 true WO2009048843A1 (en) | 2009-04-16 |
Family
ID=40549513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2008/078987 WO2009048843A1 (en) | 2007-10-05 | 2008-10-06 | Methods and systems of predicting mortgage payment risk |
Country Status (3)
Country | Link |
---|---|
AU (1) | AU2008311048A1 (en) |
GB (1) | GB2467665A (en) |
WO (1) | WO2009048843A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019194679A1 (en) * | 2018-04-06 | 2019-10-10 | ABN AMRO Bank N .V. | Systems and methods for detecting fraudulent transactions |
US20200242626A1 (en) * | 2015-06-16 | 2020-07-30 | Palantir Technologies Inc. | Fraud lead detection system for efficiently processing database-stored data and automatically generating natural language explanatory information of system results for display in interactive user interfaces |
US11301861B2 (en) * | 2020-02-05 | 2022-04-12 | Capital One Services, Llc | System and method for modifying payment processing times upon suspicion of fraud |
US20230088436A1 (en) * | 2016-03-25 | 2023-03-23 | State Farm Mutual Automobile Insurance Company | Reducing false positives using customer feedback and machine learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6185543B1 (en) * | 1998-05-15 | 2001-02-06 | Marketswitch Corp. | Method and apparatus for determining loan prepayment scores |
US20040010443A1 (en) * | 2002-05-03 | 2004-01-15 | May Andrew W. | Method and financial product for estimating geographic mortgage risk |
US20040138993A1 (en) * | 1995-09-12 | 2004-07-15 | Defrancesco James | Computer implemented automated credit application analysis and decision routing system |
-
2008
- 2008-10-06 GB GB1006199A patent/GB2467665A/en not_active Withdrawn
- 2008-10-06 AU AU2008311048A patent/AU2008311048A1/en not_active Abandoned
- 2008-10-06 WO PCT/US2008/078987 patent/WO2009048843A1/en active Application Filing
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200242626A1 (en) * | 2015-06-16 | 2020-07-30 | Palantir Technologies Inc. | Fraud lead detection system for efficiently processing database-stored data and automatically generating natural language explanatory information of system results for display in interactive user interfaces |
US20230088436A1 (en) * | 2016-03-25 | 2023-03-23 | State Farm Mutual Automobile Insurance Company | Reducing false positives using customer feedback and machine learning |
US11978064B2 (en) | 2016-03-25 | 2024-05-07 | State Farm Mutual Automobile Insurance Company | Identifying false positive geolocation-based fraud alerts |
US11989740B2 (en) * | 2016-03-25 | 2024-05-21 | State Farm Mutual Automobile Insurance Company | Reducing false positives using customer feedback and machine learning |
WO2019194679A1 (en) * | 2018-04-06 | 2019-10-10 | ABN AMRO Bank N .V. | Systems and methods for detecting fraudulent transactions |
US11301861B2 (en) * | 2020-02-05 | 2022-04-12 | Capital One Services, Llc | System and method for modifying payment processing times upon suspicion of fraud |
Also Published As
Publication number | Publication date |
---|---|
AU2008311048A1 (en) | 2009-04-16 |
GB2467665A (en) | 2010-08-11 |
GB201006199D0 (en) | 2010-06-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08837094 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2008311048 Country of ref document: AU |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 1006199 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20081006 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1006199.2 Country of ref document: GB |
|
ENP | Entry into the national phase |
Ref document number: 2008311048 Country of ref document: AU Date of ref document: 20081006 Kind code of ref document: A |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 08837094 Country of ref document: EP Kind code of ref document: A1 |