WO2022038641A1 - A system and method for multi-data risk assessment of MSMEs

Publication number
WO2022038641A1
WO2022038641A1 PCT/IN2021/050802 IN2021050802W WO2022038641A1 WO 2022038641 A1 WO2022038641 A1 WO 2022038641A1 IN 2021050802 W IN2021050802 W IN 2021050802W WO 2022038641 A1 WO2022038641 A1 WO 2022038641A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
msmes
module
variable
risk assessment
Prior art date
Application number
PCT/IN2021/050802
Other languages
French (fr)
Inventor
Jinand Vikasbhai SHAH
Original Assignee
Online Psb Loans Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Online Psb Loans Limited
Publication of WO2022038641A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present invention relates to a multi-data driven risk propensity model that provides an objective risk assessment of MSMEs, and to a method thereof. The invention provides a system defined by a distinctively designed, completely automated process that gives in-depth information to lenders and other stakeholders, including financial institutions, creditors, and other businesses and institutions, for assessing, analyzing, and predicting the financial strength of MSMEs before they make important decisions on giving out loans to MSMEs or doing other business transactions with them. The system/model is distinctive, robust, and insightful, and is built with more than fifty parameters covering business and financial aspects. It formulates a top distinctive score that is not limited to credit history or loan payments and therefore also works for MSMEs without any loan history. The invention is developed in step with disruptive technologies for faster, more accurate, uniform, and timely credit decisions, creating an effective and efficient credit risk evaluation system and method for MSMEs.

Description

A SYSTEM AND METHOD FOR MULTI-DATA RISK ASSESSMENT OF MSMEs.
Title to the invention:
The present invention relates to a system and method for multi-data risk assessment of MSMEs and more particularly relates to the multi-data driven risk propensity model that provides an objective risk assessment of MSMEs and method thereof.
Background of the Invention
Micro, Small and Medium Enterprises (MSMEs) contribute to economic development worldwide. The ability of MSMEs to develop, grow, sustain, and strengthen themselves depends heavily on their capacity to access and manage finance, and access to finance is a major factor in their growth and success. MSME clients need financial assistance to run their business operations and approach fund providers, lenders, and banks for credit. Research shows that the main challenge in obtaining loans is information asymmetry between external creditors and the MSMEs themselves. This makes investment in MSMEs riskier and more vulnerable for investors.
Banks in general perceive MSMEs to have high credit risk as compared to big corporations. Banks consider managerial character, capacity, environmental condition, and collateral guarantee before making loan decisions. Each bank adopts a separate credit rating / scoring procedure and disclosure requirements for sanctioning loans. MSME clients find themselves spending a significant amount of time and effort while approaching the various banks for credit.
Furthermore, assessing the strengths and weaknesses of MSMEs is one of the most challenging tasks not only in banking but in the overall business environment. The difficulties stem from fragmented financial data, the strength of risk models, the length of the process, and broader issues such as the tension between sales and credit. The competitive lending environment, regulatory requirements, different geographies, the business environment, and positions in the economic and credit cycles also have an impact. An incorrect credit decision endangers a bank’s financial capability and can end in a steep decline in profit margins. Further, many large and small businesses transact with MSMEs to supply or buy goods and services. Understanding the risk profile/score of an MSME is therefore critical when making various business decisions (e.g., the length of the credit period to allow, the volume/value of transactions to start with, and the long-term business perspective).
Various prior art describes credit risk assessment and scoring for businesses and entities. The prior art document US 2003/0229580 A1 discloses a method to establish or enhance a business credit score. One embodiment describes a method of verifying the existence of business credit scores obtained from well-known credit agencies/bureaus. In the absence of a credit score, a method is described to verify and eliminate any discrepancies in the business information held in public records before a credit score is generated. For establishing or improving a credit score, one embodiment describes a method of facilitating desirable payment transaction experiences with vendors capable of extending a line of credit to businesses without requiring any personal guarantee. In addition, these vendors are qualified to provide reliable reporting of payment experiences to the credit agencies/bureaus. One embodiment of the method facilitates receiving lines of credit, such as credit cards, from retail businesses that do not require a personal guarantee but do require a credit rating.
Business credit bureaus are known in the art. Known companies employ top-down models, evaluating factors such as pay experiences, debt, demographics, and data-science derived algorithmic methods to produce a credit score for a person or a company. These systems are often ineffective, however, at assigning an accurate business credit score to a small business. Most of the data relied upon by these systems is provided by a small number of large companies who report pay experience data into the credit bureaus. If a small firm is not transacting business with one of the large firms reporting pay experience data, the business credit bureau will often have insufficient data to assign a meaningful business credit score.
Hence, there arises a need to monitor MSMEs’ creditworthiness and risk profile in a more systematic and preemptive manner by providing an effective solution to these underwriting challenges and business decisions.
Object of the Invention
The main object of the present invention is to provide a multi-data driven risk propensity model that provides an objective risk assessment of MSMEs, and a method thereof.
Another object of the present invention is to provide a smart, adaptive, self-learning, and optimizing quantitative model for computing the credit worthiness and overall risk profile of the respective user interface.
Yet another object of the present invention is to provide a multi-data-based risk propensity model that gives each organization a multi-variate, unbiased, and comparable score in real time, indicating its financial health and providing its risk assessment.
Another object of the present invention is to provide high interpretability of results, accuracy, and simplicity, enabling trend-based analysis of financial health and of industry clusters’ health trajectory over time.
A further object of the present invention is to provide a robust, distinctive, insightful, and efficient centrally focused autonomous system to reduce financial risk and to analyze the financial strength of the user interface for faster and better credit decisions.
Yet another object of the present invention is to provide various computations or other operations that can be performed on a real-time basis, giving a better reference for assessing a company’s credit risk, so that creditors, investors, lenders, fund providers, and other stakeholders have a precise model for assessing the risk.
Other objects and advantages of the present disclosure will be more apparent from the following description which is not intended to limit the scope of the present disclosure.
Summary of the Invention:
The present invention relates to a multi-data driven risk propensity model that provides an objective risk assessment of MSMEs, and to a method thereof. The present invention generally relates to a novel, advanced, distinctive, and extremely powerful risk propensity and risk assessment model: a computing system defined by a distinctively designed, completely automated process that helps lenders and various other stakeholders analyze the financial strength of the user interface before making important decisions on giving out a loan to the user interface or doing other business transactions with the MSME. The present invention provides a system with a completely configurable assessment method that provides in-depth information to financial institutions, creditors, fund providers, other businesses, and institutions for assessing and predicting the financial health of the user interface before considering a loan, investment, or other business decision in relation to the company. The present invention is developed in step with disruptive technologies for faster, more accurate, uniform, and timely credit decisions, creating an effective, insightful, and efficient credit risk evaluation system and method for MSMEs.
Brief Description of the Drawings
Other objects, advantages and novel features of the invention will become apparent from the following detailed description of the present embodiment when taken in conjunction with the accompanying drawings.
Fig. 1 illustrates functional block diagram of the present invention.
Fig. 2 is a flowchart that illustrates the general steps of processing a loan application of the present invention.
Fig. 3 depicts a statistical modelling process that illustrates the final refined model, which forms part of the cloud platform.
Detailed Description of the Invention
The technical solutions in the embodiments of the present invention will be described clearly and completely in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all the embodiments.
The present invention provides a totally configurable assessment method that provides in-depth information to financial institutions, creditors, investors, fund providers, lenders, other businesses, or institutions for assessing and predicting the financial health of the user interface before considering a loan, investment, or other business decision in relation to the company.
In another embodiment, the present invention includes a central processing unit having a processor. The invention includes a plurality of program modules executable on the processor, shown as the direct lending central processing unit. The central processing unit is typically a computer with a hardware-compatible operating system.
Before discussing specific embodiments, it is to be noted that the term “central unit” may include a microprocessor, motherboard, drive (e.g., Blu-ray, CD-ROM, DVD, floppy drive, hard drive, and SSD), fan (heat sink), modem, and monitors. The “receiving unit” can include a central processing unit (“CPU”), at least one read-only memory (“ROM”), at least one random access memory (“RAM”), storage devices, and Bluetooth.
The “storing unit” includes at least one hard disk drive (“HDD”), at least one solid state drive (“SSD”), floppy disk, flash memory devices, at least one network card, and one or more input/output (“I/O”) device(s). Further, the “assessment unit” means hardware, i.e., the physical parts of the computer and related devices such as mouse, keyboard, monitor, and scanner.
The “communication interface” includes transmission media, routers, repeaters, gateways, network adapters, and cables. The network is a Local Area Network (LAN), Wide Area Network (WAN), and/or Virtual Private Network (VPN) in which a network server sends data to and/or receives data from the user interface or company.
The term “user interfaces (5, 6)” shall generally refer to bank interface (6) and MSME, Company or Entity (5) interface.
An object of the present invention is to provide a machine learning based credit risk assessment rating method for loan users with fast, comprehensive, and convenient data changing. Further, the present invention involves leveraging data mining techniques to extract features from raw data.
FIG. 1 depicts a block diagram of the present invention. As shown in Fig. 1, the system comprises a central unit (1) for storing information and for transmitting and receiving data from the borrower side and the lender side. In alternative configurations, different and/or additional modules can be included in the system environment. The central unit (1) is associated with a loan application receiving unit (2) for receiving requests from the client, a storing unit (3) which stores the different data of the borrowers and the user interface (5, 6), and an assessment unit (4) which generates the assessment scores for the user interface (5, 6). When the user interface (5, 6) applies for a loan, it may use the assessment score calculated by the assessment unit (4). The assessment unit (4) receives information about the user interface (5, 6) from one or more financial databases and uses the received information to determine the assessment score of the user interface (5, 6).
The borrower's request, together with other details such as GST, ITR, bank statements, and other basic details, is received through the receiving unit (2).
The data received from the receiving unit (2) is stored in the storing unit (3). Further, the storing unit (3) stores the different stated and baselined data sources and parameters, which may have been dynamically accumulated both internationally and domestically. Domestic data are received from different data sources that include, but are not limited to: GST, ITR, MCA, bank statements, commercial bureau, individual bureau, nature of business activities, geography, and many more, in addition to syndicated data. The system identifies and calculates meaningful parameters that contribute to determining the credit worthiness of an entity from the different sources of data, and all the parameters are merged into a single data source. A data merging module merges the parameters of the datasets received from public databases, private databases, and proprietary data into this single data source. The data merging module is required when the raw data is stored in multiple files or data tables that must be analysed all at once. Further, a parameter is a function argument that can take a range of possible values. In the present invention, the specific model is used in the function and requires parameters to generate predictions on new data. To build a data centric, unbiased target definition, the pools of parameters with the highest influence on risk profile were identified and a multi-variable driven composite measure was created. Using the identified variables, a set of multiple composite parameter modules was created to reflect bankers’ conditions for assessing credit worthiness. These composite parameter modules were then applied in a hierarchical fashion, from the most stringent to the least stringent condition, to identify ‘Bad’ cases. The hierarchical order of the conditions of the composite parameter modules is established through feature engineering, with the most stringent condition on top.
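The hierarchical application of composite conditions described above can be sketched as an ordered list of rules that is checked from the most stringent to the least stringent, stopping at the first rule that fires. This is a minimal illustration only: the parameter names (`max_days_past_due`, `cheque_bounces_6m`, and so on) and every threshold are hypothetical and are not taken from the specification.

```python
# Hypothetical composite conditions, ordered from most to least stringent.
# Parameter names and thresholds are illustrative assumptions only.
COMPOSITE_RULES = [
    ("severe_delinquency",
     lambda p: p["max_days_past_due"] >= 90),
    ("bounces_and_falling_sales",
     lambda p: p["cheque_bounces_6m"] >= 3 and p["gst_sales_growth"] < 0),
    ("thin_cash_buffer",
     lambda p: p["avg_bank_balance"] < p["monthly_obligations"]),
]


def label_entity(params):
    """Return ('Bad', rule_name) for the first rule that fires, else ('Good', None)."""
    for name, condition in COMPOSITE_RULES:
        if condition(params):
            return "Bad", name
    return "Good", None


# Merged single-source record for one entity (made-up values).
risky = {
    "max_days_past_due": 10,
    "cheque_bounces_6m": 4,
    "gst_sales_growth": -0.10,
    "avg_bank_balance": 1000.0,
    "monthly_obligations": 5000.0,
}
healthy = {
    "max_days_past_due": 0,
    "cheque_bounces_6m": 0,
    "gst_sales_growth": 0.05,
    "avg_bank_balance": 9000.0,
    "monthly_obligations": 5000.0,
}

risky_label = label_entity(risky)
healthy_label = label_entity(healthy)
```

Because the rules are evaluated in order, the reported rule name is always the most stringent condition the entity violates, which keeps the resulting target definition easy to audit.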
According to the present invention, the assessment unit (4) generates the assessment score to reduce financial risk. The assessment score is generated by a centrally focused autonomous process which is optimal for standard decisions. Modeling-oriented parameters are created using different variable parameters: variable preprocessing modules; a missing data module for lack of information in some observations within a variable, i.e., the numerical missing variables and the categorical missing variables; an outlier module for unusual or unexpected values, using a discretization technique; a cardinality module for the number of different labels in the categorical variables, i.e., rare label encoding and categorical variable encoding; a linear model assumption used to check the assumptions and distributions (normal, skewed, or other); and a feature magnitude module for the scale of different features, i.e., feature scaling and feature selection. The process is characterized in terms of its risk detection capacity, which aids in developing a modelled Risk Profile of exceptionally high statistical accuracy, giving more impact and accuracy in determining the financial health of the user interface (5, 6). Further, the assessment unit (4) continuously moderates the risk assessment score and makes it comparable across user interfaces (5, 6), enabling clustering and advanced research and analysis to provide industry centric insights into the credit worthiness of the user interface (5, 6) for industry clusters of interest. In the present invention, all the different variable preprocessing modules are identified within the data set to determine the specific technique to be deployed for a given variable. The variable preprocessing modules present in data sets comprise a numerical variable, a categorical variable, a discrete variable, and a date/time variable. A numerical variable is a data variable that takes on any value within a finite or infinite interval.
The numerical variable can also be called a continuous variable because it exhibits the features of continuous data. The term numerical variable refers to anything represented by numbers; data often must be represented numerically for machine learning to make predictions from it. A categorical variable, by contrast, assumes a limited and fixed set of possible values, allowing a data unit to be assigned to a broad category for classification. The values of a categorical variable are selected from a group of categories, also called labels.
Further, a discrete variable is a variable that can only take on a finite number of values, and these values are integers, i.e., counts rather than fractions. A date/time variable is a particular type of categorical variable that accepts dates or times as values; it can contain only a date, only a time, or both. These variable preprocessing modules are used in the training and prediction phases of machine learning to test and predict data labels. They are also essential for all types of feature extraction and model training.
Furthermore, a common mistake is to begin predictive modeling with a focus on the data that is currently available rather than on missing data. When data is received from the receiving unit (2), missing values may be introduced into the data set. Missing data is often associated with negative information in credit risk assessments. Therefore, when capturing the financial history of customers, it is best not to treat missing values as random. Accordingly, the invention handles missing values through data imputation so as to represent their inherent insights. Data imputation is the act of replacing missing data with statistical estimates of the missing values. The aim of an imputation technique is to produce a complete dataset for use in training machine learning models.
The missing imputation data module in the present invention identifies numerical missing variables and categorical missing variables. Numerical missing variables are imputed using mean/median imputation together with a missing indicator, and end of tail imputation. Mean/median imputation consists of replacing all occurrences of missing values within a variable by the mean or median of that variable. This method is suitable for numerical missing variables and can be used when data is missing completely at random and no more than 5% of the variable is missing. The mean or median used to replace missing values should be calculated only on the train set. The method can be used in production, i.e., during model deployment, is easy to implement, and provides a quick way to obtain complete datasets. End of tail imputation automatically selects values at the end of the variable's distribution and is roughly equivalent to arbitrary value imputation. These values should likewise be calculated only on the train set.
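The numerical imputation steps above can be sketched as follows. The sketch learns the replacement value on the train set only and reuses it on any other split, and it returns a missing indicator alongside the imputed values. The function names are hypothetical, missing values are assumed to be represented as `None`, and the end-of-tail rule of mean plus three standard deviations is one common convention assumed here, not a rule stated in the specification.

```python
import statistics


def fit_median_imputer(train_values):
    """Learn the replacement value (median) from the train set only."""
    observed = [v for v in train_values if v is not None]
    return statistics.median(observed)


def fit_end_of_tail(train_values, n_std=3.0):
    """End-of-tail value, assumed here to be mean + n_std * sample std of the train set."""
    observed = [v for v in train_values if v is not None]
    return statistics.mean(observed) + n_std * statistics.stdev(observed)


def apply_imputer(values, fill_value):
    """Return (imputed values, missing indicator) for any data split."""
    imputed = [fill_value if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


# Made-up monthly turnover figures; None marks a missing observation.
train = [10.0, 12.0, None, 14.0, 100.0]
test = [None, 11.0]

median_value = fit_median_imputer(train)      # computed on train only
tail_value = fit_end_of_tail(train)           # computed on train only

test_imputed, test_missing = apply_imputer(test, median_value)
```

The median is used rather than the mean because the made-up train data is skewed by the value 100.0, in line with the guidance on skewed distributions.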
Furthermore, some machine learning models are designed to operate best under certain distribution assumptions, so it matters whether the distribution of the data is normal or skewed. The distribution of data in the data set can help to identify which machine learning model is best to use. If the numerical missing variable has a normal distribution, the mean and median are approximately the same; if it has a skewed distribution, the median is a better representation.
Simultaneously, for categorical missing variables, the missing imputation data module adds a “missing” category. This method consists of treating missing data as an additional label or category of the categorical missing variable. According to the present invention, this is the most widely used method for imputing categorical variables: the missing observations are filled with a “Missing” category, which creates a new label. The missing data imputation module thus provides a complete collection of data in the datasets and makes no assumptions about the data.
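As a minimal sketch of the “missing” category imputation, the function below fills absent categorical observations with an explicit label. It assumes that missing entries arrive as `None` or as empty strings; the function name and the sample labels are hypothetical.

```python
def add_missing_category(values, label="Missing"):
    """Replace None or empty-string entries with an explicit category label."""
    return [label if v is None or v == "" else v for v in values]


# Made-up "nature of business" values with two missing observations.
business_type = ["Trading", None, "", "Services"]
business_type_imputed = add_missing_category(business_type)
```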
A large number of data values may act as outliers. An outlier is a data point that is markedly different from the remaining data, and the outlier module identifies the data values that have a significant impact on the mean. To reduce the sensitivity of certain variables in modeling due to outlying values, the outlier module uses a discretization technique. Discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts and is an essential preprocessing technique used in various knowledge discovery and data mining tasks. This process is usually suitable for numerical evaluation and implementation on digital devices. It also improves the value spread in skewed variables; its main aim is to transform a set of continuous attributes into discrete ones by associating categorical values with intervals, thus transforming numerical data into categorical data.
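One way to illustrate how discretization limits the influence of outliers is equal-frequency binning, sketched below. The specification does not say which discretization scheme is used, so the equal-frequency choice, the function names, and the number of bins are assumptions.

```python
import bisect


def fit_equal_frequency_bins(train_values, n_bins=4):
    """Compute interior cut points from the train set so bins hold similar counts."""
    ordered = sorted(train_values)
    n = len(ordered)
    cuts = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return sorted(set(cuts))  # drop duplicate cut points


def discretize(value, cuts):
    """Map a continuous value to a bin index in 0..len(cuts)."""
    return bisect.bisect_right(cuts, value)


train = [1, 2, 3, 4, 5, 6, 7, 8]
cuts = fit_equal_frequency_bins(train, n_bins=4)

train_bins = [discretize(v, cuts) for v in train]
outlier_bin = discretize(10_000, cuts)  # an extreme value lands in the top bin
```

After binning, the extreme value 10,000 is indistinguishable from 8 for the model, so it can no longer dominate the mean of the variable.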
Next, the cardinality module, which handles the number of different labels in the categorical variables, includes rare label encoding and categorical variable encoding. Rare labels are those that appear only in a small percentage of the observations in a dataset. Rare labels may cause issues, especially with over-fitting and generalization: a label may appear only in the train set, causing over-fitting, or only in the test set, leaving the model unable to score it appropriately. Most problems can be prevented by grouping rare labels into a new category, such as “Other” or “Rare”. According to the present invention, any label with a frequency of less than 2% is considered rare and is grouped into the new category called “Rare” or “Other”. Furthermore, the aim of categorical variable encoding is to produce variables which can be used to train machine learning models and to build predictive features from categories. Categorical variable encoding refers to replacing the category strings by a numerical representation for model training, and several techniques exist for this transformation. In the present invention, a Weight of Evidence (WOE) technique is used to encode the categorical variable for classification. The WOE is the natural logarithm (ln) of the probability that the target equals 1 (one) divided by the probability that the target equals 0 (zero). The WOE technique creates a monotonic relationship between the target and the variables, which makes it possible to determine which variable is more predictive. It also arranges the categories on a “logistic” scale, which is natural for logistic regression, and because the transformed variables are on the same scale, they can be compared with one another.
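The two cardinality steps can be sketched together: rare labels below the 2% frequency threshold are grouped first, and the remaining categories are then encoded by Weight of Evidence. The sketch assumes the common per-category form, WOE(c) = ln(share of target = 1 cases falling in c / share of target = 0 cases falling in c), which is one reading of the definition above. The additive smoothing term is an assumption added to avoid division by zero and is not part of the specification.

```python
import math
from collections import Counter


def group_rare_labels(values, threshold=0.02, rare_label="Rare"):
    """Replace labels whose relative frequency is below threshold with rare_label."""
    counts = Counter(values)
    n = len(values)
    frequent = {lab for lab, c in counts.items() if c / n >= threshold}
    return [v if v in frequent else rare_label for v in values]


def fit_woe(categories, targets, smoothing=0.5):
    """Per-category WOE; smoothing is an assumed safeguard against zero counts."""
    total_pos = sum(targets)
    total_neg = len(targets) - total_pos
    pos = Counter(c for c, t in zip(categories, targets) if t == 1)
    neg = Counter(c for c, t in zip(categories, targets) if t == 0)
    labels = set(categories)
    k = len(labels)
    woe = {}
    for c in labels:
        p1 = (pos[c] + smoothing) / (total_pos + smoothing * k)
        p0 = (neg[c] + smoothing) / (total_neg + smoothing * k)
        woe[c] = math.log(p1 / p0)
    return woe


# Made-up data: 100 entities, label "C" appears once (1% < 2%).
raw = ["A"] * 60 + ["B"] * 39 + ["C"]
targets = [1] * 6 + [0] * 54 + [1] * 20 + [0] * 19 + [1]

grouped = group_rare_labels(raw)
woe = fit_woe(grouped, targets)
encoded = [woe[c] for c in grouped]
```

In this made-up sample, category "B" has a far higher share of target = 1 cases than category "A", so its WOE is positive while that of "A" is negative. As with imputation, the mapping would be fitted on the train set and then applied unchanged to new data.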
In another embodiment, a top distinctive score is formulated by the numerical system. The score is not limited to credit history or loan payments and also works for MSMEs without any loan history. It is built with more than fifty parameters covering business and financial aspects, tracked over specific time periods to reflect the health trajectory of a user interface (5, 6) or industry cluster. This provides a timely industry pulse to bankers or lenders, enabling them to analyze customers’ data better in order to provide loans efficiently, as well as to use the score for other business transactions.
The heavy computations are implemented in Python, with data extraction and feature engineering leveraging a MySQL database. The process involves assessment of variables, exploratory data analysis (EDA) as required, model build, feature engineering and refinement, model tuning and alignment, roll-out, and sub-models as required. The feature engineering process is key to understanding what data is available for use in machine learning. It also informs the selection of the modeling approach (such as deep learning, decision trees, or regression). The process involves a combination of data analysis, rules of thumb, and judgment, and it is useful for improving the performance of machine learning; however, before choosing a machine learning model, one must select the features that are important to the final model build of the present invention. The feature magnitude module is also necessary for testing models and improving their accuracy. The feature magnitude module handles the scale of the different features through the feature scaling module and the feature selection module.
The feature scaling module normalizes the range of independent variables in the data, i.e., it sets the feature value ranges within a similar scale. The most common feature scaling techniques are normalization and standardization.
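The two scaling techniques named above can be sketched as follows: min-max normalization maps values onto the interval [0, 1], and standardization rescales them to zero mean and unit standard deviation. The function names are hypothetical, and fitting the scaling constants on the train set only is an assumption carried over from the imputation guidance.

```python
import statistics


def fit_min_max(train):
    """Learn the minimum and maximum from the train set."""
    return min(train), max(train)


def min_max_scale(x, lo, hi):
    """Normalization: map x onto [0, 1] relative to the train range."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0


def fit_standardizer(train):
    """Learn the mean and population standard deviation from the train set."""
    return statistics.mean(train), statistics.pstdev(train)


def standardize(x, mean, std):
    """Standardization: express x in standard deviations from the mean."""
    return (x - mean) / std if std > 0 else 0.0


train = [2.0, 4.0, 6.0, 8.0]
lo, hi = fit_min_max(train)
mean, std = fit_standardizer(train)

normalized = [min_max_scale(x, lo, hi) for x in train]
standardized = [standardize(x, mean, std) for x in train]
```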
In another embodiment, the unique credit risk system is intended to deliver sharp insights over the credit lifecycle, from pre-disbursal to post-disbursal assessment, via the score generated. The outline of the score is defined based on the score matrix, and any score that does not fall within the acceptable/appropriate range serves as an Early Warning Indicator that can influence a credit decision by anticipating an event of default.
The scoring mechanism provides insights into the key business metrics driving the score to facilitate decision making. It is also intended to deliver sharp insights into strengths, weaknesses, areas for improvement, etc. to the MSMEs so that they understand their existing risk profile as well as their future progress. It therefore helps each MSME to understand its own position in detail, and to gain insight into the areas that need further improvement as well as the areas of current strength that it needs to maintain.
Fig. 2 is a flowchart that illustrates the general steps of processing a loan application in the present invention. The robust, distinctive model is built as per the above workflow and leverages processing power, memory, and hard disk for data processing, data storage, and statistical processing on compatible Windows/Linux hardware. Fig. 2 shows a method of assessing the credit risk of the user interface (5, 6) in an on-line lending environment, carried out by instructions executed by the processor. The instructions comprise the following. A request is received remotely from the loan applicant, and data from the loan applicant is received by the receiving unit (2). After receiving the application, the central unit (1) accesses at least one database for information relevant to the loan applicant's identity and for information relevant to the loan applicant's ability and willingness to repay the loan, drawn from the data sources that the borrowing party is associated with, and saves the data in the storing unit (3). The loan applicant's identity is verified by comparing the information received from the loan applicant with the information received from said at least one database relevant to the applicant's financial data. After all the parameters are verified, a base assessment score is calculated by the assessment unit (4) for the borrowing party's profile. The borrowing party's user profile data and credit risk score are stored in the storing unit (3). Based on the underwriting score, the system determines in real time and without human assistance whether the loan applicant's request is approved or rejected.
Fig. 3 depicts a statistical modelling process that illustrates how the final refined model forms part of the cloud platform. As shown in Fig. 3, a statistical model is created to develop a credit scoring system for evaluating credit applications. To create the model, various predefined modelling-oriented parameters are created/selected using different variable parameters (7), and after cross validation (8) of these parameters the best parameters (9) are selected. These selected parameters are used to retrain/refine the model (13), which has been developed using the training data (11) and test data (12) available in the data set (10)/data lake. By repeating the above process for the various user data available, the model is refined, and the final evaluated model (14) forms part of the cloud platform and is used to generate the score for various loan applications.
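The select-by-cross-validation loop of Fig. 3 can be sketched in a few lines. The example below is an illustration under stated assumptions: a one-dimensional k-nearest-neighbour classifier stands in for the model (13), the candidate settings are values of `n_neighbors`, and the data is made up. None of these choices comes from the specification.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n rows."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, val
        start += size


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (1-D)."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = sum(train_y[i] for i in order[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def cv_accuracy(xs, ys, n_neighbors, k=4):
    """Mean validation accuracy of one candidate setting across k folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        hits = sum(knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / len(scores)


def select_best(xs, ys, candidates, k=4):
    """Return the candidate with the highest cross-validated accuracy."""
    return max(candidates, key=lambda c: cv_accuracy(xs, ys, c, k))


# Made-up feature values and Good (0) / Bad (1) labels, already shuffled.
xs = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.15, 0.85, 0.25, 0.75, 0.6, 0.4]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0]

best = select_best(xs, ys, candidates=[1, 3, 5])
```

The selected setting would then be used to retrain the model on the full training data before the final evaluation on held-out test data.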
To develop the credit scoring system for evaluating credit applications, the statistical model created may have the feature selection module embedded in it. The best parameters (9) are selected after cross-validation (8) of the variable parameters (7), so that the model (13) performs at its best when a set of input features is given to the feature selection module. The feature selection module selects those features in the data set (10) that contribute most to the variable parameters (7) or to the output of interest. Irrelevant features in the data set (10) can decrease the accuracy of the model (13) and cause the model (13) to learn from irrelevant features. The feature selection module provides easier interpretation, which enables diagnostics; shorter training time; enhanced generalization by reducing overfitting; and fewer data errors during model use. Variance, correlation and Lasso methods are used for selecting the best parameters (9).
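Two of the three selection methods named above, variance and correlation filtering, can be sketched as follows. The thresholds `min_variance` and `max_abs_corr` and the function names are illustrative assumptions, and the Lasso step is deliberately left out because it needs a fitted regression model.

```python
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either column is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def select_features(columns, min_variance=1e-8, max_abs_corr=0.95):
    """columns maps a feature name to its list of values.

    Returns the names that survive both filters, in their original order.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # near-constant column carries no signal
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with a column that is already kept
        kept.append(name)
    return kept
```

Because the correlation filter keeps the first of two highly correlated columns, the order of `columns` decides which one survives.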
Furthermore, to reduce over-engineering of the model (13), the present invention splits the model (13) into two parts: an existing credit customer model and a new to credit customer model. The existing credit customer model is for customers with past credit history, and the new to credit customer model is for customers without credit history. According to the present invention, the method selected for the existing credit customer model is a Gradient Boosted Machine (GBM) model, which has the best performance in terms of both accuracy and efficiency. It has higher precision, recall and F1 score than the LRM (Logistic Regression Model).
The Gradient Boosted Machine (GBM) model is a method that generates predictions from the data. It provides higher performance on the test data (12) than the LRM. The GBM model ensembles several weak classifiers together to form a strong and effective classifier. The method consists of rounds of iteration that assign a higher weight to the negative samples and a lower weight to the effective classifier. It is a method of merging all the weak classifiers together to obtain a model (13) with better performance. Another advantage of the GBM model over the LRM is that it requires fewer feature engineering steps. The GBM models are trained on the best parameters (9), which maximize the performance of each individual classifier. Specifically, in the present invention the XGBoost algorithm of the GBM model provides a more regularized model formalization to control overfitting for better performance. After selection of the optimal model, the model evaluation (14) is done on the test data (12). For the new to credit customer model according to the present invention, the performance of the LRM does not meet the model robustness criteria stated by the present invention. As a result, the GBM model was also applied for the new to credit customer model, where it has higher precision, recall and F1 score than the LRM (Logistic Regression Model).
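The idea of combining many weak learners into one strong model can be illustrated with a toy gradient boosting routine. This sketch is not XGBoost and is not the model of the invention: it assumes a single numeric feature with at least two distinct values, a squared-error loss and one-split decision stumps, and it omits the regularization that XGBoost adds. All names and default values are illustrative.

```python
def fit_stump(x, residuals):
    """Best single-split regressor on one feature (least squared error)."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds between distinct values
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]


def fit_gbm(x, y, rounds=50, learning_rate=0.3):
    """Each round fits a stump to the residuals left by the previous rounds."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [p + learning_rate * (lm if xi <= t else rm) for xi, p in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if xi <= t else rm) for t, lm, rm in stumps)
```

Each stump on its own is a weak predictor; the sum of many small corrections is what produces the accurate fit. The learning rate trades the number of rounds against the size of each correction.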
In the present invention, the LRM is used as a baseline to verify the effectiveness of the GBM model by comparing how accurately each model classifies the dataset (10). The results of the GBM and LRM models are compared by accuracy, precision, recall and F1 score. On the dataset (10), the GBM model brings a substantial improvement over the LRM in terms of all four metrics: 4% higher accuracy, 9% higher precision, 7% higher recall and 8% higher F1 score.
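For reference, the four comparison metrics can be computed from the counts of true and false positives and negatives. The sketch below assumes binary labels in which 1 marks the positive class; the function name is illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Running the same function on the predictions of two models for the same test data gives directly comparable figures.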
According to the present invention, the improved performance on the dataset (10) in terms of the four metrics demonstrates that the GBM model of the present invention has a higher accuracy rate and stronger generalization ability. In the case of the GBM model, 69% of customers have a score higher than 65, which suggests that some variables contribute significantly more to the output.
The main advantage of the present invention is that it enhances the quality and reliability of financial information, enabling more lending to the sector, and reduces loan repayment defaults by evaluating and analyzing the creditworthiness of the user interface (5, 6). The present invention provides a robust and efficient credit risk evaluation mechanism to reduce financial risk. The present invention provides an improved credit risk system generated with a centrally focused autonomous process, which implements a dynamic, transparent and actionable credit analytics framework for analyzing the financial strength of the user interface (5, 6) for faster and better credit decisions. The invention has been explained in relation to a specific embodiment. It is understood that the foregoing description is only illustrative of the present invention, and it is not intended that the invention be limited or restricted thereto. Many other specific embodiments of the present invention will be apparent to one skilled in the art from the foregoing disclosure.

Claims

We Claim:
1. A system for multi-data risk assessment of MSMEs comprises a central unit (1) connected to a receiving unit (2) and a transmitting unit, an assessment unit (4) and a data storing unit (3) associated to the central unit (1), a communication interface for communicating with the central unit (1) and user interfaces (5, 6); the receiving unit (2) receives and collects a dataset about a company; the assessment unit (4) assesses data by a data merging module and a multiple composite parameters module, the data merging module merges parameters of the dataset received from public database, private database and proprietary data into a single data source; the multiple composite parameter module has a variables processing module, a missing imputation data module, an outlier module, a cardinality module and a feature magnitude module; the assessment unit (4) receives information to determine an assessment score for the user interface (5, 6) to reduce a financial risk; the assessment score is generated with a centrally autonomous process which is optimal for modeling-oriented parameters; the storing unit (3) stores the data assessed by the assessment unit (4); the transmitting unit transfers the data through the communication interface to the user interfaces (5, 6).
2. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the receiving unit (2) receives and collects dataset about a company from public database, private database, and proprietary data.
3. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the variables processing module determines a numerical variable, a categorical variable, a discrete variable and a date and time variable.
4. The system for multi-data risk assessment of MSMEs as claimed in claim 3 wherein the numerical variable determines a data variable that takes on any value within a finite or infinite interval and values of the numerical variables are numbers.
5. The system for multi-data risk assessment of MSMEs as claimed in claim 3 wherein the categorical variable determines the value that assumes a limited and fixed number of possible values that allow a data unit to be assigned to a broad category for classification.
6. The system for multi-data risk assessment of MSMEs as claimed in claim 3 wherein the discrete variable determines the variable that takes on a finite number of values only.
7. The system for multi-data risk assessment of MSMEs as claimed in claim 3 wherein the date and time variable include a particular type of the categorical variable that takes dates or time as the values.
8. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the missing imputation data module determines a numerical missing variable and a categorical missing variable.
9. The system for multi-data risk assessment of MSMEs as claimed in claim 8 wherein the numerical missing variable includes a mean/median imputation along with missing indicator and an end of tail imputation.
10. The system for multi-data risk assessment of MSMEs as claimed in claim 8 wherein the categorical missing variable consists of adding a missing category imputation.
11. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the outlier module identifies data values that have significant impact on mean by using a discretization technique.
12. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the cardinality module determines the number of different labels in the categorical variables and consists of a rare label encoding and a categorical variable encoding.
13. The system for multi-data risk assessment of MSMEs as claimed in claim 12 wherein the categorical variable encoding replaces the category string by a number representation for the model training by a Weight of Evidence (WOE) technique.
14. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the feature magnitude module includes a feature scaling module and a feature selection module.
15. The system for multi-data risk assessment of MSMEs as claimed in claim 14 wherein the feature scaling module standardizes range of independent variable data by a standardization technique.
16. The system for multi-data risk assessment of MSMEs as claimed in claim 14 wherein the feature selection module includes the decision about variable parameters (7) and all the available features should be included in the model (13).
17. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the user interface (5, 6) includes bank interface and MSME company interface.
18. The system for multi-data risk assessment of MSMEs as claimed in claim 1 wherein the communication interface includes a network that contains transmission media, routers, repeaters, gateways, network adapters and cables, and is placed in a Local Area Network (L.A.N.), Wide Area Network (W.A.N.) and/or Virtual Private Network (V.P.N.) in which a network server receives data from and/or gives data to the user interface (5, 6).
19. The system for multi-data risk assessment of MSMEs as claimed in claim 1, wherein the assessment unit (4) generates the assessment score by a XGBoost algorithm of Gradient Boosted Machine Model (GBM).
20. A method for multi-data risk assessment of MSMEs comprises following steps: a) connecting a central unit (1) to a receiving unit (2) and a transmitting unit; b) receiving and collecting dataset about a company from the receiving unit (2); c) assessing data by data merging module, and multiple composite parameters module for an assessment unit (4); d) merging parameters of the dataset received from public database, private database and proprietary data by the data merging module into single data source; e) determining data by a multiple composite parameter module; f) determining an assessment score for user interface (5, 6) to reduce a financial risk in the assessment unit (4) by receiving information; g) generating the assessment score by centrally autonomous process; h) storing the data assessed by the assessment unit (4) in a storing unit (3); i) transferring the data through communication interface to the user interface (5, 6) in the transmitting unit.
21. The method for multi-data risk assessment of MSMEs as claimed in claim 20 wherein in step (e), the multiple composite parameter module includes variables processing module, missing imputation data module, outlier module, cardinality module and feature magnitude module.
22. The method for multi-data risk assessment of MSMEs as claimed in claim 21, wherein the variables processing module determines a numerical variable, a categorical variable, a discrete variable and a date and time variable.
23. The method for multi-data risk assessment of MSMEs as claimed in claim 21, wherein the missing imputation data module determines a numerical missing variable and a categorical missing variable.
24. The method for multi-data risk assessment of MSMEs as claimed in claim 21, wherein the outlier module identifies data values that have significant impact on mean by using a discretization technique.
25. The method for multi-data risk assessment of MSMEs as claimed in claim 21, wherein the cardinality module determines the number of different labels in the categorical variables and consists of a rare label encoding and a categorical variable encoding.
26. The method for multi-data risk assessment of MSMEs as claimed in claim 21, wherein feature magnitude module includes a feature scaling module and a feature selection module.
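
For illustration only, and not as part of the claims, three of the preprocessing techniques named above (median imputation with a missing indicator, rare label encoding and Weight of Evidence encoding) can be sketched as follows. The function names, the `min_share` and `smoothing` values, and the convention that a target of 1 marks a defaulted ("bad") account are all assumptions; the sign convention of WoE varies between sources.

```python
from math import log
from statistics import median


def impute_median_with_indicator(values):
    """Replace None by the median of the observed values; also return a 0/1 missing flag."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values], [int(v is None) for v in values]


def group_rare_labels(labels, min_share=0.05, other="Rare"):
    """Map labels seen in fewer than min_share of the rows to a single bucket."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    n = len(labels)
    return [lab if counts[lab] / n >= min_share else other for lab in labels]


def weight_of_evidence(labels, target, smoothing=0.5):
    """WoE per category: ln(share of goods / share of bads), with additive smoothing."""
    goods, bads = {}, {}
    for lab, t in zip(labels, target):
        bucket = bads if t == 1 else goods  # assumed convention: 1 = bad
        bucket[lab] = bucket.get(lab, 0) + 1
    cats = set(labels)
    total_good = sum(goods.values()) + smoothing * len(cats)
    total_bad = sum(bads.values()) + smoothing * len(cats)
    return {
        c: log(((goods.get(c, 0) + smoothing) / total_good)
               / ((bads.get(c, 0) + smoothing) / total_bad))
        for c in cats
    }
```

The smoothing term keeps the logarithm finite for a category that contains only good or only bad accounts.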
PCT/IN2021/050802 2020-08-21 2021-08-20 A system and method for multi-data risk assessment of msmes. WO2022038641A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202021036080 2020-08-21
IN202021036080 2020-08-21

Publications (1)

Publication Number Publication Date
WO2022038641A1 true WO2022038641A1 (en) 2022-02-24

Family

ID=80322934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2021/050802 WO2022038641A1 (en) 2020-08-21 2021-08-20 A system and method for multi-data risk assessment of msmes.

Country Status (1)

Country Link
WO (1) WO2022038641A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471056A (en) * 2022-08-31 2022-12-13 鼎翰文化股份有限公司 Data transmission method and data transmission system
CN116862643A (en) * 2023-06-25 2023-10-10 福建润楼数字科技有限公司 Automatic wind control feature screening method for multi-channel fund integration credit business
CN117670525A (en) * 2023-12-22 2024-03-08 广东金融学院 Enterprise credit assessment system based on big data analysis

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020107872A1 (en) * 2018-11-29 2020-06-04 平安科技(深圳)有限公司 Company risk analyzing method, apparatus, computer device, and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020107872A1 (en) * 2018-11-29 2020-06-04 平安科技(深圳)有限公司 Company risk analyzing method, apparatus, computer device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIMENG-ZHANG: "A Short Guide for Feature Engineering and Feature Selection", GITHUB.COM/, 15 December 2018 (2018-12-15), XP055908290, Retrieved from the Internet <URL:https://github.com/ashishpate126/Amazing-Feature-Engineering/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md> *

Similar Documents

Publication Publication Date Title
Barboza et al. Machine learning models and bankruptcy prediction
Veganzones et al. Corporate failure prediction models in the twenty-first century: a review
García et al. An insight into the experimental design for credit risk and corporate bankruptcy prediction systems
Teles et al. Artificial neural network and Bayesian network models for credit risk prediction
WO2022038641A1 (en) A system and method for multi-data risk assessment of msmes.
Callejón et al. A System of Insolvency Prediction for industrial companies using a financial alternative model with neural networks
Ala’raj et al. A deep learning model for behavioural credit scoring in banks
Abdou et al. Prediction of financial strength ratings using machine learning and conventional techniques
Radovanovic et al. The evaluation of bankruptcy prediction models based on socio-economic costs
Kumar et al. Credit score prediction system using deep learning and k-means algorithms
Sylvester Walusala et al. A hybrid machine learning approach for credit scoring using PCA and logistic regression
Ndayisenga Bank loan approval prediction using machine learning techniques
Pradnyana et al. Loan Default Prediction in Microfinance Group Lending with Machine Learning
Khiem Tran et al. Towards Improved Bankruptcy Prediction: Utilizing Variational Autoencoder Latent Representations in a Norwegian Context
Caplescu et al. Will they repay their debt? Identification of borrowers likely to be charged off
Salihu et al. A review of algorithms for credit risk analysis
Zand Towards intelligent risk-based customer segmentation in banking
Lombardo et al. Deep Learning with Multi-Head Recurrent Neural Networks for Bankruptcy Prediction with Time Series Accounting Data
Doumpos et al. Data analytics for developing and validating credit models
Jacobs Jr Benchmarking alternative interpretable machine learning models for corporate probability of default
Maurya et al. A Decision Tree Classifier Based Ensemble Approach to Credit Score Classification
Kanimozhi et al. Predicting Mortgage-Backed Securities Prepayment Risk Using Machine Learning Models
Simão Machine Learning applied to credit risk assessment: Prediction of loan defaults
Prabagar et al. Data mining based framework for organisational financial management
YESHAMBEL A LOAN DEFAULT PREDICTION MODEL FOR ACSI: A DATA MINING APPROACH

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21857951

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21857951

Country of ref document: EP

Kind code of ref document: A1