CN113657990A - Ant-lion algorithm optimized NARX neural network risk prediction system and method - Google Patents

Info

Publication number
CN113657990A
Authority
CN
China
Prior art keywords
data
neural network
narx neural
customer
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110894613.4A
Other languages
Chinese (zh)
Inventor
李兰
江远强
李晓萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiweijinke Shanghai Information Technology Co ltd
Original Assignee
Baiweijinke Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiweijinke Shanghai Information Technology Co ltd
Priority to CN202110894613.4A
Publication of CN113657990A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a risk prediction system and method based on a NARX neural network optimized by the ant lion algorithm. The system comprises a user side and a server side; the user side comprises an information acquisition module and a risk prediction initiating module, and the server side comprises an information processing module, a database and a risk prediction module. The information acquisition module is used by a user to collect data about a customer and integrate it into customer data; the risk prediction initiating module is used by the user to initiate a risk prediction application request; the information processing module acquires the customer data and stores it in the database, and also acquires and audits the risk prediction application request and generates audit information that is transmitted to the user side and to the risk prediction module; the database stores the customer data; the risk prediction module acquires the audit information, retrieves the customer data from the database according to the audit information, and performs risk prediction on the customer data to obtain customer overdue risk prediction data. The method comprises steps A1 to A6.

Description

Ant-lion algorithm optimized NARX neural network risk prediction system and method
Technical Field
The invention belongs to the technical field of internet finance, and particularly relates to a risk prediction system and method based on a NARX neural network optimized by the ant lion algorithm.
Background
To meet credit risk control requirements at different stages, a financial institution generally scores the risk of a financial user with an application scorecard before the loan, a behavior scorecard during the loan and a collection scorecard after the loan. The in-loan behavior scorecard model is a scoring model that predicts a customer's default risk from the customer's historical behavior feature data during the loan: it evaluates the customer's repayment capacity, repayment willingness and the like from the customer's behavior while the account is in use, monitors in-loan behavior according to the default probability, and dynamically predicts the customer's in-loan risk.
The in-loan behavior scorecard relies mainly on customer repayment behavior, which is generally time series data whose patterns change over time. In the prior art, however, conventional logistic regression, XGBoost and LightGBM do not handle time series data well, and in-loan risk prediction methods based on BP, RBF and other neural networks require a large amount of computation and take a long time, so conventional methods cannot meet the accuracy and efficiency requirements of in-loan risk prediction.
The NARX (Nonlinear Auto-Regression with External input) neural network, i.e. the nonlinear autoregressive network with external input, is a dynamic feedforward neural network whose output depends on both the current input and past outputs. Because of its delayed feedback, the network retains a memory of historical state information and can reflect the time-varying character of a time series well. The NARX neural network can not only predict the value of the input signal at the next moment, but can also be used for nonlinear filtering and for modeling nonlinear dynamic systems. Compared with the traditional Recurrent Neural Network (RNN), the NARX neural network can perform better in learning ability, convergence rate, generalization performance and prediction accuracy; it offers nonlinear mapping ability, good robustness, adaptability and self-learning, and its convergence rate and generalization are superior to those of other neural networks. This relieves, to a certain extent, the long running time of neural networks, meets the accuracy and efficiency demands of loan overdue prediction, and makes the network well suited to predicting customer overdue risk.
However, like other neural networks, the NARX neural network is sensitive to its initial values. In the prior art, the initial values of the network are mainly optimized with genetic, particle swarm, ant colony and similar algorithms, which are prone to problems such as falling into local optima and slow convergence. How to make the NARX neural network escape local optima, improve its convergence speed, and balance global exploration against local exploitation therefore remains a difficulty.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the defects in the prior art, a risk prediction system and method based on a NARX neural network optimized by the ant lion algorithm. The NARX neural network can perform better in learning ability, convergence speed, generalization performance and prediction accuracy, meets the accuracy and efficiency requirements of financial overdue risk prediction, and is well suited to predicting customer repayment.
The invention provides a risk prediction system based on a NARX neural network optimized by the ant lion algorithm, which comprises a user side and a server side, wherein the user side comprises an information acquisition module and a risk prediction initiating module, and the server side comprises an information processing module, a database and a risk prediction module;
the information acquisition module is used by a user to collect data about a customer and integrate it into customer data;
the risk prediction initiating module is used by the user to initiate a risk prediction application request;
the information processing module is used for acquiring the customer data and storing it in the database, and for acquiring and auditing the risk prediction application request and generating audit information transmitted to the user side and the risk prediction module;
the database is used for storing the customer data;
the risk prediction module is used for acquiring the audit information, retrieving the customer data from the database according to the audit information, and performing risk prediction on the customer data to obtain customer overdue risk prediction data;
the risk prediction performed by the risk prediction module on the customer data to obtain the customer overdue risk prediction data comprises the following steps:
step A1, obtaining the loan data required for modeling from the customer data as a customer loan data sample, labeling each customer in the customer loan data sample to obtain the customer's feature data, marking the customer's risk level according to the customer's repayment record, and extracting the feature data in the customer loan data sample by feature category and associating it with the customer's risk level to form a sample data set;
step A2, preprocessing the sample data set, dividing it into a training set and a test set in a ratio of …:3, and normalizing the training set and the test set;
step A3, establishing a NARX neural network prediction model, and training the NARX neural network prediction model with the training set;
step A4, optimizing the NARX neural network prediction model with an improved ant lion algorithm, using a specified network performance evaluation function;
step A5, testing the prediction performance of the NARX neural network prediction model on the test set to obtain prediction performance test data, comparing the prediction performance test data with those of a genetic algorithm and a particle swarm algorithm, comparing the obtained overdue probability value with the corresponding actual sample according to the fraud probability value of the fraudulent users in the test set, judging the stability of the overdue prediction model and formulating the offset;
step A6, feeding the historical behavior feature data of existing loan users into the NARX neural network prediction model and outputting the customer overdue risk prediction data.
In the above risk prediction system:
the customer loan data sample in step A1 consists of 10000 loan records that have repayment records and whose first loan application time falls within 6-12 months;
the customer loan data sample comprises multi-dimensional data of the customer, including customer attribute data, customer loan data and user platform operation behavior data;
each customer in the customer loan data sample is labeled according to the repayment record;
the original data comprises service types and historical behavior feature data of customers;
in step A2, the preprocessing of the sample data set comprises feature screening, missing value completion and abnormal value processing.
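The split and normalization of step A2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_and_normalize`, the 0.7 training fraction (the ratio in the text is incomplete) and the choice of min-max scaling are all assumptions, and feature screening, missing value completion and abnormal value processing are omitted.

```python
import random


def split_and_normalize(rows, train_fraction=0.7, seed=0):
    """Shuffle rows, split them, then min-max scale both parts.

    train_fraction=0.7 and min-max scaling are assumptions for this sketch.
    """
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]

    # Scaling statistics come from the training set only, so that no
    # information from the test set leaks into training.
    n_features = len(train[0])
    lo = [min(r[k] for r in train) for k in range(n_features)]
    hi = [max(r[k] for r in train) for k in range(n_features)]

    def scale(row):
        # Constant columns are mapped to 0.0 to avoid division by zero.
        return [
            (v - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0
            for k, v in enumerate(row)
        ]

    return [scale(r) for r in train], [scale(r) for r in test]
```

Because the scaling bounds are taken from the training set, test values may fall slightly outside [0, 1]; that is expected in this sketch.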
In the above risk prediction system:
establishing the NARX neural network prediction model in step A3 specifically comprises the following steps:
step B1: network initialization, namely determining the numbers of input layer nodes, output layer nodes and hidden layer nodes of the NARX neural network according to the number of labels in the customer loan data sample and the number of dimensions influencing overdue risk, determining the learning rate and the activation functions of the neurons, and initializing the connection weights between the input layer, the output layer and the hidden layer, the hidden layer offsets and the output layer offsets;
step B2: calculating the hidden layer node output H according to the following formula;
$$H_j = f\left(\sum_{p=1}^{m} w_{jp}\,x(t-p) + \sum_{q=1}^{n} w_{jq}\,y(t-q) + b_j\right)$$
wherein $H_j$ is the output of the jth hidden layer node;
f() is the activation function of the hidden layer nodes, for which the tanh function is selected;
p = 1, 2, …, m is the delay step of the external input variable;
q = 1, 2, …, n is the delay step of the output feedback signal;
w is a connection weight;
$w_{jp}$ is the connection weight between the jth hidden layer node and the external input variable with delay step p;
$w_{jq}$ is the connection weight between the jth hidden layer node and the output feedback signal with delay step q;
x(t) is the value of the external input variable at time t;
x(t-p) is the delayed network input;
y(t-q) is the delayed output feedback;
$b_j$ is the offset of the jth hidden layer node;
step B3: calculating the final output y(t+1) of the NARX neural network according to the following formula;
Figure BDA0003197385820000042
where y(t) is the value of the target quantity at time t;
x(t) is the value of the external input variable at time t;
n is the number of input neurons, i.e. the number of features of the input sample;
m is the number of hidden layer neurons;
w is a connection weight;
s is the number of hidden layer nodes;
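The hidden layer computation of step B2, with a tanh activation applied to weighted delayed inputs and delayed output feedback plus an offset, can be sketched as below. The function names and the scalar external input are assumptions made for brevity. Since the output formula of step B3 is not recoverable from the text, the linear readout in `narx_step` is an assumed, commonly used form and not necessarily the one in the patent.

```python
import math


def narx_hidden(x_hist, y_hist, w_in, w_fb, b):
    """Hidden outputs H_j = tanh(sum_p w_in[j][p]*x(t-p) + sum_q w_fb[j][q]*y(t-q) + b[j]).

    x_hist holds the delayed external inputs, y_hist the delayed fed-back
    outputs; both are plain lists of floats (scalar signal, an assumption).
    """
    hidden = []
    for j in range(len(b)):
        s = b[j]
        s += sum(w * x for w, x in zip(w_in[j], x_hist))  # delayed external inputs
        s += sum(w * y for w, y in zip(w_fb[j], y_hist))  # delayed output feedback
        hidden.append(math.tanh(s))
    return hidden


def narx_step(x_hist, y_hist, w_in, w_fb, b, w_out, b_out):
    """One-step prediction y(t+1) as a linear readout of the hidden layer (assumed form)."""
    h = narx_hidden(x_hist, y_hist, w_in, w_fb, b)
    return sum(w * hj for w, hj in zip(w_out, h)) + b_out
```

For multi-step prediction, each new output would be pushed onto the front of `y_hist` before the next call, which is how the delayed feedback gives the network its memory.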
in step A3, training the weights and offsets of the NARX neural network prediction model on the training set using the trust region method specifically comprises the following steps:
step C1: setting a region whose radius is the maximum displacement, and searching for the optimal point of the objective function within that region;
step C2: if the objective function value increases, adjusting the range of the region and solving again;
if the objective function value decreases, continuing the iterative computation according to this rule.
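Steps C1 and C2 can be illustrated with a small trust region loop. This is a sketch under stated assumptions, not the training routine of the patent: the local model uses an identity Hessian with a Cauchy-point step, and the thresholds 0.25 and 0.75, the acceptance level `eta` and the function name are conventional textbook choices introduced here.

```python
import math


def trust_region_minimize(f, grad, x0, radius=1.0, max_radius=2.0,
                          eta=0.1, iters=50, tol=1e-10):
    """Minimise f with a basic trust region loop (identity-Hessian model)."""
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        g = grad(x)
        g_norm = math.sqrt(sum(v * v for v in g))
        if g_norm < tol:
            break
        # C1: minimise the model m(p) = f + g.p + 0.5*p.p along steepest
        # descent, restricted to the region of the current radius.
        tau = min(1.0, g_norm / radius)
        p_norm = tau * radius
        p = [-p_norm * v / g_norm for v in g]
        predicted = g_norm * p_norm - 0.5 * p_norm ** 2  # model decrease, > 0
        candidate = [a + s for a, s in zip(x, p)]
        f_new = f(candidate)
        rho = (fx - f_new) / predicted  # actual vs. predicted decrease
        # C2: shrink the region after a poor step, grow it after a good
        # step that reached the boundary.
        if rho < 0.25:
            radius *= 0.25
        elif rho > 0.75 and tau == 1.0:
            radius = min(2.0 * radius, max_radius)
        if rho > eta:  # accept only steps that reduce the objective enough
            x, fx = candidate, f_new
    return x, fx
```

For network training, `f` would be the training error as a function of the flattened weights and offsets, and `grad` its gradient.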
In the above risk prediction system:
in step A4, the improved ant lion algorithm is adopted to optimize the weights and offsets of the NARX neural network prediction model, specifically comprising the following steps:
initializing the weights and offsets of the NARX neural network, and determining the topology of the NARX neural network and the number of nodes in each layer according to the training set, wherein the dimension of an individual to be optimized is (n+1) × m, n is the number of features of the input sample, i.e. the number of input neurons, and m is the number of hidden layer neurons; encoding the weights and offsets of the NARX neural network as the position vectors of the ant lion population and determining the position of each ant lion in each dimension; setting the initial population size to P and the maximum size to Pmax; each ant lion individual represents a NARX neural network structure, expressed as follows:
Figure BDA0003197385820000052
wherein $w_{ij}$ is the weight, in [-1, 1], between the ith hidden layer neuron and the jth input neuron;
$b_i$ is the offset value, in [0, 1], of the ith hidden layer neuron;
the improvement of the ant lion algorithm lies in updating the way the ants' walking boundary is computed, which is defined by the following formula:
Figure BDA0003197385820000061
wherein γ is the shrinkage adjustment coefficient; λ is the scale factor; t is the current iteration number; and T is the maximum iteration number.
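The encoding of a network as an ant lion position of dimension (n+1) × m can be sketched as follows. The flat layout (all weights row by row, then the offsets) and the function names are assumptions for illustration; the improved boundary formula is not reproduced because it is not recoverable from the text.

```python
import random


def random_ant_lion(n_inputs, n_hidden, rng):
    """Random position vector of length (n_inputs + 1) * n_hidden.

    Weights are drawn from [-1, 1] and offsets from [0, 1], matching the
    ranges given in the text; the flat ordering is an assumption.
    """
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden * n_inputs)]
    offsets = [rng.uniform(0.0, 1.0) for _ in range(n_hidden)]
    return weights + offsets


def decode(position, n_inputs, n_hidden):
    """Split a position vector into an m x n weight matrix and m offsets."""
    split = n_hidden * n_inputs
    w = [position[i * n_inputs:(i + 1) * n_inputs] for i in range(n_hidden)]
    b = position[split:]
    return w, b
```

A population of size P is then a list of P such vectors, and the fitness of each one is the network error obtained after decoding it.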
In a second aspect, a risk prediction method based on a NARX neural network optimized by the ant lion algorithm comprises the following steps:
step A1, obtaining the loan data required for modeling from the customer data as a customer loan data sample, labeling each customer in the customer loan data sample to obtain the customer's feature data, marking the customer's risk level according to the customer's repayment record, and extracting the feature data in the customer loan data sample by feature category and associating it with the customer's risk level to form a sample data set;
step A2, preprocessing the sample data set, dividing it into a training set and a test set in a ratio of …:3, and normalizing the training set and the test set;
step A3, establishing a NARX neural network prediction model, and training the NARX neural network prediction model with the training set;
step A4, optimizing the NARX neural network prediction model with an improved ant lion algorithm, using a specified network performance evaluation function;
step A5, testing the prediction performance of the NARX neural network prediction model on the test set to obtain prediction performance test data, comparing the prediction performance test data with those of a genetic algorithm and a particle swarm algorithm, comparing the obtained overdue probability value with the corresponding actual sample according to the fraud probability value of the fraudulent users in the test set, judging the stability of the overdue prediction model and formulating the offset;
step A6, feeding the historical behavior feature data of existing loan users into the NARX neural network prediction model and outputting the customer overdue risk prediction data.
In the above risk prediction method:
the customer loan data sample in step A1 consists of 10000 loan records that have repayment records and whose first loan application time falls within 6-12 months;
the customer loan data sample comprises multi-dimensional data of the customer, including customer attribute data, customer loan data and user platform operation behavior data;
each customer in the customer loan data sample is labeled according to the repayment record;
the original data comprises service types and historical behavior feature data of customers;
in step A2, the preprocessing of the sample data set comprises feature screening, missing value completion and abnormal value processing.
In the above risk prediction method:
establishing the NARX neural network prediction model in step A3 specifically comprises the following steps:
step B1: network initialization, namely determining the numbers of input layer nodes, output layer nodes and hidden layer nodes of the NARX neural network according to the number of labels in the customer loan data sample and the number of dimensions influencing overdue risk, determining the learning rate and the activation functions of the neurons, and initializing the connection weights between the input layer, the output layer and the hidden layer, the hidden layer offsets and the output layer offsets;
step B2: calculating the hidden layer node output H according to the following formula;
$$H_j = f\left(\sum_{p=1}^{m} w_{jp}\,x(t-p) + \sum_{q=1}^{n} w_{jq}\,y(t-q) + b_j\right)$$
wherein $H_j$ is the output of the jth hidden layer node;
f() is the activation function of the hidden layer nodes, for which the tanh function is selected;
p = 1, 2, …, m is the delay step of the external input variable;
q = 1, 2, …, n is the delay step of the output feedback signal;
w is a connection weight;
$w_{jp}$ is the connection weight between the jth hidden layer node and the external input variable with delay step p;
$w_{jq}$ is the connection weight between the jth hidden layer node and the output feedback signal with delay step q;
x(t) is the value of the external input variable at time t;
x(t-p) is the delayed network input;
y(t-q) is the delayed output feedback;
$b_j$ is the offset of the jth hidden layer node;
step B3: calculating the final output y(t+1) of the NARX neural network according to the following formula;
Figure BDA0003197385820000081
where y(t) is the value of the target quantity at time t;
x(t) is the value of the external input variable at time t;
n is the number of input neurons, i.e. the number of features of the input sample;
m is the number of hidden layer neurons;
w is a connection weight;
s is the number of hidden layer nodes;
in step A3, training the weights and offsets of the NARX neural network prediction model on the training set using the trust region method specifically comprises the following steps:
step C1: setting a region whose radius is the maximum displacement, and searching for the optimal point of the objective function within that region;
step C2: if the objective function value increases, adjusting the range of the region and solving again;
if the objective function value decreases, continuing the iterative computation according to this rule.
In the above risk prediction method:
in step A4, the improved ant lion algorithm is adopted to optimize the weights and offsets of the NARX neural network prediction model, specifically comprising the following steps:
initializing the weights and offsets of the NARX neural network, and determining the topology of the NARX neural network and the number of nodes in each layer according to the training set, wherein the dimension of an individual to be optimized is (n+1) × m, n is the number of features of the input sample, i.e. the number of input neurons, and m is the number of hidden layer neurons; encoding the weights and offsets of the NARX neural network as the position vectors of the ant lion population and determining the position of each ant lion in each dimension; setting the initial population size to P and the maximum size to Pmax; each ant lion individual represents a NARX neural network structure, expressed as follows:
Figure BDA0003197385820000082
wherein $w_{ij}$ is the weight, in [-1, 1], between the ith hidden layer neuron and the jth input neuron;
$b_i$ is the offset value, in [0, 1], of the ith hidden layer neuron;
the improvement of the ant lion algorithm lies in updating the way the ants' walking boundary is computed, which is defined by the following formula:
Figure BDA0003197385820000091
wherein γ is the shrinkage adjustment coefficient; λ is the scale factor; t is the current iteration number; and T is the maximum iteration number.
In a third aspect, an electronic device, comprising: a memory and a processor, the processor and the memory being connected;
the memory is used for storing programs;
the processor calls a program stored in the memory to perform the method of any of the second aspects.
In a fourth aspect, a computer-readable storage medium, wherein a computer program is stored thereon, which, when executed by a computer, performs the method of any of the second aspects.
Compared with the prior art, the invention has the following advantages:
(1) compared with the traditional Recurrent Neural Network (RNN), the NARX neural network can perform better in learning ability, convergence rate, generalization performance and prediction accuracy, meets the accuracy and efficiency requirements of financial overdue risk prediction, and is well suited to predicting customer repayment;
(2) compared with optimization algorithms such as the genetic algorithm and particle swarm optimization, the ant lion algorithm has relatively good optimization efficiency and convergence precision; mechanisms such as the random selection of ant lions, the random walk of ants and the adaptive shrinking of trap boundaries ensure good exploration of the search space, which gives the ant lion algorithm its fast optimization;
(3) the ant lion algorithm is improved by adopting a continuous boundary contraction factor, updating the dynamic weight coefficient and adding a counter adjustment factor to adjust the step length, which improves its optimization performance and convergence efficiency;
(4) optimizing the NARX neural network overdue risk model with the improved ant lion algorithm meets the accuracy and efficiency requirements of the loan risk prediction process of an internet financial platform, allows different loan risk control measures to be implemented, generates suggested handling measures for users with a high probability of default, and reduces loan losses.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
Fig. 1 is a block diagram of the risk prediction system based on a NARX neural network optimized by the ant lion algorithm according to the present invention.
Fig. 2 is a flowchart of the risk prediction method based on a NARX neural network optimized by the ant lion algorithm according to the present invention.
Reference numerals: user side 100, server side 200, information acquisition module 101, risk prediction initiating module 102, information processing module 103, database 104 and risk prediction module 105.
Detailed Description
Example 1:
A risk prediction system based on a NARX neural network optimized by the ant lion algorithm comprises a user side 100 and a server side 200, wherein the user side 100 comprises an information acquisition module 101 and a risk prediction initiating module 102, and the server side 200 comprises an information processing module 103, a database 104 and a risk prediction module 105;
the information acquisition module 101 is used by a user to collect data about a customer and integrate it into customer data;
the risk prediction initiating module 102 is used by the user to initiate a risk prediction application request;
the information processing module 103 is used for acquiring the customer data and storing it in the database 104, and for acquiring and auditing the risk prediction application request and generating audit information transmitted to the user side 100 and the risk prediction module 105;
the database 104 is used for storing the customer data;
the risk prediction module 105 is used for acquiring the audit information, retrieving the customer data from the database 104 according to the audit information, and performing risk prediction on the customer data to obtain customer overdue risk prediction data;
the risk prediction performed by the risk prediction module 105 on the customer data to obtain the customer overdue risk prediction data comprises the following steps:
step A1, obtaining the loan data required for modeling from the customer data as a customer loan data sample, labeling each customer in the customer loan data sample to obtain the customer's feature data, marking the customer's risk level according to the customer's repayment record, and extracting the feature data in the customer loan data sample by feature category and associating it with the customer's risk level to form a sample data set;
step A2, preprocessing the sample data set, dividing it into a training set and a test set in a ratio of …:3, and normalizing the training set and the test set;
step A3, establishing a NARX neural network prediction model, and training the NARX neural network prediction model with the training set;
step A4, optimizing the NARX neural network prediction model with an improved ant lion algorithm, using a specified network performance evaluation function;
step A5, testing the prediction performance of the NARX neural network prediction model on the test set to obtain prediction performance test data, comparing the prediction performance test data with those of a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), comparing the obtained overdue probability value with the corresponding actual sample according to the fraud probability value of the fraudulent users in the test set, judging the stability of the overdue prediction model and formulating the offset;
the ant colony algorithm (ACO), the ant lion algorithm (ALO) and the improved ant lion algorithm (IALO) can also be used as comparison algorithms for optimizing the NARX neural network parameters, so as to verify the superior performance of the ant lion algorithm in optimizing the NARX neural network parameters;
the genetic algorithm (GA) parameters are set as: population size N = 20, crossover rate pc = 0.8, mutation rate pm = 0.15;
the particle swarm optimization (PSO) parameters are set as: number of particles N = 20, update speed c1 = c2 = 2, weight w = 0.6;
the other parameters of the ant colony algorithm (ACO) are set as: pheromone increase strength Q = 1, pheromone volatilization coefficient Rho = 0.8, ant crawling speed V = 0.3;
the ant lion algorithm (ALO) and improved ant lion algorithm (IALO) parameters are set as: population size of ants and ant lions N = 10, lower bound lb = 0.01, upper bound ub = 100;
the maximum iteration number T of the 4 algorithms is 150;
step A6, feeding the historical behavior feature data of loan users into the NARX neural network prediction model and outputting the customer overdue risk prediction data, so that different loan risk control measures can be implemented, suggested handling measures can be generated for users with a high probability of default, and loan losses can be reduced; the NARX neural network prediction model is deployed to the loan platform for customer overdue risk prediction.
According to the Mean Square Error (MSE), the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the fitting degree coefficient (EC) of the predicted overdue amount and the actual overdue amount as evaluation indexes, calculation formulas are respectively as follows:
Figure BDA0003197385820000121
Figure BDA0003197385820000122
Figure BDA0003197385820000123
Figure BDA0003197385820000124
where n is the number of predicted samples, y'_i is the prediction result of the corresponding model, and y_i is the actual output of the sample;
among the evaluation indexes, the smaller the values of MSE, MAPE and MAE, the smaller the prediction error and the better the prediction performance of the corresponding model; the closer the EC value is to 1, the higher the degree of fit between the predicted and true values and the more similar their evolution trends, as shown in the following Table 1:
Figure BDA0003197385820000131
TABLE 1
MSE, MAPE and MAE values of the IALO-NARX model are lower than those of a reference model, and fitting degree coefficient EC values of the IALO-NARX model are higher than those of other models, so that the improved ant lion algorithm optimized NARX neural network has smaller prediction error and higher fitting degree.
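The evaluation indexes described above can be sketched in Python as follows. MSE, MAE and MAPE use their standard definitions. The EC expression is an assumption (a commonly used fitting-coefficient form), because the formula images are not reproduced in this text; all function names are illustrative.

```python
import math


def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def ec(actual, predicted):
    """Fitting degree coefficient.

    ASSUMPTION: EC = 1 - ||y - y'|| / (||y|| + ||y'||), which equals 1 for a
    perfect fit. The patent's own EC formula may differ.
    """
    num = math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)))
    den = math.sqrt(sum(y * y for y in actual)) + math.sqrt(sum(p * p for p in predicted))
    return 1.0 - num / den
```

Under these definitions a model is preferred when its MSE, MAPE and MAE are lower and its EC is closer to 1, which matches the comparison criterion stated for Table 1.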
The above ant lion algorithm optimized NARX neural network risk prediction system, wherein:
in step A1, the client loan data sample consists of 10000 loan records that have repayment records and whose first loan application time is within 6-12 months, and the observation period and the performance period of the loan are determined in advance according to the loan product business;
the client loan data sample contains multi-dimensional data of a client, and the multi-dimensional data comprises client attribute data, client loan data and user platform operation behavior data;
customer attribute data includes income, consumption, occupation, city, gender, age;
the client loan data comprises loan frequency, loan times, loan amount, overdue rate, repayment volume, historical repayment records, repayment period due, repayment period number and/or loan multi-head quantity;
the user platform operation behavior data includes but is not limited to information such as the number of times of logging in the platform, the number of clicks of a webpage/website, the frequency of clicks and the like;
obtaining the historical behavior characteristic derivative variables of the client loan according to the client loan data, wherein the derivative variables comprise information which is acquired by data processing, such as historical default times, historical default amount, default time interval, historical application times, application amount, refused times, proportion of loan account age to loan deadline, fund inflow amount in the current term, difference value between the current inflow amount and repayment amount, and the like;
the borrowing information includes, but is not limited to, the borrowing amount, borrowing interest rate, borrowing term, time interval between the current and the previous borrowing, borrowing purpose, historical borrowing times, overdue times, and the borrowing times, borrowing amount and overdue times on other platforms; each client in the client loan data sample is labeled according to the repayment record, that is, a client label is established from the client's historical credit condition corresponding to the client loan data sample; taking the sample data of one client as an example, if the number of overdue days of the first repayment is less than or equal to 30, the client is defined as a good client, i.e. a high-quality client, represented by 0; conversely, if the number of overdue days of the first repayment is more than 30, the client is defined as a bad client, i.e. a client requiring close monitoring, represented by 1;
setting the quantity and the proportion of the positive samples and the negative samples, judging whether the proportion of the positive samples and the negative samples meets the set proportion, and further enabling the proportion of the positive samples and the negative samples to meet the set proportion through oversampling or undersampling so as to realize the balance of the sample data;
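The labeling rule and the sample balancing step can be sketched as follows. The 30-day threshold and the 0/1 encoding come from the text; the use of random duplication for oversampling, the default 1:1 target ratio and the function names are assumptions made for illustration.

```python
import random


def label_customer(first_repayment_overdue_days):
    """Return 0 for a good client (overdue <= 30 days), 1 for a bad client (> 30 days)."""
    return 0 if first_repayment_overdue_days <= 30 else 1


def oversample_minority(samples, labels, target_ratio=1.0, seed=0):
    """Randomly duplicate minority-class samples until
    minority_count >= target_ratio * majority_count.

    ASSUMPTION: plain random oversampling; the text only says that
    oversampling or undersampling is used to reach the set proportion.
    """
    rng = random.Random(seed)
    pos = [s for s, l in zip(samples, labels) if l == 1]
    neg = [s for s, l in zip(samples, labels) if l == 0]
    if len(pos) <= len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    if not minority:
        raise ValueError("cannot oversample an empty class")
    extra = []
    while len(minority) + len(extra) < target_ratio * len(majority):
        extra.append(rng.choice(minority))
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)
```

Duplicated samples are appended at the end, so a shuffle before splitting into training and test sets would normally follow.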
the original data comprises service types and historical behavior characteristic data of clients;
the historical behavior features are obtained from behavior time-series features; the time-series data is a sequence of index values obtained by sorting a user's actual index values within a preset time period in chronological order, and includes click browsing data, browsing duration data, browsing range data, a user information index, a position index and a device index; it is generally extracted from data collected by event tracking (buried points), wherein the access behavior information includes the user's behavior data on a specific website (including but not limited to a traditional Web site and web pages accessed from a mobile Application (APP)), such as operation and browsing activity (including operation details, operation time, operation position, IP address and the like) within the preset time period (for example, the last week, one month, three months, half a year, one year and the like), and is recorded by the service end 200;
time-series behavior features are extracted using an attention mechanism, and deep and time-series features are extracted using an LSTM network; the user behavior data is encoded by the LSTM network to obtain processed user behavior data, wherein the encoded user behavior data covers 5 moments t1, t2, t3, t4 and t5 and is divided into behavior features of 3 dimensions s1, s2 and s3; time fields are either used directly as continuous-valued features, by computing the difference between the current time and the user's registration time, birthday, loan time, consumption time and browsing time, or discretized to construct features, with every 10 days taken as one interval;
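The two ways of turning a time field into a feature can be illustrated with a short sketch. The function names are hypothetical, and bucket numbering starting at 0 is an assumption; the text only specifies the 10-day interval width.

```python
from datetime import date


def days_between(current, earlier):
    """Continuous feature: number of days from an earlier event to the current date."""
    return (current - earlier).days


def ten_day_bucket(days, width=10):
    """Discretized feature: days 0-9 map to bucket 0, days 10-19 to bucket 1, and so on."""
    return days // width
```

For example, a user registered on 2021-07-01 and scored on 2021-08-01 has a continuous feature of 31 days, which falls in bucket 3.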
in addition, subject to compliance requirements, third-party credit investigation data may also be acquired, including but not limited to: the applicant's fraud risk data, big data ratings, multi-head loan data, mobile phone time-on-network and status, judicial information, industrial and commercial information, telecommunication consumption records and the like;
in step A2, the sample data set is preprocessed by feature screening, missing-value completion and abnormal value processing;
feature screening, which analyzes features in the information base, specifically, statistical indicators including maximum, minimum, median, mean, variance, abnormal values, missing values, etc., checks the distribution of the data, and for example, visually and unambiguously identifies the abnormal values in the data and the degree of dispersion of the data through a box plot; by checking the maximum value, the minimum value and the average value, the data authenticity of partial information of the case can be determined, the information about data stability can be provided through variance, characteristic variables which are fixed constants are removed, cases containing missing values or abnormal values can be deleted or replaced by reasonable numerical values (average values), and whether the attribute can be used as an independent variable in the modeling process or not can be simply judged through observation of the indexes;
for missing-value completion, there are three common methods for handling missing values:
(1) samples with missing values are deleted directly, provided that the proportion of such samples is small and the values are missing at random, so that deletion has little influence on the analysis result;
(2) the missing values are replaced, which is simple and does not reduce sample information, but introduces bias when the values are not missing at random;
(3) multiple imputation predicts the missing data from the relations between variables, generates several complete data sets by Monte Carlo random simulation, analyzes each data set separately, and finally pools the analysis results;
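The first two methods listed above can be sketched as follows, with `None` marking a missing value. This is a minimal illustration with hypothetical function names; the replacement value is assumed to be the column mean, which the feature-screening paragraph mentions as one reasonable choice. Multiple imputation is not shown.

```python
def drop_missing(rows):
    """Method (1): delete every row that contains at least one missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def impute_mean(rows):
    """Method (2): replace each missing value with the mean of the observed
    values in the same column. Assumes every column has at least one
    observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]
```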
normalization scales the data into a fixed range, mapping it to the interval whose start and end values are 0 and 1, respectively; alternatively the data may be processed by a logarithmic operation; the normalization expression is:
Figure BDA0003197385820000161
where X_norm is the normalized data, X is the original sample data, and X_max and X_min are, respectively, the maximum and minimum values in the original data set.
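A minimal sketch of the normalization step is given below. It assumes the expression is the standard min-max form X_norm = (X - X_min) / (X_max - X_min), which is consistent with the symbol definitions above; returning zeros for a constant feature is an added convention, not something the text specifies.

```python
def min_max_normalize(values):
    """Map values linearly onto [0, 1] using the minimum and maximum of the data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to scale, so return zeros (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In practice X_min and X_max would usually be taken from the training set and reused for the test set, so that no test information leaks into training.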
The above ant lion algorithm optimized NARX neural network risk prediction system, wherein:
the method for establishing the NARX neural network prediction model in the step A3 specifically comprises the following steps:
step B1: network initialization: determining the numbers of input layer nodes, output layer nodes and hidden layer nodes of the NARX neural network according to the number of labels in the client loan data sample and the number of dimensions influencing overdue risk, determining the learning rate and the activation function of the neurons, and initializing the connection weights among the input layer, the hidden layer and the output layer, the hidden layer offsets and the output layer offsets;
step B2: calculating hidden layer node output H according to the following formula;
Figure BDA0003197385820000162
where H_j is the output of the jth hidden layer node;
f() is the activation function of the hidden layer nodes, for which the tanh function is selected;
p = 1, 2, …, m is the delay of the external input variable;
q = 1, 2, …, n is the delay of the output feedback signal;
w is a connection weight;
W_jp is the connection weight between the jth hidden layer node and the external input variable with delay step p;
W_jq is the connection weight between the jth hidden layer node and the output feedback signal with delay step q;
x(t) is the value of the external input variable at time t;
x(t-p) is a network input delay parameter;
x(t-q) is an external feedback delay parameter;
b_j is the offset of the jth hidden layer node;
step B3: calculating the final output y(t+1) of the NARX neural network according to the following formula;
Figure BDA0003197385820000163
where y (t) is the value of the target quantity at time t;
x (t) is the value of the external input variable at time t;
n is the number of input neurons which is the number of features of the input sample;
m is the number of neurons in the hidden layer;
w is a connection weight;
s is the number of hidden layer nodes;
the main structural difference between the NARX neural network and a traditional feedforward neural network is that the NARX neural network adds delayed inputs and delayed feedback outputs, so that the values at the current moment and at previous moments are considered together during calculation; by continuously learning the nonlinear relation between the target and the inputs, the network adjusts its internal weight coefficients and calculates an estimate of the target quantity, thereby achieving prediction;
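Steps B2 and B3 can be sketched as one forward step of a single-hidden-layer NARX network. Because the formula images are not reproduced here, the sketch assumes the common form H_j = tanh(sum_p W_jp x(t-p) + sum_q W_jq y(t-q) + b_j) followed by a linear output layer, with a scalar external input; the parameter names are illustrative.

```python
import math


def narx_step(x_hist, y_hist, w_in, w_fb, b_hidden, w_out, b_out):
    """One forward step of a NARX network (assumed standard form).

    x_hist[p-1] holds the delayed external input x(t-p), p = 1..m.
    y_hist[q-1] holds the delayed fed-back output y(t-q), q = 1..n.
    w_in[j] and w_fb[j] are the weights of hidden node j on those delays.
    """
    hidden = []
    for j in range(len(b_hidden)):
        z = b_hidden[j]
        z += sum(w * x for w, x in zip(w_in[j], x_hist))
        z += sum(w * y for w, y in zip(w_fb[j], y_hist))
        hidden.append(math.tanh(z))  # tanh activation, as selected in the text
    # The output layer is a linear weighting of the hidden outputs.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Calling `narx_step` repeatedly and appending each result to `y_hist` gives multi-step prediction in which the network's own outputs are fed back as inputs.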
in step A3, training the weights and offset values of the NARX neural network prediction model on the training set with the trust-region method specifically includes the following steps:
step C1: setting a region whose radius is the maximum displacement, and searching for the optimal point of the objective function within this region;
step C2: if the objective function value increases, adjusting the region range and solving again;
if the objective function value decreases, continuing the iterative calculation according to this rule; the algorithm can reduce the network scale while maintaining network fitting precision, thereby reducing network complexity and obtaining good generalization performance, and has the advantages of few iterations, fast convergence and high precision;
like BP and RBF neural networks, the weights and offset values of the NARX neural network are easily influenced by their initial values and easily fall into local optima; at present, genetic algorithms and particle swarm algorithms are commonly used to optimize the weights and offset values of the NARX neural network model, and although they achieve good results, they still have disadvantages such as high computational complexity and parameter sensitivity;
the learning and training of the NARX neural network adjusts the weights and offsets; the commonly adopted Levenberg-Marquardt algorithm belongs to the trust-region methods, combines the gradient descent method and the Newton method, and uses the Jacobian matrix to guide the weight adjustment;
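The accept-or-adjust rule of steps C1 and C2 can be illustrated with a small Levenberg-Marquardt loop. The sketch below fits a single tanh neuron y = tanh(w x + b) rather than a full NARX network, purely to keep the example short; the damping term `mu` plays the role of an inverse trust-region radius, and the halving/doubling schedule and all names are assumptions.

```python
import math


def lm_fit_tanh_neuron(xs, ys, w=0.0, b=0.0, mu=1e-2, iters=50):
    """Fit y = tanh(w*x + b) by Levenberg-Marquardt (illustrative sketch)."""

    def residuals(w_, b_):
        return [math.tanh(w_ * x + b_) - y for x, y in zip(xs, ys)]

    def cost(r):
        return sum(v * v for v in r)

    r = residuals(w, b)
    for _ in range(iters):
        # Jacobian rows (d r_i / d w, d r_i / d b) at the current parameters.
        jac = []
        for x in xs:
            slope = 1.0 - math.tanh(w * x + b) ** 2
            jac.append((slope * x, slope))
        # Damped normal equations: (J^T J + mu * I) * delta = -J^T r.
        a11 = sum(jw * jw for jw, _ in jac) + mu
        a12 = sum(jw * jb for jw, jb in jac)
        a22 = sum(jb * jb for _, jb in jac) + mu
        g1 = sum(jw * ri for (jw, _), ri in zip(jac, r))
        g2 = sum(jb * ri for (_, jb), ri in zip(jac, r))
        det = a11 * a22 - a12 * a12
        dw = (-a22 * g1 + a12 * g2) / det
        db = (a12 * g1 - a11 * g2) / det
        r_new = residuals(w + dw, b + db)
        if cost(r_new) < cost(r):
            # Objective decreased: accept the step and enlarge the region.
            w, b, r = w + dw, b + db, r_new
            mu = max(mu * 0.5, 1e-12)
        else:
            # Objective increased: reject the step and shrink the region.
            mu *= 2.0
    return w, b, cost(r)
```

A small `mu` makes the step close to a Gauss-Newton step, while a large `mu` makes it a short gradient-descent step, which is the combination of the two methods that the text refers to.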
the NARX (Nonlinear AutoRegressive with eXogenous inputs) neural network, i.e. a nonlinear autoregressive network with external input, is a dynamic feedforward neural network whose output depends on both the current input and past outputs; because of the delayed feedback, the NARX neural network has a memory of historical state information and can reflect the time-varying characteristics of a time series well;
the NARX neural network mainly comprises an input layer, a hidden layer, an output layer, and input and output delay layers; the input layer nodes receive the input signals, the delay layer nodes delay the input signals and the output feedback signals in time, the hidden layer nodes apply a nonlinear activation function to the delayed signals, and the output layer nodes linearly weight the hidden layer outputs to obtain the final network output; using too many neurons or delay orders can reduce the generalization performance of the network and increases its running time; therefore, even though the nonlinear fitting capacity of the model grows with the number of hidden neurons and hidden layers, in practice a scheme with relatively low time cost can be selected provided that prediction quality is ensured;
the NARX neural network introduces 2 time series at the same time, uses the historical value of the predicted time series y (t) and the historical value of another time series x (t) to predict the future value of the time series y (t), and the prediction in the form is called non-linear autoregressive with external input.
The above overdue risk prediction system based on the ant lion algorithm optimized NARX neural network, wherein:
in the step a4, a modified ant lion algorithm is adopted to optimize the weight and the offset value of the NARX neural network prediction model, and the method specifically comprises the following steps:
initializing a weight value and an offset value of a NARX neural network, determining a topology structure of the NARX neural network and the number of nodes of each layer according to a training set, wherein the dimension of an individual to be optimized is (n +1) x m, n is the characteristic number of an input sample, namely the number of input neurons, m is the number of hidden layer neurons, the weight value and the offset value of the NARX neural network are used as position vectors of ant lion populations to be coded, the position of each ant lion in the population dimension is determined, the initial size of the population is set to be P, the maximum size is Pmax, each ant lion individual represents a NARX neural network structure, and the expression is as follows:
Figure BDA0003197385820000181
where w_ij is the weight, in [-1, 1], between the ith hidden layer neuron and the jth input neuron;
b_i is the offset value, in [0, 1], of the ith hidden layer neuron;
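The encoding of a network into an ant lion position vector of dimension (n+1) x m can be sketched as follows. The value ranges for w_ij and b_i come from the text; the flat layout (n weights followed by one offset for each hidden neuron) is an assumption, since the expression image is not reproduced, and the function names are hypothetical.

```python
import random


def random_position(n_inputs, n_hidden, rng):
    """Create one ant lion position of length (n_inputs + 1) * n_hidden."""
    pos = []
    for _ in range(n_hidden):
        pos.extend(rng.uniform(-1.0, 1.0) for _ in range(n_inputs))  # w_ij in [-1, 1]
        pos.append(rng.uniform(0.0, 1.0))  # b_i in [0, 1]
    return pos


def decode_position(pos, n_inputs, n_hidden):
    """Split a flat position vector back into per-neuron weights and offsets."""
    stride = n_inputs + 1
    if len(pos) != stride * n_hidden:
        raise ValueError("position length does not match (n + 1) * m")
    weights = [pos[i * stride: i * stride + n_inputs] for i in range(n_hidden)]
    biases = [pos[i * stride + n_inputs] for i in range(n_hidden)]
    return weights, biases
```

Decoding is what allows a candidate position to be loaded into the network so that its prediction error can serve as the fitness value.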
the improved ant lion algorithm changes how the ant walking boundary is updated; the update is defined as:
Figure BDA0003197385820000191
where γ is the shrinkage adjustment coefficient; λ is the scale factor; t is the current iteration number; and T is the maximum iteration number.
Ant Lion Optimization (ALO) is a novel meta-heuristic swarm intelligence algorithm that simulates the predation behavior of ant lions in nature; the algorithm involves ants, ant lions and an elite ant lion; ants represent random solutions, ant lions represent local optimal solutions, and the elite ant lion represents the global optimal solution; ants walk randomly, and a roulette strategy decides which ant lion's trap each ant falls into, with fitter ant lions having a higher chance of capturing ants; after each random walk of the ants, the fitness values of the ant lions are updated, and the ant lion with the best fitness is selected as the elite ant lion; the ant lion algorithm has relatively good optimization efficiency and convergence accuracy, and mechanisms such as random selection of ant lions, random walks of ants and adaptively shrinking trap boundaries ensure good exploration of the search space and fast optimization;
population initialization: the populations of ants and ant lions are initialized randomly; the population size of ants and ant lions is set to N, the ant escape rate to Pesc, the maximum number of escaping ants to Nant_esc, the convergence offset to epsilon, the current iteration to t, and the maximum number of iterations to T;
ant lion hunting mainly comprises five steps: ants walk randomly, ant lions build traps, ants become trapped in the traps, ant lions catch the prey, and ant lions rebuild the traps;
ants walk randomly; the mathematical expression of the random walk is:
x(t) = [0, cumsum(2r(t_1) - 1), cumsum(2r(t_2) - 1), …, cumsum(2r(t_n) - 1)];
where cumsum is the cumulative sum of the ants' walk positions;
n is the set maximum number of iterations;
t is the number of steps of the walk, i.e. the current iteration number;
r(t) is a random function defined as follows:
Figure BDA0003197385820000201
where t is the number of random walk steps, i.e. the iteration number;
rand is a random number uniformly distributed in the interval [0, 1];
during optimization, each ant updates its position at every step according to the random walk; because the search space is bounded, the walk must be kept from going out of range, and the position of the ith-dimension variable of an ant at the tth iteration is calculated as:
Figure BDA0003197385820000202
where a_i and d_i are, respectively, the minimum and maximum of the random walk of the ith variable;
Figure BDA0003197385820000203
and
Figure BDA0003197385820000204
respectively representing the minimum value and the maximum value of the ith random walk of the variable during the tth iteration;
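The random walk and its rescaling into the current bounds can be sketched as below. The walk follows the cumsum(2r(t) - 1) expression given above. Two points are assumptions taken from the original ALO formulation because the formula images are not reproduced: r(t) = 1 when rand > 0.5 and 0 otherwise, and the rescaling is a min-max mapping of the walk onto [c, d].

```python
import random


def random_walk(n_steps, rng):
    """x(t) = [0, cumsum(2r(t_k) - 1)]: every step is +1 or -1."""
    walk = [0]
    for _ in range(n_steps):
        r = 1 if rng.random() > 0.5 else 0  # assumed threshold of 0.5
        walk.append(walk[-1] + (2 * r - 1))
    return walk


def rescale_walk(walk, c, d):
    """Map the walk onto [c, d] so the ant stays inside the search bounds.

    Uses (x - a) * (d - c) / (b - a) + c, where a and b are the minimum and
    maximum of the walk (assumed form of the position formula).
    """
    a, b = min(walk), max(walk)
    if a == b:
        return [c for _ in walk]
    return [(x - a) * (d - c) / (b - a) + c for x in walk]
```

With the bounds quoted earlier for ALO (lb = 0.01, ub = 100), `rescale_walk(random_walk(150, random.Random(0)), 0.01, 100.0)` yields one bounded trajectory for a single dimension.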
the ant lion constructs a trap, the ant lion algorithm simulates a hunting process of the ant lion, the root-mean-square error is used as an ant lion fitness value f, and the calculation formula is as follows:
Figure BDA0003197385820000205
where Y_k is the actual value of sample k;
O_k is the predicted value of sample k, k = 1, 2, …, n, and n is the number of training samples;
ants enter the trap, the boundary of the area where the ants walk is influenced by the positions of the ant lions, the random walk of the ants is influenced by the ant lions trap, and the ants move in a hypersphere around the selected ant lions, and the formula is as follows:
Figure BDA0003197385820000206
where c^t is the minimum of all variables in the tth iteration;
d^t is the maximum of all variables in the tth iteration;
Figure BDA0003197385820000207
is the position of the ith ant lion in the tth iteration;
ants slide toward the trap center: once an ant lion senses that an ant has entered the trap, it throws sand outward to keep the ant from escaping, so that the ant trying to escape slides toward the center of the trap. At this time, the radius of the hypersphere in which the ant walks randomly shrinks adaptively, which is expressed mathematically as:
Figure BDA0003197385820000211
Figure BDA0003197385820000212
where I is the boundary contraction factor; t is the current iteration; T is the maximum number of iterations; w is a constant defined based on t and T, taking an integer value in [2, 6].
The values of c and d shrink adaptively as the number of iterations increases, which effectively improves the convergence speed toward the optimal solution.
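The adaptive shrinking of the walk boundary can be sketched as follows. The sketch assumes the original ALO form I = 10^w * t / T with c and d each divided by I. The thresholds that select w, and the choice I = 1 for the earliest iterations, are taken from the original ALO formulation and are assumptions here, since the text only states that w is an integer in [2, 6].

```python
def shrink_ratio(t, T):
    """Boundary contraction factor I for iteration t of T (assumed schedule)."""
    if t > 0.95 * T:
        w = 6
    elif t > 0.9 * T:
        w = 5
    elif t > 0.75 * T:
        w = 4
    elif t > 0.5 * T:
        w = 3
    elif t > 0.1 * T:
        w = 2
    else:
        return 1.0  # no contraction during the first 10% of iterations (assumption)
    return 10 ** w * t / T


def shrink_bounds(c, d, t, T):
    """Return the contracted bounds (c / I, d / I)."""
    ratio = shrink_ratio(t, T)
    return c / ratio, d / ratio
```

The jumps in w make I piecewise and discontinuous in t, which is the behaviour that the improved algorithm's continuous contraction factor is intended to replace.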
The ant lion rebuilds the trap: the ant lion catches the ant when the ant reaches the trap center; if the ant then has a better position than the ant lion, the ant lion moves to the position of the captured ant and builds its trap there as the next-generation ant lion, thereby increasing its chance of catching new prey. The formula for updating the ant lion position is as follows:
Figure BDA0003197385820000213
where t is the current iteration; Ant_i^t is the position of the ith ant at the tth iteration;
elitism: the ant lion algorithm determines the position of each ant through roulette selection and random walks, as follows:
Figure BDA0003197385820000214
where Ant_i^t is the position of the ith ant at the tth iteration;
R_A^t is the random walk of the ant around the ant lion selected by roulette in the tth iteration;
R_E^t is the random walk of the ant around the elite ant lion in the tth iteration;
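Roulette selection and the elitism step can be sketched as follows. The averaging of the two walks, Ant_i^t = (R_A^t + R_E^t) / 2, is the form used in the original ALO algorithm and is assumed here because the formula image is not reproduced. Weighting each ant lion by the inverse of its RMSE fitness (lower error gives a larger slice of the wheel) is also an assumption.

```python
import random


def roulette_select(fitness, rng):
    """Pick an ant lion index with probability proportional to 1 / fitness.

    Fitness is an error measure (RMSE), so smaller values are better.
    """
    weights = [1.0 / (f + 1e-12) for f in fitness]
    pick = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if pick <= acc:
            return i
    return len(fitness) - 1  # guard against floating-point round-off


def ant_position(walk_around_selected, walk_around_elite):
    """Elitism: average the walk around the selected ant lion and the elite one."""
    return [(a + e) / 2.0 for a, e in zip(walk_around_selected, walk_around_elite)]
```

Averaging with the elite walk pulls every ant partly toward the best solution found so far while the roulette term keeps some diversity.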
the ant lion algorithm is improved as follows:
S1: continuous boundary contraction factor;
in the original ant lion algorithm, once an ant falls toward the trap center and walks around the trap, its boundary, i.e. its search range, is gradually reduced to exploit the local optimum of the trap; however, the boundary contraction factor I changes discontinuously, so the search boundary decays unevenly and slowly, and the discontinuous increase of I narrows the search range in jumps, which leads to slow convergence and makes it easy to fall into local extrema;
to solve these problems, strengthen the traversal capability of the ant lion algorithm, and improve its optimization performance and convergence efficiency, this patent proposes a boundary contraction factor that increases rapidly and continuously as the algorithm iterates, and defines the update of the ant walking boundary as:
Figure BDA0003197385820000221
where γ is the shrinkage adjustment coefficient; λ is the scale factor; based on optimization experiments on multiple benchmark functions, γ = 400 and λ = 20 are selected; t and T are the current and maximum iteration numbers, respectively;
S2: position update with dynamic weight coefficients;
to address the problem that, in the trap-rebuilding stage of the original ant lion algorithm, ant lion positions easily fall into local optima in later iterations, this patent proposes a hybrid mutation method that dynamically combines the normal distribution and the Cauchy distribution and applies a mutation perturbation to the ant lion position Antlion_j^t, expressed as:
Figure BDA0003197385820000222
where Antlion_j^(t+1) is the ant lion position of generation t+1; η is an adjustment coefficient; C(0, 1) is a mutation factor following the Cauchy distribution; N(0, 1) is a mutation factor following the normal distribution;
the weight coefficient w_1 of the ant lion selected by roulette is larger in the early iterations, so that ants can search for better regions of the search space; in the later iterations the elite ant lion is close to the optimal region, and the weight coefficient w_2 of the elite ant lion gradually increases, so that ants exploit the neighborhood of the optimal region, improving the algorithm's balance between global exploration and local exploitation;
S3: adding a counter-regulation factor to adjust the step length;
in the elitism stage of the original ant lion algorithm, ants search for nearby optimal solutions around the better solutions throughout the optimization process, but the search step is fixed; in the later iterations the local search space shrinks, so a large step oscillates easily, slows convergence and easily misses the optimal region; therefore a counter-regulation factor T/(100elg (T)) is added to the moving step of the ant lion, so that the step is large in the early stage of the algorithm and, as the counter-regulation factor gradually decreases with increasing iterations, becomes smaller and smaller in the later stage, which helps find the global optimum; the step expression with the counter-regulation factor is as follows:
Figure BDA0003197385820000231
where R_E^t is the random walk of the ant around the elite ant lion in the tth iteration; t and T are the current and maximum iteration numbers, respectively; rand() is a random number uniformly distributed in the interval [0, 1];
S4: establishing the ALO-NARX neural network overdue risk prediction model;
the global optimal solution of the improved ant lion algorithm is decoded and used as the initial weights and initial offsets for training the NARX neural network, establishing the ALO-NARX neural network overdue risk prediction model; whether the termination condition for NARX network training is met is then judged: if so, training ends, the optimal network structure is obtained, and the test set sample data is input for prediction; if not, the procedure returns to the ant random-walk step.
Example 2:
the method for predicting the risk of the ant lion algorithm optimized NARX neural network comprises the steps of A1-A6.
An electronic device, comprising a processor and a memory, wherein the processor is connected with the memory;
the memory is used for storing programs;
the processor calls a program stored in the memory to perform any of the methods described above.
A computer-readable storage medium, in which a computer program is stored which, when executed by a computer, performs the method of any of the above.
The electronic device may be, but is not limited to, a Personal Computer (PC), a tablet computer, a Mobile Internet Device (MID), and other devices.
It should be noted that the processor, memory, and other components that may be present in an electronic device are electrically connected to each other, directly or indirectly, to enable the transfer or interaction of data. For example, the processor, memory, and other components that may be present may be electrically coupled to each other via one or more communication buses or signal lines.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described system embodiments are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, a mobile phone, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiments according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. The ant lion algorithm optimized NARX neural network risk prediction system is characterized by comprising a user side and a server side, wherein the user side comprises an information acquisition module and a risk prediction initiating module, and the server side comprises an information processing module, a database and a risk prediction module;
the information acquisition module is used for acquiring customer data by a user and integrating the customer data into customer data;
the risk prediction initiating module is used for initiating a risk prediction application request by a user;
the information processing module is used for acquiring client data, storing the client data in a database, acquiring and auditing a risk prediction application request and generating auditing information transmitted to the client and the risk prediction module;
the database is used for storing customer data;
the risk prediction module is used for acquiring audit information, acquiring client data in the database according to the audit information, and performing risk prediction on the client data to obtain client overdue risk prediction data;
the risk prediction module carries out risk prediction on the client data to obtain client overdue risk prediction data and comprises the following steps:
step A1, obtaining loan data required by modeling from customer data as a customer loan data sample, performing labeling processing on each customer in the customer loan data sample to obtain corresponding characteristic data of the customer, marking the risk level of the customer according to the repayment record of the customer, extracting the characteristic data in the customer loan data sample according to characteristic classification and associating the characteristic data with the risk level of the customer to form a sample data set;
step A2, preprocessing the sample data set, dividing it into a training set and a test set at a ratio of …:3, and normalizing the training set and the test set;
step A3, establishing a NARX neural network prediction model, and training the NARX neural network prediction model by using the training set;
step A4, optimizing the NARX neural network prediction model by adopting an improved ant lion algorithm, using a specified network performance evaluation function;
step A5, testing the prediction performance of the NARX neural network prediction model on the test set to obtain prediction performance test data, comparing the obtained overdue probability values with the corresponding actual samples according to the fraud probability value of a fraud user, judging the stability of the overdue prediction model, and formulating an offset;
and step A6, acquiring historical behavior characteristic data of existing loan users, passing it through the NARX neural network prediction model, and outputting overdue risk prediction data of the client.
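The division of work between the modules of claim 1 can be illustrated with a minimal sketch. The class name `Server`, its method names, the audit rule and the placeholder scorer `toy_model` are all hypothetical and are not taken from the patent; the sketch only shows the order of the calls (store data, audit the request, predict from the audited data).

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Server side: information processing, database, and risk prediction."""
    database: dict = field(default_factory=dict)

    def store_customer_data(self, customer_id, data):
        # Information processing module: persist customer data in the database.
        self.database[customer_id] = data

    def audit_request(self, customer_id):
        # Hypothetical audit rule: approve only if data for the customer exists.
        approved = customer_id in self.database
        return {"customer_id": customer_id, "approved": approved}

    def predict_risk(self, audit_info, model):
        # Risk prediction module: fetch the data named in the audit information
        # and score it with a caller-supplied model (any callable).
        if not audit_info["approved"]:
            return None
        return model(self.database[audit_info["customer_id"]])


def toy_model(data):
    # Placeholder scorer, NOT the NARX model: share of late repayments.
    return sum(data["late"]) / len(data["late"])


server = Server()
# User side: the information acquisition module submits customer data,
# then the risk prediction initiating module sends a request.
server.store_customer_data("c1", {"late": [0, 1, 0, 0]})
print(server.predict_risk(server.audit_request("c1"), toy_model))
```

In a real deployment `toy_model` would be replaced by the trained prediction model of steps A3 to A6.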
2. The ant lion algorithm optimized NARX neural network risk prediction system of claim 1, wherein:
the client loan data sample in step A1 consists of 10000 loan records that have repayment records and whose first loan application time is within 6-12 months;
the client loan data sample comprises multi-dimensional data of a client, wherein the multi-dimensional data comprises client attribute data, client loan data and user platform operation behavior data;
labeling each customer in the customer loan data sample according to a repayment record;
the original data comprises service types and historical behavior characteristic data of clients;
in step A2, preprocessing the sample data set comprises feature screening, missing-value completion, and abnormal-value (outlier) processing.
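The preprocessing, splitting and normalization of step A2 might look like the following sketch. Everything specific here is an assumption: the patent does not say how values are completed, how abnormal values are detected or which features are screened out, so median imputation, 1.5·IQR clipping and dropping constant features are stand-ins. The default `train_fraction=0.7` is likewise hypothetical, since the split ratio is not fully legible in the text.

```python
import numpy as np


def preprocess(X, y, train_fraction=0.7, seed=0):
    """Impute, clip outliers, screen features, split, and min-max normalize."""
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y)

    # Missing completion (assumed rule): fill NaNs with the per-feature median.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]

    # Abnormal value processing (assumed rule): clip to the 1.5*IQR fences.
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    X = np.clip(X, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Feature screening (assumed rule): drop features with no variation.
    X = X[:, X.std(axis=0) > 0]

    # Split into training and test sets by a seeded random permutation.
    order = np.random.default_rng(seed).permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train, test = order[:n_train], order[n_train:]

    # Min-max normalization using training-set statistics only.
    lo, hi = X[train].min(axis=0), X[train].max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)
    return (X[train] - lo) / scale, y[train], (X[test] - lo) / scale, y[test]
```

Fitting the normalization on the training set and reusing it for the test set is a design choice of this sketch; it keeps test-set information out of training.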
3. The ant-lion algorithm optimized NARX neural network risk prediction system of claim 1 or 2, wherein:
establishing the NARX neural network prediction model in step A3 specifically includes the following steps:
step B1: network initialization, namely determining the number of input layer nodes, output layer nodes and hidden layer nodes of the NARX neural network according to the number of labels in the client loan data sample and the number of dimensions influencing overdue risk, determining the learning rate and the activation function of the neurons, and initializing the connection weights among the input layer, the hidden layer and the output layer, together with the hidden layer offsets and the output layer offsets;
step B2: calculating hidden layer node output H according to the following formula;
Figure FDA0003197385810000021
wherein H_j is the output of the jth hidden layer node;
f(·) is the activation function of the hidden layer nodes, and the tanh function is selected;
p = 1, 2, …, m is the delay of the external input variable;
q = 1, 2, …, n is the delay of the output feedback signal;
w is a connection weight;
W_jp is the connection weight between the jth hidden layer node and the external input variable with delay step p;
W_jq is the connection weight between the jth hidden layer node and the output feedback signal with delay step q;
x(t) is the value of the external input variable at time t;
x(t-p) is the network input delay parameter;
y(t-q) is the feedback delay parameter, i.e. the output fed back with delay q;
b_j is the offset of the jth hidden layer node;
step B3: calculating the final output y(t+1) of the NARX neural network according to the following formula;
Figure FDA0003197385810000031
where y(t) is the value of the target quantity at time t;
x(t) is the value of the external input variable at time t;
n is the number of input neurons, which equals the number of features of the input sample;
m is the number of neurons in the hidden layer;
w is a connection weight;
s is the number of hidden layer nodes;
in step A3, a trust-region method is used to train the weights and offsets of the NARX neural network prediction model on the training set, which specifically includes the following steps:
step C1: defining a region whose radius is the maximum displacement, and searching for the optimal point of the objective function within that region;
step C2: if the objective function value increases, adjusting the extent of the region and solving again;
if the objective function value decreases, continuing the iterative computation according to this rule.
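Steps B2 and B3 can be illustrated with a short forward-pass sketch. The two formulas are only available as images, so the form used here is an assumption inferred from the symbol definitions: the hidden output is taken as H_j = tanh(Σ_p W_jp·x(t-p) + Σ_q W_jq·y(t-q) + b_j), and the network output as a linear combination of the hidden outputs plus an output offset. The names `W_x`, `W_y`, `w_out` and `b_out` are hypothetical, and a single scalar external input is assumed for brevity.

```python
import numpy as np


def narx_step(x_delays, y_delays, W_x, W_y, b_hidden, w_out, b_out):
    """One NARX step: returns the hidden outputs H and the prediction y(t+1).

    x_delays holds x(t-1), ..., x(t-m); y_delays holds y(t-1), ..., y(t-n).
    W_x has shape (s, m) and W_y has shape (s, n) for s hidden nodes.
    """
    # Step B2 (assumed form): tanh of delayed inputs, delayed outputs, offset.
    H = np.tanh(W_x @ x_delays + W_y @ y_delays + b_hidden)
    # Step B3 (assumed form): linear output layer over the hidden outputs.
    return H, float(w_out @ H + b_out)


# Usage with one hidden node, two input delays and two feedback delays.
H, y_next = narx_step(
    x_delays=np.array([1.0, 2.0]),
    y_delays=np.array([3.0, 4.0]),
    W_x=np.array([[1.0, 0.0]]),
    W_y=np.array([[0.0, 1.0]]),
    b_hidden=np.array([-4.0]),
    w_out=np.array([2.0]),
    b_out=0.1,
)
print(H, y_next)
```

Feeding each prediction back into `y_delays` gives a closed-loop rollout; during training the measured outputs are usually fed in instead.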
4. The ant lion algorithm optimized NARX neural network risk prediction system of claim 3, wherein:
in step A4, an improved ant lion algorithm is adopted to optimize the weights and offset values of the NARX neural network prediction model, which specifically comprises the following steps:
initializing the weights and offset values of the NARX neural network, and determining the topology of the NARX neural network and the number of nodes in each layer according to the training set, wherein the dimension of an individual to be optimized is (n+1)×m, n is the number of features of an input sample, namely the number of input neurons, and m is the number of hidden layer neurons; the weights and offsets of the NARX neural network are encoded as position vectors of the ant lion population, the position of each ant lion in the population dimension is determined, the initial size of the population is set to P, and the maximum size is Pmax,
Figure FDA0003197385810000041
denotes each ant lion individual, each of which represents one NARX neural network structure, with the following expression:
Figure FDA0003197385810000042
wherein w_ij is the weight between the ith hidden layer neuron and the jth input neuron, with a value in [-1, 1];
b_i is the offset value of the ith hidden layer neuron, with a value in [0, 1];
the improved ant lion algorithm updates the boundary of the ants' random walk, and the walking boundary is defined by the following formula:
Figure FDA0003197385810000043
wherein γ is the shrinkage adjustment coefficient; λ is the scale factor; t is the current iteration number; and T is the maximum number of iterations.
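The encoding of a network as an ant lion position vector can be sketched as follows. The individual expression itself is only given as an image, so the row layout used here (n weights followed by one offset for each of the m hidden neurons, flattened to a vector of length (n+1)×m) is an assumption. The function names are hypothetical; the value ranges [-1, 1] and [0, 1] are the ones stated in the claim.

```python
import numpy as np


def init_population(P, n, m, seed=0):
    """Create P random individuals, each a vector of length (n + 1) * m."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-1.0, 1.0, size=(P, m, n))  # w_ij in [-1, 1]
    offsets = rng.uniform(0.0, 1.0, size=(P, m, 1))   # b_i in [0, 1]
    # Assumed layout: row i of an individual is [w_i1, ..., w_in, b_i].
    return np.concatenate([weights, offsets], axis=2).reshape(P, m * (n + 1))


def decode(position, n, m):
    """Split one position vector back into the weight matrix and offsets."""
    table = np.asarray(position).reshape(m, n + 1)
    return table[:, :n], table[:, n]


population = init_population(P=5, n=4, m=3)
W, b = decode(population[0], n=4, m=3)
print(population.shape, W.shape, b.shape)
```

A flat vector lets the optimizer treat every weight and offset as one coordinate of an ant lion's position, while `decode` restores the network parameters for fitness evaluation.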
5. An ant lion algorithm optimized NARX neural network risk prediction method, characterized by comprising the following steps:
step A1, obtaining loan data required by modeling from customer data as a customer loan data sample, performing labeling processing on each customer in the customer loan data sample to obtain corresponding characteristic data of the customer, marking the risk level of the customer according to the repayment record of the customer, extracting the characteristic data in the customer loan data sample according to characteristic classification and associating the characteristic data with the risk level of the customer to form a sample data set;
step A2, preprocessing the sample data set, dividing it into a training set and a test set at a ratio of …:3, and normalizing the training set and the test set;
step A3, establishing a NARX neural network prediction model, and training the NARX neural network prediction model by using the training set;
step A4, optimizing the NARX neural network prediction model by adopting an improved ant lion algorithm, using a specified network performance evaluation function;
step A5, testing the prediction performance of the NARX neural network prediction model on the test set to obtain prediction performance test data, comparing the obtained overdue probability values with the corresponding actual samples according to the fraud probability value of a fraud user, judging the stability of the overdue prediction model, and formulating an offset;
and step A6, acquiring historical behavior characteristic data of existing loan users, passing it through the NARX neural network prediction model, and outputting overdue risk prediction data of the client.
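The evaluation in steps A4 and A5 can be illustrated with two small helper functions. The text does not name the network performance evaluation function, so mean squared error is used here purely as a stand-in, and the 0.5 decision threshold for turning an overdue probability into a label is also an assumption.

```python
import numpy as np


def mse(predicted, actual):
    """Mean squared error between predicted probabilities and actual labels."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted - actual) ** 2))


def accuracy(probabilities, labels, threshold=0.5):
    """Share of samples whose thresholded prediction matches the actual label."""
    predicted_labels = np.asarray(probabilities) >= threshold
    return float(np.mean(predicted_labels == np.asarray(labels, dtype=bool)))


# Hypothetical test-set predictions (overdue probabilities) and actual outcomes.
probabilities = [0.9, 0.2, 0.6, 0.1]
labels = [1, 0, 0, 0]
print(mse(probabilities, labels), accuracy(probabilities, labels))
```

A lower `mse` would serve as the fitness an optimizer tries to minimize, while `accuracy` on the test set gives a simple comparison with the actual samples.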
6. The ant lion algorithm optimized NARX neural network risk prediction method of claim 5, wherein:
the client loan data sample in step A1 consists of 10000 loan records that have repayment records and whose first loan application time is within 6-12 months;
the client loan data sample comprises multi-dimensional data of a client, wherein the multi-dimensional data comprises client attribute data, client loan data and user platform operation behavior data;
labeling each customer in the customer loan data sample according to a repayment record;
the original data comprises service types and historical behavior characteristic data of clients;
in step A2, preprocessing the sample data set comprises feature screening, missing-value completion, and abnormal-value (outlier) processing.
7. The ant lion algorithm optimized NARX neural network risk prediction method of claim 5 or 6, wherein:
establishing the NARX neural network prediction model in step A3 specifically includes the following steps:
step B1: network initialization, namely determining the number of input layer nodes, output layer nodes and hidden layer nodes of the NARX neural network according to the number of labels in the client loan data sample and the number of dimensions influencing overdue risk, determining the learning rate and the activation function of the neurons, and initializing the connection weights among the input layer, the hidden layer and the output layer, together with the hidden layer offsets and the output layer offsets;
step B2: calculating hidden layer node output H according to the following formula;
Figure FDA0003197385810000061
wherein H_j is the output of the jth hidden layer node;
f(·) is the activation function of the hidden layer nodes, and the tanh function is selected;
p = 1, 2, …, m is the delay of the external input variable;
q = 1, 2, …, n is the delay of the output feedback signal;
w is a connection weight;
W_jp is the connection weight between the jth hidden layer node and the external input variable with delay step p;
W_jq is the connection weight between the jth hidden layer node and the output feedback signal with delay step q;
x(t) is the value of the external input variable at time t;
x(t-p) is the network input delay parameter;
y(t-q) is the feedback delay parameter, i.e. the output fed back with delay q;
b_j is the offset of the jth hidden layer node;
step B3: calculating the final output y(t+1) of the NARX neural network according to the following formula;
Figure FDA0003197385810000062
where y(t) is the value of the target quantity at time t;
x(t) is the value of the external input variable at time t;
n is the number of input neurons, which equals the number of features of the input sample;
m is the number of neurons in the hidden layer;
w is a connection weight;
s is the number of hidden layer nodes;
in step A3, a trust-region method is used to train the weights and offsets of the NARX neural network prediction model on the training set, which specifically includes the following steps:
step C1: defining a region whose radius is the maximum displacement, and searching for the optimal point of the objective function within that region;
step C2: if the objective function value increases, adjusting the extent of the region and solving again;
if the objective function value decreases, continuing the iterative computation according to this rule.
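Steps C1 and C2 can be illustrated with a deliberately simplified loop. This is a first-order variant written for illustration, not a full trust-region method with a quadratic model, and it is not claimed to be the training routine actually used: the candidate point is a steepest-descent step truncated to the region radius, and the halving and doubling factors as well as the function name are assumptions.

```python
import numpy as np


def trust_region_minimize(f, grad, x0, radius=1.0, max_radius=10.0,
                          iters=200, tol=1e-8):
    """Minimize f with region-limited steps; returns (x, f(x))."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        if g_norm < tol or radius < tol:
            break
        # Step C1: look for a better point inside the region of given radius
        # (here: a steepest-descent step no longer than the radius).
        candidate = x - g * min(1.0, radius / g_norm)
        f_candidate = f(candidate)
        if f_candidate < fx:
            # Objective decreased: accept the point, keep iterating,
            # and cautiously enlarge the region.
            x, fx = candidate, f_candidate
            radius = min(2.0 * radius, max_radius)
        else:
            # Step C2: objective did not decrease, so shrink the region
            # and solve again from the same point.
            radius *= 0.5
    return x, fx


# Usage on a toy quadratic whose minimum is at (3, -2).
target = np.array([3.0, -2.0])
x_best, f_best = trust_region_minimize(
    lambda x: float(np.sum((x - target) ** 2)),
    lambda x: 2.0 * (x - target),
    x0=[0.0, 0.0],
)
print(x_best, f_best)
```

For network training, `f` would be the prediction error on the training set as a function of the flattened weights and offsets.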
8. The ant lion algorithm optimized NARX neural network risk prediction method of claim 7, wherein:
in step A4, an improved ant lion algorithm is adopted to optimize the weights and offset values of the NARX neural network prediction model, which specifically comprises the following steps:
initializing the weights and offset values of the NARX neural network, and determining the topology of the NARX neural network and the number of nodes in each layer according to the training set, wherein the dimension of an individual to be optimized is (n+1)×m, n is the number of features of an input sample, namely the number of input neurons, and m is the number of hidden layer neurons; the weights and offsets of the NARX neural network are encoded as position vectors of the ant lion population, the position of each ant lion in the population dimension is determined, the initial size of the population is set to P, and the maximum size is Pmax; each ant lion individual represents one NARX neural network structure, with the following expression:
Figure FDA0003197385810000071
wherein w_ij is the weight between the ith hidden layer neuron and the jth input neuron, with a value in [-1, 1];
b_i is the offset value of the ith hidden layer neuron, with a value in [0, 1];
the improved ant lion algorithm updates the boundary of the ants' random walk, and the walking boundary is defined by the following formula:
Figure FDA0003197385810000072
wherein γ is the shrinkage adjustment coefficient; λ is the scale factor; t is the current iteration number; and T is the maximum number of iterations.
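For orientation, the ant random walk that such a boundary constrains can be sketched as in the standard ant lion optimizer. The improved boundary formula with γ and λ is only available as an image, so it is not reproduced here: the bounds `lower` and `upper` are simply passed in by the caller, and the function names are hypothetical.

```python
import numpy as np


def random_walk(steps, rng):
    """Cumulative sum of +1/-1 steps, starting from 0."""
    moves = np.where(rng.random(steps) > 0.5, 1.0, -1.0)
    return np.concatenate([[0.0], np.cumsum(moves)])


def normalize_walk(walk, lower, upper):
    """Min-max map a walk into the current boundary [lower, upper]."""
    a, b = walk.min(), walk.max()
    if a == b:
        # Degenerate walk: place every point at the middle of the boundary.
        return np.full_like(walk, (lower + upper) / 2.0)
    return (walk - a) * (upper - lower) / (b - a) + lower


rng = np.random.default_rng(0)
walk = random_walk(50, rng)
# The boundary [-1, 1] matches the weight range stated in the claim.
bounded = normalize_walk(walk, lower=-1.0, upper=1.0)
print(bounded.min(), bounded.max())
```

In the standard algorithm the boundary passed to `normalize_walk` shrinks as the iteration count t approaches T, so ants are drawn ever closer to the selected ant lion; the improved formula would replace that shrinking rule.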
9. An electronic device, comprising: a memory and a processor, the processor and the memory being connected;
the memory is used for storing programs;
the processor calls a program stored in the memory to perform the method of any of claims 5-8.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a computer, performs the method of any one of claims 5-8.
CN202110894613.4A 2021-08-05 2021-08-05 Ant-lion algorithm optimized NARX neural network risk prediction system and method Withdrawn CN113657990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110894613.4A CN113657990A (en) 2021-08-05 2021-08-05 Ant-lion algorithm optimized NARX neural network risk prediction system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110894613.4A CN113657990A (en) 2021-08-05 2021-08-05 Ant-lion algorithm optimized NARX neural network risk prediction system and method

Publications (1)

Publication Number Publication Date
CN113657990A true CN113657990A (en) 2021-11-16

Family

ID=78478403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110894613.4A Withdrawn CN113657990A (en) 2021-08-05 2021-08-05 Ant-lion algorithm optimized NARX neural network risk prediction system and method

Country Status (1)

Country Link
CN (1) CN113657990A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037311A (en) * 2021-11-17 2022-02-11 北京中百信信息技术股份有限公司 Information system engineering supervision project risk assessment method
CN115514581A (en) * 2022-11-16 2022-12-23 国家工业信息安全发展研究中心 Data analysis method and equipment for industrial internet data security platform
CN115514581B (en) * 2022-11-16 2023-04-07 国家工业信息安全发展研究中心 Data analysis method and equipment for industrial internet data security platform
CN117972530A (en) * 2024-03-28 2024-05-03 北京大数据先进技术研究院 Ant lion optimization-based missing unbalanced data multi-classification method and equipment
CN117972530B (en) * 2024-03-28 2024-06-11 北京大数据先进技术研究院 Ant lion optimization-based missing unbalanced data multi-classification method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20211116)