CN115049121A

CN115049121A - Bank customer loss risk prediction model generation method, device, equipment and medium

Info

Publication number: CN115049121A
Application number: CN202210638850.9A
Authority: CN
Inventors: 刘锴靖
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2022-09-13

Abstract

The application relates to the technical field of artificial intelligence and provides a method, device, equipment and medium for generating a bank customer churn risk prediction model. The method comprises the following steps: acquiring data retained by bank customers; preprocessing the data retained by the bank customers to obtain feature data related to customer churn; and training an XGBoost model to be trained based on the feature data, the labels corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model. Because XGBoost is a machine learning model and the training set used by the method consists of feature data related to customer churn, the trained model can effectively predict the risk of bank customer churn.

Description

Bank customer attrition risk prediction model generation method, device, equipment and medium

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a bank customer loss risk prediction model generation method, device, equipment and medium.

Background

For banks, customer churn is inevitable for a variety of reasons. However, if a customer who is tending toward churn can be identified in time, the bank can take measures to retain that customer and thereby reduce the churn rate. Effectively predicting customer churn risk is therefore important. At present, however, the banking field has no tool capable of effectively predicting customer churn risk.

Disclosure of Invention

In view of the above technical problems, an object of the present application is to provide a method, an apparatus, a device and a medium for generating a bank customer loss risk prediction model, so as to solve the technical problem that no tool is available at present to effectively predict the bank customer loss risk.

In order to solve the foregoing technical problem, in a first aspect, an embodiment of the present application provides a method for generating a bank customer churn risk prediction model, including:

acquiring data retained by a bank client;

preprocessing the data retained by the bank customer to obtain characteristic data related to customer churn;

training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model; wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has been lost.

Further, the customer retention data includes a plurality of attribute data, the attributes including a customer ID, a customer name, a customer credit score, a gender, an age, a region, a deposit/loan status, whether there is a credit card, a financial product purchase/use amount, whether it is an active user, an estimated income, a length of time to use a financial product, and whether it has been lost.

Further, the preprocessing the data retained by the bank customer to obtain characteristic data related to customer churn includes:

and sequentially carrying out data cleaning processing, data conversion processing and feature screening processing on the data retained by the bank customer to obtain the feature data related to customer loss.

Further, the sequentially performing data cleaning, data conversion and feature screening on the data retained by the bank customer to obtain the feature data related to customer loss includes:

deleting data irrelevant to customer attrition in the data retained by the bank customer to obtain first characteristic data;

processing the abnormal value in the first characteristic data to obtain second characteristic data;

converting the second characteristic data into an input format conforming to XGBoost to obtain third characteristic data;

discretizing the third characteristic data to obtain fourth characteristic data;

and screening out the feature data with weak correlation from the fourth feature data to obtain the feature data related to the customer churn.

Further, the characteristics associated with customer churn include: customer credit score, gender, age, region, deposit/loan status, whether there is a credit card, amount of financial product purchase/use, whether it is an active user, estimated income, and length of time to use a financial product.

Further, the training of the XGBoost model to be trained based on the feature data, the label corresponding to the feature data, and the preset model parameters to obtain the bank customer churn risk prediction model includes:

training the XGBoost model to be trained based on the feature data, the labels corresponding to the feature data and preset model parameters, and adjusting the model parameters according to a preset parameter adjusting strategy in the training process to obtain the bank customer loss risk prediction model.

Further, the parameter adjustment strategy is any one of the following parameter adjustment strategies:

a first parameter adjustment strategy: adjusting only the number of trees;

a second parameter adjustment strategy: adjusting only the depth of the trees;

a third parameter adjustment strategy: adjusting the number and the depth of the trees simultaneously;

a fourth parameter adjustment strategy: adjusting only the learning rate;

a fifth parameter adjustment strategy: adjusting the number of trees and the learning rate simultaneously;

a sixth parameter adjustment strategy: adjusting only the row sampling ratio;

a seventh parameter adjustment strategy: adjusting only the column sampling ratio.
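The seven strategies differ only in which model parameters are allowed to vary. A minimal Python sketch of that idea follows. It assumes the parameter names used by XGBoost's scikit-learn wrapper (n_estimators, max_depth, learning_rate, subsample, colsample_bytree); the candidate values and the helper candidate_settings are hypothetical and are not taken from the application.

```python
from itertools import product

# Assumed candidate values; the application does not give concrete grids.
CANDIDATES = {
    "n_estimators": [100, 200, 300],       # number of trees
    "max_depth": [3, 5, 7],                # depth of the trees
    "learning_rate": [0.05, 0.1, 0.3],     # learning rate
    "subsample": [0.6, 0.8, 1.0],          # row sampling ratio
    "colsample_bytree": [0.6, 0.8, 1.0],   # column sampling ratio
}

# Parameters that each of the seven strategies is allowed to vary.
STRATEGIES = {
    1: ["n_estimators"],
    2: ["max_depth"],
    3: ["n_estimators", "max_depth"],
    4: ["learning_rate"],
    5: ["n_estimators", "learning_rate"],
    6: ["subsample"],
    7: ["colsample_bytree"],
}


def candidate_settings(strategy, base_params):
    """Yield one full parameter dict per combination the strategy explores."""
    names = STRATEGIES[strategy]
    for values in product(*(CANDIDATES[name] for name in names)):
        # Parameters not named by the strategy keep their preset values.
        yield {**base_params, **dict(zip(names, values))}


# Hypothetical preset model parameters.
base = {"n_estimators": 100, "max_depth": 3, "learning_rate": 0.1,
        "subsample": 1.0, "colsample_bytree": 1.0}

grid = list(candidate_settings(3, base))
print(len(grid))  # strategy 3 varies tree count and depth together
```

Each yielded dictionary could then be passed to a trainer and scored on held-out data; how the candidates are scored and selected is left open here, as it is in the text.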

In a second aspect, an embodiment of the present application provides a bank customer churn risk prediction model generation apparatus, including:

the acquisition module is used for acquiring data retained by bank customers;

the preprocessing module is used for preprocessing the data retained by the bank customers to obtain feature data related to customer churn;

the training module is used for training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model; wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has been lost.

In a third aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the above methods when executing the computer program.

In a fourth aspect, the present application provides a computer readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method described in any one of the above.

The bank customer churn risk prediction model generation method provided by the embodiment of the application comprises the following steps: acquiring data retained by bank customers, and preprocessing that data to obtain feature data related to customer churn; and training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model. Because the XGBoost model adopted by the embodiment of the application is a machine learning model and the training set consists of feature data related to customer churn, the trained model can effectively predict the churn risk of bank customers. In addition, XGBoost is fast and effective when processing large-scale data and places low demands on hardware resources such as memory, so the model generated by the embodiment of the application can predict bank customer churn risk quickly and accurately.

Drawings

In order to illustrate the technical solution of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a method for generating a bank customer churn risk prediction model according to a first embodiment of the present application;

Fig. 2 is a flowchart of a method for obtaining feature data related to customer churn according to the first embodiment of the present application;

Fig. 3 is a block diagram of a bank customer churn risk prediction model generation apparatus according to a second embodiment of the present application;

Fig. 4 is a block diagram of a preprocessing module according to the second embodiment of the present application;

Fig. 5 is a schematic block diagram of a computer device according to a third embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The first embodiment is as follows:

A first embodiment of the present application provides a bank customer churn risk prediction model generation method, which can be executed by a computer device. The computer device can be a desktop computer, a notebook computer, a handheld computer, a cloud server, or another computing device.

Referring to fig. 1, a method for generating a bank customer churn risk prediction model according to an embodiment of the present application includes steps S1-S3:

S1, acquiring data retained by bank customers;

S2, preprocessing the data retained by the bank customers to obtain feature data related to customer churn;

S3, training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model; wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has been lost.

As mentioned in step S1, it should be noted that the bank customer retention data is generally obtained from a banking system, and the bank customer retention data includes various attribute data, generally, as shown in table 1, including the following attributes: customer ID, customer name, customer credit score, gender, age, region, deposit/loan status, whether there is a credit card, number of financial product purchases/uses, whether it is an active user, estimated income, length of time to use a financial product, whether it has been lost.

Table 1 bank customer retained data

As mentioned in step S2, it should be noted that, since the present application aims to train a model capable of predicting the risk of customer churn, and not all attributes in the data retained by the bank customer belong to the factor of customer churn, in order to ensure the accuracy of the model, it is necessary to extract attributes related to customer churn from the data retained by the bank customer as features to train the model.

Regarding step S3, it should be noted that XGBoost (eXtreme Gradient Boosting) is mainly used to solve supervised learning problems, in which training data x_i containing multiple features is used to predict a target variable y_i. The model parameters of XGBoost include: the number of trees, the depth of the trees, the learning rate, the row sampling ratio, and the column sampling ratio. For multi-class classification, XGBoost provides two loss (objective) functions: multi:softmax and multi:softprob. With multi:softmax, the output is the class label obtained after applying softmax, whereas with multi:softprob the output is a matrix of class probabilities. In the embodiment of the present application, either of these loss functions can achieve the object of the present application; multi:softprob is preferred, and with it a customer churn risk level can be output.
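The difference between the two outputs can be illustrated without the library itself. The sketch below is plain Python and only mimics the shape of the two outputs; the raw scores and the three risk levels (0 = low, 1 = medium, 2 = high) are assumptions made for illustration and do not come from the application.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for two customers over three assumed risk levels.
raw_scores = [
    [2.0, 0.5, -1.0],
    [-0.5, 0.2, 1.5],
]

# Shape of a multi:softprob-style output: one probability row per customer.
prob_matrix = [softmax(row) for row in raw_scores]

# Shape of a multi:softmax-style output: only the most likely class per customer.
predicted_class = [row.index(max(row)) for row in prob_matrix]

for probs, label in zip(prob_matrix, predicted_class):
    print([round(p, 3) for p in probs], "->", label)
```

Keeping the full probability row is what makes a graded risk level possible: a customer whose "high" probability is 0.55 can be treated differently from one whose probability is 0.95, even though both would receive the same class label.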

It should also be noted that XGBoost has the following advantages: (1) simple and easy to use: compared with other machine learning libraries, users can get started with XGBoost easily and obtain good results; (2) efficient and scalable: it is fast and performs well on large-scale data sets, and places low demands on hardware resources such as memory; (3) robust: compared with deep learning models, it can achieve comparable results without fine-grained parameter tuning.

Because the XGBoost model adopted by the embodiment of the application is a machine learning model and the training set consists of feature data related to customer churn, the trained model can effectively predict the churn risk of bank customers. In addition, XGBoost is fast and effective when processing large-scale data and places low demands on hardware resources such as memory, so the model generated by the embodiment of the application can predict bank customer churn risk quickly and accurately.

In one embodiment, the preprocessing the data retained by the bank customer to obtain characteristic data related to customer churn includes:

and sequentially carrying out data cleaning processing, data conversion processing and characteristic screening processing on the data retained by the bank customer to obtain the characteristic data related to customer attrition.

In the embodiment of the present application, it should be understood that data cleaning refers to processing the data source before a data warehouse is built and data mining is carried out, so that the data meets the accuracy, completeness, consistency, timeliness and validity requirements of subsequent operations. From the perspective of improving data quality, data cleaning is the process of processing data so that it is of better quality, that is, the process of obtaining clean data.

In the embodiment of the present application, it should be noted that the format of the data after cleaning may differ from the input format required by the XGBoost model, so the cleaned data needs to be converted into the format the XGBoost model requires.

In the embodiment of the present application, it should be further noted that, when training a model for a specific function, if the feature engineering is done well, the choice of algorithm makes little difference; conversely, if it is done poorly, no choice of algorithm will bring a breakthrough improvement. Feature selection is therefore particularly important.

Referring to fig. 2, in an embodiment, the sequentially performing data cleansing processing, data conversion processing, and feature screening processing on the data retained by the bank customer to obtain the feature data related to customer churn includes:

S21, deleting data irrelevant to customer churn from the data retained by the bank customers to obtain first feature data;

S22, processing the abnormal values in the first feature data to obtain second feature data;

S23, converting the second feature data into an input format conforming to XGBoost to obtain third feature data;

S24, discretizing the third feature data to obtain fourth feature data;

S25, screening out the feature data with weak correlation from the fourth feature data to obtain the feature data related to customer churn.

Regarding step S21, it should be noted that some attribute data in the data retained by bank customers is unrelated to customer churn. Take as an example retained data that includes the following attributes: customer ID, customer name, customer credit score, gender, age, region, deposit/loan status, whether there is a credit card, number of financial product purchases/uses, whether the customer is an active user, estimated income, length of time the financial product has been used, and whether the customer has been lost. Obviously, the customer ID and the customer name are not factors in customer churn, that is, they are unrelated to it, so this data must first be deleted from the data retained by the bank customers.
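Step S21 amounts to dropping identifier columns. A minimal Python sketch follows; the column names (CustomerId, Name, Exited and so on) and the sample rows are hypothetical stand-ins for the attributes listed above, since Table 1 is not reproduced in the text.

```python
# Hypothetical column names for the attributes that are unrelated to churn.
IRRELEVANT = {"CustomerId", "Name"}


def drop_irrelevant(records, irrelevant=IRRELEVANT):
    """Return copies of the records without the columns unrelated to churn."""
    return [
        {key: value for key, value in row.items() if key not in irrelevant}
        for row in records
    ]


# Illustrative rows only; the values are made up.
raw = [
    {"CustomerId": 1001, "Name": "A", "CreditScore": 620,
     "Gender": "Female", "Age": 42, "Area": "North", "Exited": 1},
    {"CustomerId": 1002, "Name": "B", "CreditScore": 710,
     "Gender": "Male", "Age": 35, "Area": "South", "Exited": 0},
]

first_feature_data = drop_irrelevant(raw)
print(sorted(first_feature_data[0]))
```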

Regarding step S22, it should be noted that the data retained by bank customers may contain abnormal values. For example, in the example shown in Table 1, records with a null gender, a null credit score, or a non-existent list value are all regarded as abnormal data. In addition, data whose age distribution is excessively wide, and data with excessive product purchase quantities caused by promotional activities, are not meaningful for bank customer churn risk prediction from a market perspective, and can therefore also be regarded as abnormal values. To obtain high-quality data, these abnormal values must be processed. There are various ways to process them: they may be deleted, supplemented, or handled by other methods, and the present application is not limited in this respect.
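The text leaves the choice between deleting and supplementing open. The sketch below shows one possible combination in Python, purely as an assumption: rows with no gender or an implausible age are deleted, and a missing credit score is filled with the median of the known scores. The age range, the fill rule and the column names are all hypothetical.

```python
from statistics import median


def handle_abnormal(records, age_range=(18, 100)):
    """Drop rows with no gender or an implausible age; fill missing credit scores."""
    known_scores = [r["CreditScore"] for r in records if r["CreditScore"] is not None]
    fill_value = median(known_scores)  # assumed fill rule, not from the application
    cleaned = []
    for row in records:
        if row["Gender"] is None:
            continue  # delete: a missing gender is not guessed
        if not age_range[0] <= row["Age"] <= age_range[1]:
            continue  # delete: age outside the assumed plausible range
        fixed = dict(row)
        if fixed["CreditScore"] is None:
            fixed["CreditScore"] = fill_value  # supplement the missing value
        cleaned.append(fixed)
    return cleaned


# Illustrative rows only.
first_feature_data = [
    {"CreditScore": 620, "Gender": "Female", "Age": 42},
    {"CreditScore": None, "Gender": "Male", "Age": 35},
    {"CreditScore": 700, "Gender": None, "Age": 51},
    {"CreditScore": 580, "Gender": "Male", "Age": 190},
]

second_feature_data = handle_abnormal(first_feature_data)
print(second_feature_data)
```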

Regarding step S23, it should be noted that the format of the data after abnormal value processing may differ from the input format required by the XGBoost model, so that data needs to be converted into the format the XGBoost model requires. Taking the example shown in Table 1, Gender, Area, and Loan (deposit/loan status) are string-type variables and cannot be analyzed directly, so they are converted with a value conversion toolkit. After conversion: Gender [1, 2], Area [100, 101, 102, …], Loan [201, 202].
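The conversion toolkit is not named in the text. As a stand-in, the following Python sketch assigns consecutive integer codes in first-seen order, starting from 1, 100 and 201 to match the ranges quoted above; the helpers build_codes and encode, and the sample category names, are hypothetical.

```python
def build_codes(values, start):
    """Assign consecutive integer codes, beginning at `start`, in first-seen order."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = start + len(codes)
    return codes


def encode(records, column_starts):
    """Replace each string value in the listed columns by its integer code."""
    mappings = {
        column: build_codes((row[column] for row in records), start)
        for column, start in column_starts.items()
    }
    encoded = [
        {key: (mappings[key][value] if key in mappings else value)
         for key, value in row.items()}
        for row in records
    ]
    return encoded, mappings


# Illustrative rows only.
second_feature_data = [
    {"Gender": "Female", "Area": "North", "Loan": "deposit", "Age": 42},
    {"Gender": "Male", "Area": "South", "Loan": "loan", "Age": 35},
    {"Gender": "Female", "Area": "East", "Loan": "deposit", "Age": 51},
]

third_feature_data, mappings = encode(
    second_feature_data, {"Gender": 1, "Area": 100, "Loan": 201}
)
print(third_feature_data[1])
```

Returning the mappings alongside the encoded rows matters in practice: the same codes have to be applied to new customers at prediction time.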

Regarding step S24, it should be noted that data with excessively continuous features carries a risk of overfitting, so the continuous features need to be discretized in order to speed up iteration and to be robust against abnormal data. Taking the example shown in Table 1, the two variables CreditScore and Age contain abnormal values and are discretized here: the credit score is divided into 5 groups (600 or less, 600-650, 650-700, 700-750, and above 750), and the age is divided into 8 groups (20 or less, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, and 80 or more), from which the data distribution is obtained.
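The grouping above can be sketched in Python with the standard bisect module. The text does not say which group a boundary value such as 650 belongs to; this sketch assumes that each upper bound is inclusive, so 650 falls in the 600-650 group. The helper name to_bin is hypothetical.

```python
from bisect import bisect_left

CREDIT_EDGES = [600, 650, 700, 750]        # 4 edges give 5 groups
AGE_EDGES = [20, 30, 40, 50, 60, 70, 80]   # 7 edges give 8 groups


def to_bin(value, edges):
    """Return the 0-based group index; upper bounds are treated as inclusive."""
    return bisect_left(edges, value)


# Illustrative values only.
for score in (580, 600, 620, 750, 800):
    print("CreditScore", score, "-> group", to_bin(score, CREDIT_EDGES))

for age in (18, 45, 85):
    print("Age", age, "-> group", to_bin(age, AGE_EDGES))
```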

Regarding step S25, it should be noted that when two features are strongly correlated, using both at the same time causes information redundancy. Strongly correlated variables should therefore be eliminated, and the features with weaker mutual correlation retained as the features related to customer churn. A heat map shows which features are strongly correlated and which are weakly correlated.
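A heat map is a visual rendering of the pairwise correlation matrix, so the same screening can be sketched numerically. The Python example below computes Pearson correlations and keeps a feature only if it is not strongly correlated with a feature already kept. The 0.9 threshold, the greedy keep-first rule and the sample columns are assumptions; the text specifies none of them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)  # constant columns are not handled


def screen_features(columns, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Illustrative columns; AgeInMonths is an exact multiple of Age and so is redundant.
columns = {
    "Age": [25, 32, 47, 51, 62],
    "Tenure": [1, 7, 3, 9, 4],
    "AgeInMonths": [300, 384, 564, 612, 744],
}

selected = screen_features(columns)
print(selected)
```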

In one embodiment, the characteristics associated with customer churn include: customer credit score, gender, age, region, deposit/loan status, whether there is a credit card, amount of financial product purchase/use, whether it is an active user, estimated income, and length of time to use a financial product.

In the embodiment of the present application, it should be noted that the features related to customer churn are obtained by applying data cleaning, data conversion and feature screening to the data of Table 1. As stated above, when training a model for a specific function, if the feature engineering is done well, the choice of algorithm makes little difference; conversely, if it is done poorly, no choice of algorithm will bring a breakthrough improvement. Feature selection is therefore particularly important. It has been verified that the features selected in the embodiment of the application can effectively predict the churn risk of bank customers.

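In one embodiment, the training of the XGBoost model to be trained based on the feature data, the label corresponding to the feature data, and the preset model parameters to obtain the bank customer churn risk prediction model includes:

training the XGBoost model to be trained based on the feature data, the label corresponding to the feature data and the preset model parameters, and adjusting the model parameters according to a preset parameter adjustment strategy during training to obtain the bank customer churn risk prediction model.

In one embodiment, the parameter adjustment strategy is any one of the following strategies:

a first parameter adjustment strategy: adjusting only the number of trees;

a second parameter adjustment strategy: adjusting only the depth of the trees;

a third parameter adjustment strategy: adjusting the number and the depth of the trees simultaneously;

a fourth parameter adjustment strategy: adjusting only the learning rate;

a fifth parameter adjustment strategy: adjusting the number of trees and the learning rate simultaneously;

a sixth parameter adjustment strategy: adjusting only the row sampling ratio;

a seventh parameter adjustment strategy: adjusting only the column sampling ratio.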

In the embodiment of the present application, it should be noted that the parameter adjustment aims to obtain a better prediction effect, and the parameter adjustment may be performed automatically or manually.

The following provides an example of parameter adjustment, which can be performed according to specific conditions

The second embodiment is as follows:

Referring to fig. 3, an embodiment of the present application provides a bank customer churn risk prediction model generation apparatus, including:

the acquisition module 1 is used for acquiring data retained by bank customers;

the preprocessing module 2 is used for preprocessing the data retained by the bank customers to obtain feature data related to customer churn;

the training module 3 is used for training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model; wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has been lost.
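The three modules form a simple chain in which each consumes the output of the previous one. A minimal Python sketch of that composition follows. The class ChurnModelGenerator, the column name Exited and all three stub functions are hypothetical; in particular, the trainer here is a placeholder that predicts the majority label and is not XGBoost.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class ChurnModelGenerator:
    """Wires the three modules together; each module is an injected callable."""
    acquire: Callable[[], List[Record]]
    preprocess: Callable[[List[Record]], Tuple[List[Record], List[int]]]
    train: Callable[[List[Record], List[int]], Any]

    def generate(self):
        raw = self.acquire()                     # acquisition module
        features, labels = self.preprocess(raw)  # preprocessing module
        return self.train(features, labels)      # training module


def acquire():
    # Stand-in for reading retained customer data from a banking system.
    return [{"Age": 42, "Exited": 1}, {"Age": 35, "Exited": 0}, {"Age": 58, "Exited": 1}]


def preprocess(records):
    # Split the label column from the feature columns.
    features = [{k: v for k, v in r.items() if k != "Exited"} for r in records]
    labels = [r["Exited"] for r in records]
    return features, labels


def train(features, labels):
    # Placeholder trainer: always predicts the majority label.
    majority = max(set(labels), key=labels.count)
    return lambda row: majority


model = ChurnModelGenerator(acquire, preprocess, train).generate()
print(model({"Age": 30}))
```

Injecting the modules as callables keeps each one replaceable, so a real trainer could be swapped in for the placeholder without touching the other two.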

Regarding the acquisition module 1, it should be noted that the data retained by bank customers is generally obtained from a banking system and includes a plurality of attribute data. Generally, as shown in Table 1, it includes the following attributes: customer ID, customer name, customer credit score, gender, age, region, deposit/loan status, whether there is a credit card, number of financial product purchases/uses, whether the customer is an active user, estimated income, length of time the financial product has been used, and whether the customer has been lost.

As the preprocessing module 2 mentioned above, it should be noted that, since the purpose of the present application is to train a model capable of predicting the risk of customer churn, and not all attributes in the data retained by the bank customer belong to the factor of customer churn, in order to ensure the accuracy of the model, it is necessary to extract attributes related to customer churn from the data retained by the bank customer as features to train the model.

Regarding the training module 3, it should be noted that XGBoost (eXtreme Gradient Boosting) is mainly used to solve supervised learning problems, in which training data x_i containing multiple features is used to predict a target variable y_i. The model parameters of XGBoost include: the number of trees, the depth of the trees, the learning rate, the row sampling ratio, and the column sampling ratio. For multi-class classification, XGBoost provides two loss (objective) functions: multi:softmax and multi:softprob. With multi:softmax, the output is the class label obtained after applying softmax, whereas with multi:softprob the output is a matrix of class probabilities. In the embodiment of the present application, either of these loss functions can achieve the object of the present application; multi:softprob is preferred, and with it a customer churn risk level can be output.

It should also be noted that XGBoost has the following advantages: (1) simple and easy to use: compared with other machine learning libraries, users can get started with XGBoost easily and obtain good results; (2) efficient and scalable: it is fast and performs well on large-scale data sets, and places low demands on hardware resources such as memory; (3) robust: compared with deep learning models, it can achieve comparable results without fine-grained parameter tuning.

In an embodiment, the preprocessing module is specifically configured to perform data cleaning, data conversion, and feature screening on data retained by the bank customer in sequence to obtain the feature data related to customer churn.

In the embodiment of the present application, it should be understood that data cleaning refers to processing the data source before a data warehouse is built and data mining is carried out, so that the data meets the accuracy, completeness, consistency, timeliness and validity requirements of subsequent operations. From the perspective of improving data quality, data cleaning is the process of processing data so that it is of better quality, that is, the process of obtaining clean data.

Referring to fig. 4, in one embodiment, the preprocessing module 2 includes:

the deleting unit 21 is configured to delete data, which is irrelevant to customer churn, in the data retained by the bank customer to obtain first feature data;

an abnormal value processing unit 22, configured to process an abnormal value in the first feature data to obtain second feature data;

a data conversion unit 23, configured to convert the second feature data into an input format conforming to XGBoost to obtain third feature data;

a discretization unit 24, configured to perform discretization processing on the third feature data to obtain fourth feature data;

and a screening unit 25, configured to screen out feature data with weak correlation from the fourth feature data, so as to obtain the feature data related to customer churn.

Regarding the deleting unit, it should be noted that some attribute data in the data retained by bank customers is unrelated to customer churn. Take as an example retained data that includes the following attributes: customer ID, customer name, customer credit score, gender, age, region, deposit/loan status, whether there is a credit card, number of financial product purchases/uses, whether the customer is an active user, estimated income, length of time the financial product has been used, and whether the customer has been lost. Obviously, the customer ID and the customer name are not factors in customer churn, that is, they are unrelated to it, so this data must first be deleted from the data retained by the bank customers.

Regarding the abnormal value processing unit, it should be noted that the data retained by bank customers may contain abnormal values. For example, in the example shown in Table 1, records with a null gender, a null credit score, or a non-existent list value are all regarded as abnormal data. In addition, data whose age distribution is excessively wide, and data with excessive product purchase quantities caused by promotional activities, are not meaningful for bank customer churn risk prediction from a market perspective, and can therefore also be regarded as abnormal values. To obtain high-quality data, these abnormal values must be processed. There are various ways to process them: they may be deleted, supplemented, or handled by other methods, and the present application is not limited in this respect.

Regarding the data conversion unit, it should be noted that the format of the data after abnormal value processing may differ from the input format required by the XGBoost model, so that data needs to be converted into the format the XGBoost model requires. Taking the example shown in Table 1, Gender, Area, and Loan (deposit/loan status) are string-type variables and cannot be analyzed directly, so they are converted with a value conversion toolkit. After conversion: Gender [1, 2], Area [100, 101, 102, …], Loan [201, 202].

Regarding the discretization unit, it should be noted that data with excessively continuous features carries a risk of overfitting, so the continuous features need to be discretized in order to speed up iteration and to be robust against abnormal data. Taking the example shown in Table 1, the two variables CreditScore and Age contain abnormal values and are discretized here: the credit score is divided into 5 groups (600 or less, 600-650, 650-700, 700-750, and above 750), and the age is divided into 8 groups (20 or less, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, and 80 or more), from which the data distribution is obtained.

Regarding the screening unit, it should be noted that when two features are strongly correlated, using both at the same time causes information redundancy. Strongly correlated variables should therefore be eliminated, and the features with weaker mutual correlation retained as the features related to customer churn. A heat map shows which features are strongly correlated and which are weakly correlated.

In an embodiment, the training module is specifically configured to train the XGBoost model to be trained based on the feature data, the label corresponding to the feature data, and preset model parameters, and to adjust the model parameters according to a preset parameter adjustment strategy during training to obtain the bank customer churn risk prediction model.

In the embodiment of the present application, it should be noted that the model parameters include: the number of trees, the depth of the trees, the learning rate, the row sampling rate, and the column sampling rate.

In the embodiment of the present application, it should be noted that the loss function adopted by the model is

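In one embodiment, the parameter adjustment strategy is any one of the following strategies:

a first parameter adjustment strategy: adjusting only the number of trees;

a second parameter adjustment strategy: adjusting only the depth of the trees;

a third parameter adjustment strategy: adjusting the number and the depth of the trees simultaneously;

a fourth parameter adjustment strategy: adjusting only the learning rate;

a fifth parameter adjustment strategy: adjusting the number of trees and the learning rate simultaneously;

a sixth parameter adjustment strategy: adjusting only the row sampling ratio;

a seventh parameter adjustment strategy: adjusting only the column sampling ratio.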

The third embodiment is as follows:

Referring to fig. 5, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data used by the bank customer churn risk prediction model generation method and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements a bank customer churn risk prediction model generation method, which comprises the following steps: acquiring data retained by bank customers; preprocessing the data retained by the bank customers to obtain feature data related to customer churn; and training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data and preset model parameters to obtain the bank customer churn risk prediction model; wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has been lost.

Example four:

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a bank customer churn risk prediction model generation method, which comprises the following steps: acquiring data retained by a bank customer; preprocessing the data retained by the bank customer to obtain feature data related to customer churn; and training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data, and preset model parameters to obtain the bank customer churn risk prediction model, wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has churned.

The XGBoost model adopted in the embodiments of the present application is a machine learning model, and the training set consists of feature data related to customer churn, so the trained model can effectively predict the churn risk of bank customers. In addition, XGBoost is fast when processing large-scale data and places low demands on hardware resources such as memory, so the bank customer churn risk prediction model generated by the embodiments of the present application can predict the churn risk of bank customers quickly and accurately.
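A minimal sketch of how one retained customer record might be separated into feature data and its label is given below. It assumes, as the claims suggest, that customer ID and customer name are irrelevant to churn and that the churn indicator serves as the label. The field names, the helper `split_features_and_label` and the sample record are all hypothetical; the application lists the attributes but not their column names or values.

```python
# Hypothetical field names for attributes that do not relate to churn.
IRRELEVANT_FIELDS = {"customer_id", "customer_name"}
# Hypothetical field name for the churn indicator used as the label.
LABEL_FIELD = "churned"


def split_features_and_label(record):
    """Return (features, label) for one retained customer record."""
    features = {
        key: value
        for key, value in record.items()
        if key not in IRRELEVANT_FIELDS and key != LABEL_FIELD
    }
    label = int(bool(record[LABEL_FIELD]))  # 1 = churned, 0 = retained
    return features, label


# Made-up record for illustration only.
record = {
    "customer_id": "C001",
    "customer_name": "Example Customer",
    "credit_score": 650,
    "age": 42,
    "has_credit_card": True,
    "is_active": False,
    "churned": True,
}
features, label = split_features_and_label(record)
```

Data conversion (for example, encoding categorical attributes) and outlier handling would follow this step; they are omitted because the text here does not describe them in detail.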

Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit its scope. All equivalent structures or equivalent processes made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of protection of the present application.

Claims

1. A bank customer churn risk prediction model generation method is characterized by comprising the following steps:

acquiring data retained by a bank customer;

preprocessing the data retained by the bank customer to obtain characteristic data related to customer loss;

2. The bank customer churn risk prediction model generation method according to claim 1, wherein the data retained by the bank customer includes data for a plurality of attributes, the attributes including customer ID, customer name, customer credit score, gender, age, region, deposit/loan status, whether the customer holds a credit card, number of financial product purchases/uses, whether the customer is an active customer, estimated income, length of time the customer has used financial products, and whether the customer has churned.

3. The bank customer churn risk prediction model generation method according to claim 1, wherein preprocessing the data retained by the bank customer to obtain feature data related to customer churn comprises:

4. The bank customer attrition risk prediction model generation method according to claim 3, wherein the sequentially performing data cleaning processing, data conversion processing and feature screening processing on the data retained by the bank customer to obtain the feature data related to customer attrition comprises:

deleting data irrelevant to customer loss in the data retained by the bank customer to obtain first characteristic data;

processing abnormal values in the first characteristic data to obtain second characteristic data;

5. The bank customer churn risk prediction model generation method according to claim 2, wherein the features related to customer churn include: customer credit score, gender, age, region, deposit/loan status, whether the customer holds a credit card, number of financial product purchases/uses, whether the customer is an active customer, estimated income, and length of time the customer has used financial products.

6. The bank customer churn risk prediction model generation method according to claim 1, wherein training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data, and preset model parameters to obtain the bank customer churn risk prediction model comprises:

7. The method for generating a bank customer churn risk prediction model according to claim 6, wherein the parameter adjustment policy is any one of the following parameter adjustment policies:

a third parameter adjustment strategy: adjusting the number and depth of trees simultaneously;

a fourth parameter adjustment strategy: adjusting only the learning rate;

a sixth parameter adjustment strategy: adjusting only the row sampling ratio;

8. A bank customer attrition risk prediction model generation device is characterized by comprising:

an acquisition module, used for acquiring data retained by a bank customer;

a training module, used for training an XGBoost model to be trained based on the feature data, the label corresponding to the feature data, and preset model parameters to obtain the bank customer churn risk prediction model, wherein the label corresponding to the feature data indicates whether the customer corresponding to the feature data has churned.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.