CN117408742A

CN117408742A - User screening method and system

Info

Publication number: CN117408742A
Application number: CN202311730351.3A
Authority: CN
Inventors: 赵俊; 邓日晓; 杨志
Original assignee: Hunan Sanxiang Bank Co Ltd
Current assignee: Hunan Sanxiang Bank Co Ltd
Priority date: 2023-12-15
Filing date: 2023-12-15
Publication date: 2024-01-16
Anticipated expiration: 2043-12-15
Also published as: CN117408742B

Abstract

The invention relates to the field of data processing, and in particular to a user screening method and system. The method comprises: grading users according to their asset values; screening the users whose assets show negative growth to obtain user amplitude reduction lists; obtaining a first and a second user asset amplitude reduction distribution table from the user amplitude reduction lists of different months; setting a threshold; screening out lost users based on the optimal retention rate that is found; and training a model to predict future lost users. By comparing the asset amplitude reductions of users within each asset scale level across different months, the invention finds a suitable asset amplitude reduction distribution table, from which it finds the optimal retention rate, determines the target asset amplitude reduction, and identifies the lost users.

Description

User screening method and system

Technical Field

The invention relates to the field of data processing, in particular to a user screening method and system.

Background

As financial systems have developed, commercial banks face a number of challenges, including low customer loyalty and low product purchase rates. These problems directly affect a bank's business development and market competitiveness, so early warning of users who may churn, and winning them back, has become an important task for commercial banks.

Current techniques are mainly user churn early-warning methods whose rules are established from manual experience. In recent years, some methods have introduced machine learning algorithms to improve the accuracy of the early warning.

However, the model inputs of these methods still rely on manually defined criteria for lost users, and rules based on manual experience have several drawbacks. First, these methods typically grade users using fixed feature weights and thresholds, which are often set manually or derived from historical statistics and therefore do not reflect real-time changes in user behavior or differences between individual users. Second, most of these methods build the early-warning model on a single classification algorithm or a simple combination of algorithms, and cannot fully exploit the strengths and complementarity of multiple algorithms. Finally, these methods typically update the early-warning model through one-off or periodic batch processing, which is time-consuming and inefficient and cannot respond promptly to market changes and user needs. These shortcomings can cause the data fed to the model to deviate from the actual distribution, making the model's recognition less effective or less robust. A method is therefore urgently needed that adjusts dynamically to real-time changes and individual differences in user behavior, and that combines the strengths and complementarity of multiple classification algorithms to update the early-warning model accurately.

Disclosure of Invention

Therefore, the invention provides a user screening method and system that solve the problem in the prior art that the criteria for identifying lost users are not sufficiently precise.

To achieve the above object, an aspect of the present invention provides a user screening method, including:

classifying the users into a plurality of asset scale levels according to the user asset values stored in the database;

starting the screening of lost users and non-lost users at month i;

selecting, within each asset scale level, the users whose assets show negative growth in month i compared with month i-1, and sorting these users by amplitude reduction from largest to smallest to obtain the user amplitude reduction list of month i;

determining a first user asset amplitude reduction distribution table according to the user amplitude reduction list of month i;

repeating the above steps, and determining a second user asset amplitude reduction distribution table according to the user amplitude reduction list of month i-1;

setting a threshold;

determining the lost users within the current asset scale level according to the relationship between the threshold and the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table;

and predicting future lost users based on the lost users.

Further, determining the first user asset amplitude reduction distribution table from the user amplitude reduction list within each asset scale level comprises: calculating the average asset amplitude reduction of the top X% of users in each user amplitude reduction list to obtain the first user asset amplitude reduction distribution table for each asset scale level, wherein the first user asset amplitude reduction distribution table comprises a plurality of average asset amplitude reductions and the user percentage corresponding to each of them, the table is obtained from the user asset data of month i and month i-1, and one of the average asset amplitude reductions in the table serves as the target asset amplitude reduction for screening lost users and non-lost users within that asset scale level in the initial month i.

Further, obtaining the second user asset amplitude reduction distribution table comprises: selecting, within each asset scale level, the users whose assets show negative growth in month i-1 compared with month i-2, sorting these users by amplitude reduction from largest to smallest to obtain a user amplitude reduction list for each asset scale level, and calculating the average asset amplitude reduction of the top X% of users in the user amplitude reduction list.

Further, determining the lost users within the current asset scale level according to the relationship between the threshold and the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table comprises:

if the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table is greater than the set threshold, updating i to i-1 and repeating all steps from the start of screening; if the difference is less than or equal to the set threshold, proceeding to the following steps;

finding the optimal retention rate X% from the first user asset amplitude reduction distribution table;

substituting the optimal retention rate X% into the first user asset amplitude reduction distribution table that was generated first for the asset scale level, and looking up the average asset amplitude reduction of the users corresponding to X% in that table, this average asset amplitude reduction being the target asset amplitude reduction;

and, in the user amplitude reduction list of each asset scale level for the initial month i, if a user's actual asset amplitude reduction is greater than the target asset amplitude reduction, determining that the user is a lost user within the current asset scale level.

Further, after the lost users are determined, the method further comprises: if a user's actual asset amplitude reduction is smaller than the target asset amplitude reduction, determining that the user is a non-lost user within the current asset scale level.

Further, when determining the relationship between the threshold and the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table:

the user percentages in the first user asset amplitude reduction distribution table are preliminarily set to 1% to 100%, and the corresponding average asset amplitude reductions are A1 to A100 respectively;

the user percentages in the second user asset amplitude reduction distribution table are preliminarily set to 1% to 100%, and the corresponding average asset amplitude reductions are B1 to B100 respectively;

A1 to A100 are summed to obtain C, B1 to B100 are summed to obtain D, E is obtained as C - D, and E is divided by C to obtain F, where F is the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table; the set of user percentages is adjusted according to the actual number of users;

the threshold is initially set to 50% and is then adjusted according to accuracy and coverage.
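The comparison described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names are hypothetical, and comparing the magnitude of F with the threshold is an assumption, since the text does not say how a negative F is treated.

```python
def table_difference(first_table, second_table):
    """Relative difference F = (C - D) / C between two distribution tables.

    Each table is a sequence of average asset amplitude reductions
    (A1..A100 and B1..B100 in the text), one entry per user percentage.
    """
    if len(first_table) != len(second_table):
        raise ValueError("tables must cover the same user percentages")
    c = sum(first_table)   # C = A1 + ... + A100
    d = sum(second_table)  # D = B1 + ... + B100
    if c == 0:
        raise ValueError("first table sums to zero; F is undefined")
    return (c - d) / c     # F = E / C, with E = C - D


def exceeds_threshold(first_table, second_table, threshold=0.5):
    """True when F is above the threshold (initially 50% in the text)."""
    # Assumption: the magnitude of F is compared with the threshold.
    return abs(table_difference(first_table, second_table)) > threshold
```

For two short hypothetical tables, `table_difference([60, 40], [30, 20])` gives C = 100, D = 50 and F = 0.5, which does not exceed the initial 50% threshold.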

Further, in searching for the optimal retention rate X%, an initial population of candidate retention rate values X% is generated, and new populations are then generated through selection, crossover and mutation operations; the fitness of each individual retention rate X% is evaluated by a fitness function designed on the basis of the user asset amplitude reduction distribution table and a preset target asset amplitude reduction, the fitness criterion being the accuracy of predicting future lost users.
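A search of this kind might look like the sketch below, which evolves integer candidates for X between 1 and 100. The population size, mutation rate, averaging crossover and elitism are illustrative assumptions, and the `fitness` callback is hypothetical: in the text the fitness is the accuracy of predicting future lost users, which would have to be supplied by the caller.

```python
import random


def search_retention_rate(fitness, generations=30, population_size=20,
                          mutation_rate=0.2, seed=0):
    """Search integer X in 1..100 that maximises `fitness` with a simple GA."""
    rng = random.Random(seed)
    population = [rng.randint(1, 100) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]
        children = [ranked[0]]  # elitism: the best individual always survives
        while len(children) < population_size:
            a, b = rng.sample(parents, 2)
            # Crossover: blend the two parents (integer mean).
            child = (a + b) // 2
            # Mutation: occasionally nudge the child by a small random step.
            if rng.random() < mutation_rate:
                child += rng.randint(-5, 5)
            children.append(min(100, max(1, child)))
        population = children
    return max(population, key=fitness)
```

With a toy fitness such as `lambda x: -abs(x - 30)`, the search moves toward X = 30. Because the best individual is always carried over, the returned candidate is never worse than the best member of the initial population.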

Further, future lost users are predicted by a trained deep learning model. The screened lost users serve as the loss labels, and the data of the screened lost users and non-lost users form the training set that is fed into the selected deep learning model for training; the trained deep learning model then predicts, from existing user data, the users likely to be lost in the future.

Further, the deep learning model is optimized by gradient descent during training, and overfitting prevention techniques, including early stopping and regularization, are applied during training.
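The training loop described above can be illustrated in miniature. The sketch below is not the deep learning model of the invention: it substitutes a single-layer logistic regression so that gradient descent, L2 regularization and early stopping can be shown in a few self-contained lines. The learning rate, penalty and patience values are assumptions.

```python
import math


def train_logistic(train, valid, lr=0.1, l2=0.01, max_epochs=500, patience=10):
    """Fit a logistic-regression stand-in with batch gradient descent.

    `train` and `valid` are lists of (features, label) pairs, with label 1
    for a lost user and 0 otherwise. Returns the weights (bias last) that
    achieved the lowest validation loss.
    """
    n_features = len(train[0][0])
    weights = [0.0] * (n_features + 1)  # last entry is the bias

    def predict(w, x):
        z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def loss(w, data):
        eps = 1e-12  # avoids log(0)
        total = 0.0
        for x, y in data:
            p = predict(w, x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(data)

    best_w, best_loss, stale = list(weights), float("inf"), 0
    for _ in range(max_epochs):
        grad = [0.0] * len(weights)
        for x, y in train:
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        for j in range(len(weights)):
            g = grad[j] / len(train)
            if j < n_features:
                g += l2 * weights[j]  # L2 regularization (bias excluded)
            weights[j] -= lr * g      # gradient descent step
        current = loss(weights, valid)
        if current < best_loss - 1e-6:
            best_w, best_loss, stale = list(weights), current, 0
        else:
            stale += 1
            if stale >= patience:     # early stopping
                break
    return best_w
```

In this stand-in, the screened lost and non-lost users supply the labels, and a held-out validation split decides when training stops.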

Further, the fitness function comprises a labor cost constraint function and a convergence speed, wherein the labor cost constraint function is calculated by formula (1) and the convergence speed is calculated by formula (2):

$$C(t) = N(t) + D(t) \tag{1}$$

$$v = \frac{N_1 - N_2}{t_1 - t_2} \tag{2}$$

where $N(t)$ represents the number of characters in the user asset data used in the process of selecting the optimal retention rate, $D(t)$ represents the amount of data in the user asset data used in that process, $N_1$ is the optimal retention rate determined at time $t_1$, $N_2$ is the optimal retention rate determined at time $t_2$, and $t$ is the time taken by the process of selecting the optimal retention rate.

In another aspect, the present invention also provides a user screening system, comprising: a user asset classification module, configured to classify users into a plurality of asset scale levels according to the user asset values stored in the database;

an asset negative growth screening module, configured to start the screening of lost users and non-lost users at month i, and to select, within each asset scale level, the users whose assets show negative growth in month i compared with month i-1;

an average asset amplitude reduction calculation module, configured to sort the users by amplitude reduction from largest to smallest to obtain the user amplitude reduction list of month i; to determine a first user asset amplitude reduction distribution table according to the user amplitude reduction list of month i; and to repeat the above steps to determine a second user asset amplitude reduction distribution table according to the user amplitude reduction list of month i-1;

a distribution table comparison module, configured to set a threshold, and to determine the lost users within the current asset scale level according to the relationship between the threshold and the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table;

and a lost user identification module, configured to predict future lost users based on the lost users.

The distribution table comparison module comprises an updating unit, a selecting unit, a searching unit and a determining unit:

the updating unit is configured to, if the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table is greater than the set threshold, update i to i-1 and repeat all steps from the start of screening, and, if the difference is less than or equal to the set threshold, proceed to the following steps;

the selecting unit is configured to find the optimal retention rate X% from the first user asset amplitude reduction distribution table;

the searching unit is configured to substitute the optimal retention rate X% into the first user asset amplitude reduction distribution table that was generated first for the asset scale level, and to look up the average asset amplitude reduction of the users corresponding to X% in that table, this average asset amplitude reduction being the target asset amplitude reduction;

the determining unit is configured to determine, in the user amplitude reduction list of each asset scale level for the initial month i, that a user is a lost user within the current asset scale level if the user's actual asset amplitude reduction is greater than the target asset amplitude reduction.

Compared with the prior art, the user screening method and system provided by the invention can significantly improve the accuracy with which lost users are identified and classified. The definition of a lost user can be adjusted dynamically according to real-time user behavior data, so that real-time changes in user behavior and individual differences are captured more accurately, which avoids the loss of accuracy caused by manually defined criteria for lost users.

In particular, the invention optimizes the feature weights used to identify lost users by simulating natural selection and genetic mechanisms. It can handle a large parameter space and search for the optimal feature weights and threshold, thereby finding the optimal way of classifying lost users. This classification supports a more accurate user churn early-warning model, removes the limitation of manually set weights and thresholds, and improves both the precision of lost-user classification and the accuracy and efficiency of churn early warning.

In particular, the invention can significantly improve the quality of the training set of the early-warning model, greatly reduce the likelihood that the model drifts in the wrong direction, identify the group of lost users more accurately, and improve the robustness and recognition performance of the model.

In particular, the early-warning model can be updated in real time, so that it responds quickly to market changes and user needs, which significantly improves its timeliness and adaptability.

In summary, the invention can dynamically optimize the feature weights and threshold and define the criteria for lost users more accurately, thereby improving the accuracy of lost-user classification. In addition, a deep learning model is used to predict lost users and the early-warning model is updated in real time, which effectively improves the accuracy, coverage and timeliness of churn early warning. This provides commercial banks and other financial institutions with a powerful tool that helps them retain users, increase user loyalty and strengthen their market competitiveness.

Drawings

FIG. 1 is a flow chart of a user screening method provided by an embodiment of the present invention;

FIG. 2 is a flow chart of a user screening method according to another embodiment of the present invention;

FIG. 3 is a block diagram of a user screening system according to an embodiment of the present invention.

Detailed Description

In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.

Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or a communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.

Referring to fig. 1, the user screening method provided by the embodiment of the invention includes:

Step S100: classifying the users into a plurality of asset scale levels according to the user asset values stored in the database;

Step S200: selecting, within each asset scale level, the users whose assets show negative growth in month i compared with month i-1, and sorting these users by amplitude reduction from largest to smallest to obtain the user amplitude reduction list of month i;

Step S300: determining a first user asset amplitude reduction distribution table according to the user amplitude reduction list of month i;

Step S400: repeating the above steps, and determining a second user asset amplitude reduction distribution table according to the user amplitude reduction list of month i-1;

Step S500: setting a threshold;

Step S600: determining the lost users within the current asset scale level according to the relationship between the threshold and the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table;

Step S700: predicting future lost users based on the lost users.

Specifically, in step S100, the user asset values first go through a grading stage that classifies users by asset scale. In this step, users are assigned to different asset scale levels according to their asset values. For example, users with assets below 500,000 may be classified as level A, users with assets between 500,000 and 1,000,000 as level B, users with assets between 1,000,000 and 1,500,000 as level C, and so on, moving up one level for every additional 500,000 in asset value until all users are covered.

Specifically, this is a grading strategy based on user asset values, whose core aim is to manage users by category and to analyze them differentially according to asset scale. This approach facilitates fine-grained management of user assets and thereby improves the efficiency of asset management.
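The grading in step S100 can be sketched as follows, using one level per 500,000 of assets as in the example. The function names and the spreadsheet-style labels beyond "Z" are hypothetical, and placing an asset value that sits exactly on a boundary (for instance exactly 500,000) in the higher level is an assumption the text leaves open.

```python
def asset_level(asset_value, band_width=500_000):
    """Return the 0-based asset scale level: 0 for level A, 1 for level B, ..."""
    if asset_value < 0:
        raise ValueError("asset value must be non-negative")
    # Assumption: a value exactly on a boundary falls into the higher level.
    return int(asset_value // band_width)


def level_label(level):
    """Letter label for a level: 0 -> 'A', 25 -> 'Z', 26 -> 'AA' (hypothetical scheme)."""
    label = ""
    level += 1
    while level > 0:
        level, rem = divmod(level - 1, 26)
        label = chr(ord("A") + rem) + label
    return label
```

For instance, `level_label(asset_level(1_200_000))` yields "C", matching the band from 1,000,000 to 1,500,000 in the example.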

Specifically, in step S200, the asset difference of each user between two consecutive months is calculated and converted into a percentage reduction, and the users are sorted by amplitude reduction from largest to smallest to obtain the user amplitude reduction list. For example, if user A holds assets of 1,000,000 in June and 900,000 in July, the assets have fallen by 100,000 and the asset amplitude reduction is 10%. Suppose that, within asset scale level A, user A has an asset amplitude reduction of 10%, user B has an asset amplitude reduction of 70%, and user C has an asset amplitude reduction of 60%; then the user amplitude reduction list for level A ranks the users in the order B, C, A.

Specifically, this stage filters users by their asset reduction in order to find the users whose assets decreased between month i-1 and month i. By detecting the decline in a user's assets, it aims to identify users who may be facing financial problems. It also distinguishes the users with larger asset reductions, providing a basis for subsequent risk assessment and user management.
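Step S200 can be sketched for the users of a single asset scale level as follows. The dictionary inputs and the function name are hypothetical, and skipping users with missing data or non-positive prior assets is an assumption.

```python
def amplitude_reduction_list(prev_assets, curr_assets):
    """Return [(user_id, reduction)] sorted from largest reduction to smallest.

    `prev_assets` and `curr_assets` map user ids to asset values for month
    i-1 and month i. Only users whose assets fell are kept; the reduction
    is expressed as a fraction of the previous month's assets.
    """
    reductions = []
    for user_id, prev in prev_assets.items():
        curr = curr_assets.get(user_id)
        if curr is None or prev <= 0:
            continue  # assumption: skip users without data or prior assets
        if curr < prev:
            reductions.append((user_id, (prev - curr) / prev))
    return sorted(reductions, key=lambda item: item[1], reverse=True)
```

With the figures from the example (reductions of 10%, 70% and 60% for users A, B and C), the resulting order is B, C, A, and a user whose assets grew is left out.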

Specifically, in step S300, the average asset amplitude reduction of the top X% of users in the user amplitude reduction list of each asset scale level is calculated, yielding the first user asset amplitude reduction distribution table for assets that fell in month i compared with month i-1. This is a process of averaging and building a distribution table. For example, suppose that among the users of asset scale level A, the top 30% of users by amplitude reduction have asset amplitude reductions of 70%, 60% and 50%; the average asset amplitude reduction of the top 30% of users is then (70% + 60% + 50%) / 3 = 60%. If the top 40% of users have asset amplitude reductions of 70%, 60%, 50% and 40%, the average asset amplitude reduction of the top 40% of users is (70% + 60% + 50% + 40%) / 4 = 55%. The average asset amplitude reduction for every other user percentage is calculated in the same way.

Specifically, the purpose is to better understand the overall state of user asset amplitude reduction and, by later comparison with the second user asset amplitude reduction distribution table, to find a month in which the amplitude reduction is stable. This makes it possible to examine in detail how users at different asset scale levels behave in terms of asset reduction, and it is the prerequisite for calculating the optimal retention rate.
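The construction of a distribution table can be sketched as below. The function name is hypothetical, and rounding the head count up so that every percentage covers at least one user is an assumption; the text does not specify how fractional head counts are handled.

```python
def distribution_table(sorted_reductions, percentages=range(1, 101)):
    """Map each user percentage X to the mean reduction of the top X% of users.

    `sorted_reductions` must already be ordered from largest to smallest.
    """
    n = len(sorted_reductions)
    if n == 0:
        return {}
    table = {}
    for x in percentages:
        # Assumption: round the head count up so X always covers >= 1 user.
        count = min(n, (n * x + 99) // 100)
        table[x] = sum(sorted_reductions[:count]) / count
    return table
```

For ten hypothetical users with reductions of 70, 60, 50, 40, 30, 20, 10, 5, 3 and 2 percent, the entry for X = 30 is 60 and the entry for X = 40 is 55, reproducing the two averages worked out in the example.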

Specifically, in step S400, the second user asset amplitude reduction distribution table is the distribution table for users whose assets fell in month i-1 compared with month i-2. For example, suppose that in month i-1, among the users of asset scale level A, the average asset amplitude reduction of the top 10% of users by amplitude reduction is 70%, that of the top 20% is 60%, and that of the top 50% is 50%.

Specifically, the second user asset amplitude reduction distribution table supplies historical data. For example, the figures above (70% for the top 10% of users, 60% for the top 20%, and 50% for the top 50%) serve as a reference against which the first user asset amplitude reduction distribution table of month i is compared, so as to identify the trend in asset amplitude reduction and help determine a first user asset amplitude reduction distribution table whose change in amplitude reduction is below the threshold.

Specifically, in step S500, the threshold is preliminarily set to 50%.

Specifically, the threshold constrains the subsequent loop, which searches for a first user asset amplitude reduction distribution table that differs little from the amplitude reduction of the previous month.

Specifically, in step S600, the asset amplitude reduction distribution table obtained in step S300 is compared with the one obtained in step S400. If the difference is greater than the set threshold, i is updated to i-1 and steps S200 to S600 are repeated; if the difference is less than or equal to the set threshold, the method proceeds. The optimal retention rate X% is found from the first user asset amplitude reduction distribution table finally obtained. The optimal retention rate X% is then substituted into the user asset amplitude reduction distribution table obtained the first time step S300 was executed, and the average amplitude reduction corresponding to X% is looked up. In the user amplitude reduction list of each asset scale level for the initial month i, a user whose actual amplitude reduction is greater than this average amplitude reduction is determined to be a lost user.

Specifically, this strategy adjusts dynamically to the trend in asset amplitude reduction: it checks whether the user asset amplitude reduction has changed abruptly and, if so, screens the users again and recalculates the asset amplitude reduction distribution table. The optimal retention rate can then be determined once the change has stabilized, and the lost users of the initial month can be screened out.
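The rollback loop of step S600 can be sketched as follows. The callback `table_for_month` is hypothetical and stands for steps S200 to S400; the `max_steps` bound and the use of the magnitude of the difference are assumptions, since the text does not limit the number of repetitions or say how a negative difference is treated.

```python
def find_stable_month(start_month, table_for_month, threshold=0.5, max_steps=24):
    """Walk back from `start_month` until two consecutive tables are close.

    `table_for_month(m)` returns the list of average asset amplitude
    reductions built from months m and m-1. Returns (month, first_table).
    """
    month = start_month
    for _ in range(max_steps):
        first = table_for_month(month)       # step S300
        second = table_for_month(month - 1)  # step S400
        c, d = sum(first), sum(second)
        if c == 0:
            raise ValueError("first table sums to zero; difference undefined")
        diff = abs(c - d) / c                # comparison of step S600
        if diff <= threshold:
            return month, first
        month -= 1                           # update i to i-1 and repeat
    raise RuntimeError("no stable month found within max_steps")
```

The table returned here is the one in which the optimal rate X% is searched, while the target amplitude reduction is still looked up in the table built for the initial month.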

Specifically, determining the first user asset amplitude reduction distribution table from the user amplitude reduction list within each asset scale level comprises: calculating the average asset amplitude reduction of the top X% of users in each user amplitude reduction list to obtain the first user asset amplitude reduction distribution table for each asset scale level, wherein the first user asset amplitude reduction distribution table comprises a plurality of average asset amplitude reductions and the user percentage corresponding to each of them, the table is obtained from the user asset data of month i and month i-1, and one of the average asset amplitude reductions in the table serves as the target asset amplitude reduction for screening lost users and non-lost users within that asset scale level in the initial month i.

Specifically, for the top X% of users in the user amplitude reduction list, X is an integer from 0 to 100. For example, suppose that in the user amplitude reduction list of asset scale level A, the top 30% of users by amplitude reduction have asset amplitude reductions of 70%, 60% and 50%; the average asset amplitude reduction of the top 30% of users is then (70% + 60% + 50%) / 3 = 60%. If the top 40% of users have asset amplitude reductions of 70%, 60%, 50% and 40%, the average asset amplitude reduction of the top 40% of users is (70% + 60% + 50% + 40%) / 4 = 55%.

In particular, this is a process of averaging and building a distribution table, whose purpose is to better understand the overall state of user asset amplitude reduction. It makes it possible to examine how users at different asset scale levels behave in terms of asset reduction, which is very helpful when formulating specific risk control strategies.

Specifically, obtaining the second user asset amplitude reduction distribution table for each asset scale level comprises: selecting, within each asset scale level, the users whose assets show negative growth in month i-1 compared with month i-2, sorting these users by amplitude reduction from largest to smallest to obtain a user amplitude reduction list for each asset scale level, and obtaining the second user asset amplitude reduction distribution table for each asset scale level from that list.

Specifically, suppose that in month i-1, among the users of asset scale level A, the average asset amplitude reduction of the top 10% of users by amplitude reduction is 70%, that of the top 20% is 60%, and that of the top 50% is 50%.

Specifically, the second user asset amplitude reduction distribution table is obtained in the same way as the first, so that it can be compared with the first user asset amplitude reduction distribution table of month i. This comparison identifies the trend in asset amplitude reduction and determines a first user asset amplitude reduction distribution table whose change in amplitude reduction is below the threshold.

Specifically, determining the lost users within the current asset scale level according to the relationship between the threshold and the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table comprises:

finding the optimal retention rate X% from the first user asset amplitude reduction distribution table;

Specifically, when the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table is less than or equal to the set threshold, the optimal retention rate X% is found from the first user asset amplitude reduction distribution table and substituted into the first user asset amplitude reduction distribution table obtained the first time step S300 was executed, in order to look up the average amplitude reduction of the top X% of users by amplitude reduction; this value is the target asset amplitude reduction. In the user amplitude reduction list of each asset scale level for the initial month i, a user whose actual amplitude reduction is greater than the target asset amplitude reduction is determined to be a lost user. For example, suppose the optimal retention rate X% that is found is 30% and, among the level A users of the initial month i, the average amplitude reduction of the top 30% of users by amplitude reduction is 40%. This 40% is the target asset amplitude reduction; if the actual amplitude reduction of user a is 60%, user a is determined to be a lost user.

In particular, comparing each user's asset amplitude reduction with the target asset amplitude reduction allows lost users to be identified more accurately. In the example above, the optimal retention rate X% is 30%, the target asset amplitude reduction is 40%, and user a, whose actual amplitude reduction is 60%, is determined to be a lost user. This approach has several advantages. First, it locates the users who are at risk of churning more precisely, which matters to an enterprise because measures to retain these users can be taken in advance, reducing the losses caused by user churn. Second, it enables automated and systematic user management: once a threshold is set, potential lost users can be screened out automatically, which greatly improves working efficiency and reduces manual workload. Finally, it offers more room for adjustment and flexibility: the threshold can be changed at any time, so that users can be managed according to actual conditions and needs and personalized service can be provided. The method therefore helps manage users more effectively and reduce the churn rate, while also improving service quality and meeting users' individual needs.

Specifically, after determining the churn users, the method further comprises: if the actual asset reduction of a user is smaller than the target asset reduction, judging that the user is a non-churn user in the current asset scale level.

Specifically, when the target asset reduction is 40% and the actual reduction of user B is 20%, user B is judged to be a non-churn user.

In particular, this approach identifies non-churn users as well as potential churn users. For example, when the target asset reduction is set to 40% and the actual reduction of user B is 20%, user B is judged to be a non-churn user. Identifying non-churn users has several advantages. First, it supports a deeper understanding of user behavior and needs: for non-churn users, the reasons their assets remain relatively stable can be analyzed further, and these may include product or service quality, user loyalty and so on. Second, it supports more accurate resource allocation: investment in non-churn users can be reduced relatively, and more resources and attention can be directed to users who may churn, improving overall customer retention. Third, it gives a clear picture of the user base: users are divided into churn and non-churn users based on their asset reduction, which supports refined user management. In general, determining both churn and non-churn users makes it possible to identify likely churn users and take retention measures in advance, and also to understand the needs and behavior of non-churn users, optimize resource allocation, improve the customer retention rate and improve service quality.

In particular, upon determining a relationship of a difference value of the first and second user asset reduction profile tables to the threshold value,

Specifically, the user percentages in the user asset amplitude reduction distribution table can be adjusted according to the actual number of users. For example, when the number of users is large, the user percentages may run from 0.01% to 100%, with corresponding average asset reductions A1 to A10000; when the number of users is small, the user percentages may run from 10% to 100%, with corresponding average asset reductions A1 to A10. The threshold may be adjusted according to accuracy and coverage: if the business requirement prioritizes accuracy, the threshold is decreased to make the screening stricter, and if it prioritizes coverage, the threshold is increased to make the screening looser. Assume the first user asset amplitude reduction distribution table contains the following average reductions: top 10% of users, 90%; top 20%, 80%; top 30%, 70%; top 40%, 40%; top 50%, 20%; top 60%, 15%; top 70%, 12%; top 80%, 8%; top 90%, 6%; top 100%, 5%. Assume the second user asset amplitude reduction distribution table contains: top 10% of users, 80%; top 20%, 75%; top 30%, 70%; top 40%, 30%; top 50%, 20%; top 60%, 15%; top 70%, 12%; top 80%, 8%; top 90%, 6%; top 100%, 5%. Then C is 346%, D is 321%, E is 25%, and F is 7.78%. The threshold is initially set to 50% and is adjusted to 20% in view of the number of users. The result is that the difference between the first user asset amplitude reduction distribution table and the second user asset amplitude reduction distribution table is smaller than the threshold.
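The figures in this example are consistent with the following reading, which is an assumption because the defining formulas are not reproduced in the text: C and D are the sums of the average reductions in the first and second tables, E = C - D, and F = E / D expressed as a percentage. A minimal Python sketch under that assumption (the function name `table_difference` is hypothetical):

```python
# Average asset reduction (%) for the top 10%, 20%, ..., 100% of users,
# copied from the example in the text.
first_table = [90, 80, 70, 40, 20, 15, 12, 8, 6, 5]
second_table = [80, 75, 70, 30, 20, 15, 12, 8, 6, 5]


def table_difference(first, second):
    """Assumed reading of C, D, E and F; not confirmed by the source."""
    c = sum(first)      # C: sum of the first table
    d = sum(second)     # D: sum of the second table
    e = c - d           # E: absolute gap between the two sums
    f = e / d * 100     # F: gap relative to the second table, in percent
    return c, d, e, f


c, d, e, f = table_difference(first_table, second_table)
threshold = 20.0                 # adjusted threshold from the example
below_threshold = f < threshold  # True means the tables are close enough
```

Under this reading F is roughly 7.8%, which is below the adjusted 20% threshold, matching the conclusion in the text.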

Specifically, this step mainly checks whether an abrupt change has occurred between the two months. If so, the assets must be re-graded and the users re-screened; if not, the method proceeds to find the optimal retention rate. A key role of this step is to monitor the overall trend of user asset reduction and to adapt to changes in it. Setting a threshold and comparing against it gives an assessment of how much the asset reduction has changed, which to some extent reflects changes in the users' financial condition. First, summing the data in the two user asset amplitude reduction distribution tables, taking the difference and expressing it as a percentage gives a clear view of the trend in user asset reduction, which is useful for understanding user behavior, predicting churn and improving products or services. Second, adjusting the threshold dynamically allows different business needs to be met. If the business requirement prioritizes accuracy, the threshold can be lowered to make the screening stricter; if it prioritizes coverage, the threshold can be raised to make the screening looser.

Specifically, in the process of searching for the optimal retention rate X%, a population of different candidate values of X% is first generated, and new populations are then generated using selection, crossover and mutation operations. The fitness of each individual candidate X% is evaluated by a fitness function designed on the basis of the user asset amplitude reduction distribution table and a preset target asset reduction, and the fitness criterion is the accuracy with which future churn users are predicted.

Specifically, when searching for the optimal retention rate X%, 50 candidate values can, for example, be selected at random from the range [0%, 100%] as the initial population, with the fitness function being the success rate achieved by each candidate X% when retaining users. Suppose the search finds that the optimal retention rate X% among the class A users is 30% and that, among the class A users of the initial month i, the average reduction of the top 30% of users by reduction is 40%. Then 40% is the target asset reduction, and if the actual reduction of user A is 60%, user A is judged to be a churn user.

Specifically, the first step is to initialize a population of values. The population contains different candidate values of the retention rate X%, each of which is regarded as an individual. A fitness function is then defined to evaluate the fitness of each candidate X%, so that a candidate achieving a higher retention success rate has a higher fitness. In each iteration, a selection operation is first performed according to the fitness of each individual, i.e. individuals with high fitness have a higher chance of being selected. A crossover operation is then performed, simulating biological inheritance: two selected individuals produce two new individuals. Finally, a mutation operation is performed, simulating biological mutation: part of an individual is changed at random. The iteration continues for multiple rounds until a stopping criterion is met, e.g. the number of iterations reaches a set value or the best solution has not improved within a set number of rounds. The individual with the highest fitness in the final population, i.e. the corresponding retention rate X%, is then selected as the optimal solution.
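The selection, crossover and mutation loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `toy_fitness` is a hypothetical stand-in that simply peaks at X = 30%, whereas the text specifies a fitness based on retention success or churn-prediction accuracy; tournament selection, blend crossover, Gaussian mutation and all parameter values are likewise assumptions.

```python
import random


def toy_fitness(x):
    """Hypothetical stand-in: highest when X% equals 30 (not from the source)."""
    return -abs(x - 30.0)


def select(population, fitness, rng):
    """Tournament selection: the fitter of two random individuals is chosen."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_search(fitness, pop_size=50, generations=40,
                   mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Initial population: candidate X% values drawn from [0, 100].
    population = [rng.uniform(0.0, 100.0) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):  # stop after a fixed number of rounds
        children = []
        while len(children) < pop_size:
            p1 = select(population, fitness, rng)
            p2 = select(population, fitness, rng)
            # Crossover: two parents blend into two new individuals.
            w = rng.random()
            for child in (w * p1 + (1 - w) * p2, (1 - w) * p1 + w * p2):
                # Mutation: occasionally perturb the child at random.
                if rng.random() < mutation_rate:
                    child += rng.gauss(0.0, 5.0)
                children.append(min(100.0, max(0.0, child)))
        population = children[:pop_size]
        # Keep the best individual seen so far across all rounds.
        best = max(population + [best], key=fitness)
    return best


best_x = genetic_search(toy_fitness)
```

With the stand-in fitness, `best_x` ends up close to 30. In practice the fitness function would be replaced by one evaluated on historical user data.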

Specifically, future churn users are predicted by a trained deep learning model. The screened churn users serve as churn labels, and the data of the screened churn and non-churn users form the training set, which is input into the selected deep learning model for training. The trained deep learning model then predicts, from existing user data, which users may churn in the future.

Specifically, a deep learning model such as Long Short-Term Memory (LSTM) can be used for training. The data of churn and non-churn users are first labeled, for example with label 1 for churn users and label 0 for non-churn users. The labeled data are used as the training set and input into the LSTM model for training. The training set may include various behavioral data of the user, such as asset changes, consumption habits and login frequency, which together form a multi-dimensional feature vector.

During training, the model attempts to find the inherent association between these behavioral data and user churn. After training, the model accepts existing user data as input and predicts whether the user will churn.

For example, suppose a user has the feature vector [0.2, 0.5, 0.3, 0.1] (representing the user's asset changes, consumption habits, login frequency, etc.). This feature vector is input into the trained LSTM model, which outputs a value such as 0.8. The value can be interpreted as the probability that the user will churn. If it exceeds a set threshold (e.g. 0.5), the user is marked as a potential churn user, and a corresponding retention strategy can be applied.

In particular, this way of predicting churn users provides a systematic basis for understanding user churn and for developing retention strategies for users who may churn. Because the deep learning model is trained on precisely defined churn and non-churn users, its training results are more accurate than those of a traditional prediction model. The deep learning model can mine deeper patterns in the user data, improving both accuracy and coverage.

Most importantly, the method is dynamic: the churn user screening method of the present invention can delineate churn users in real time, so the model's predictions can be updated in real time. This allows the model to track trends in user behavior, adjust the retention strategy accordingly and maintain effective churn prevention.

Specifically, the deep learning model is optimized by a gradient descent method during training, and an overfitting prevention technique is applied during training, wherein the overfitting prevention technique comprises early stopping and regularization.

In particular, taking a simple linear regression problem as an example, the goal is to minimize the gap between the predicted and true values, i.e. the error. The model parameters are adjusted step by step so as to reduce this error. Each adjustment is made along the negative gradient of the error function, that is, in the direction in which the error falls fastest; this is what "gradient descent" means. For example, when training the LSTM model, the initial parameter values may be set randomly, e.g. an initial weight W of 0.5. In the first iteration, the gradient G of the error E with respect to W is calculated and W is updated to W - ηG, where η is the learning rate (step size). This is repeated until the error E is smaller than a set threshold or a preset number of iterations is reached. Early stopping means monitoring the model's performance on a validation set throughout training and stopping as soon as that performance stops improving or begins to decline, which avoids overfitting the training set. Regularization suppresses model complexity by adding a penalty term to the error function, so that the model does not depend excessively on particular features. For example, the sum of squares of the weight parameters (L2 regularization) or the sum of their absolute values (L1 regularization) can be added as the penalty term, with a penalty coefficient controlling its strength.
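As an illustration of the three ideas in this paragraph, the following sketch fits a one-variable linear model by gradient descent with an L2 penalty and early stopping on a validation set. It is a toy example on made-up data, not the LSTM training itself; the function name `train_linear`, the data and all hyperparameters are assumptions.

```python
def train_linear(train, valid, lr=0.2, l2=0.001, max_epochs=2000, patience=10):
    """Fit y ~ w*x + b; return (best validation loss, w, b)."""
    w, b = 0.5, 0.0  # initial weight 0.5, as in the text's example
    best = (float("inf"), w, b)
    stale = 0  # epochs since the validation loss last improved
    n = len(train)
    for _ in range(max_epochs):
        # Gradient of the mean squared error plus the L2 penalty l2 * w**2.
        gw = sum(2 * (w * x + b - y) * x for x, y in train) / n + 2 * l2 * w
        gb = sum(2 * (w * x + b - y) for x, y in train) / n
        # Gradient descent step: move against the gradient, scaled by lr.
        w -= lr * gw
        b -= lr * gb
        val_loss = sum((w * x + b - y) ** 2 for x, y in valid) / len(valid)
        if val_loss < best[0] - 1e-9:
            best = (val_loss, w, b)
            stale = 0
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    return best


# Made-up data generated from y = 2x + 1.
train_data = [(x / 5, 2 * (x / 5) + 1) for x in range(6)]
valid_data = [(0.1, 1.2), (0.5, 2.0), (0.9, 2.8)]
val_loss, w, b = train_linear(train_data, valid_data)
```

The returned parameters are those with the lowest validation loss seen during training, which is the usual way early stopping is applied.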

Specifically, gradient descent provides an effective optimization tool that systematically adjusts model parameters to minimize the error, and it underlies the optimization of many complex deep learning models. Overfitting prevention techniques such as early stopping and regularization help the model perform well on both the training set and the test set, avoiding the degraded performance on new data that results from overfitting the training set.

Specifically, the fitness function comprises a labor cost constraint function and a convergence speed, wherein the labor cost constraint function is calculated using formula (1) and the convergence speed is calculated using formula (2):

$$C(t) = N(t) + D(t) \tag{1}$$

$$V = \frac{N_1 - N_2}{t_1 - t_2} \tag{2}$$

Specifically, the fitness function measures the fitness of an individual in the evolutionary algorithm and comprises two parts: the labor cost constraint function and the convergence speed. The labor cost constraint function C(t) measures the labor cost incurred while selecting the optimal retention rate; per formula (1), it is the sum of the number of characters N(t) and the data amount D(t). The convergence speed V measures how quickly the selection of the optimal retention rate converges; per formula (2), it is the difference between the optimal retention rates N1 and N2 obtained at times t1 and t2, divided by the elapsed time t1 - t2. The specific parameters and calculation in formulas (1) and (2) may need to be adjusted to the application scenario.
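Formulas (1) and (2) translate directly into code. The sketch below only evaluates the two quantities; how they are combined into a single fitness value is not stated in the text, so no combination is shown. The function names and sample numbers are hypothetical.

```python
def labor_cost(n_t, d_t):
    """Formula (1): C(t) = N(t) + D(t)."""
    return n_t + d_t


def convergence_speed(n1, n2, t1, t2):
    """Formula (2): V = (N1 - N2) / (t1 - t2)."""
    if t1 == t2:
        raise ValueError("t1 and t2 must differ")
    return (n1 - n2) / (t1 - t2)


# Hypothetical inputs, for illustration only.
cost = labor_cost(3, 5)
speed = convergence_speed(30.0, 40.0, 10.0, 5.0)
```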

Referring to fig. 2, another embodiment of the present invention provides a user screening method, which includes:

Step S1000: classifying the users into different asset scale levels according to their asset values;

Step S2000: screening out, according to the users' absolute asset reduction, the users whose assets decreased between month i-1 and month i;

Step S3000: calculating the asset reduction percentage of the users screened in step S2000, and sorting them by reduction from largest to smallest to obtain a user reduction list;

Step S4000: calculating the average asset reduction of the top X% of users in the user reduction list of each asset scale level, and obtaining the asset amplitude reduction distribution table of the users whose assets decreased in month i compared with month i-1;

Step S5000: obtaining the asset amplitude reduction distribution table of the users whose assets decreased in month i-1 compared with month i-2;

Step S6000: comparing the asset amplitude reduction distribution table obtained in step S4000 with that obtained in step S5000; if the difference is greater than the set threshold, updating i to i-1 and repeating steps S2000 to S6000; if the difference is less than or equal to the set threshold, proceeding to the next step;

Step S7000: finding the optimal retention rate X% from the user asset amplitude reduction distribution table obtained in step S4000;

Step S8000: looking up the found optimal retention rate X% in the user asset amplitude reduction distribution table obtained the first time step S4000 was executed, and finding the average reduction corresponding to X%, which is the target asset reduction;

Step S9000: in the user reduction list of each asset scale level for the initial month i, judging a user to be a churn user if the user's actual reduction is greater than the target asset reduction, and a non-churn user if it is less than the target asset reduction;

Step S10000: using the churn and non-churn users divided in step S9000 as the training set, inputting it into a deep learning model for training, and predicting with the trained model, from existing user data, which users may churn in the future.

Specifically, in step S1000, the users are first graded by asset value so as to classify them by asset scale. This is a grading strategy based on user asset value, and its core aim is classified management and differentiated analysis of users by asset scale. In this step, users are assigned to different asset scale levels according to their asset values. For example, users with assets under 500,000 may be classified as level A, users with assets between 500,000 and 1,000,000 as level B, users with assets between 1,000,000 and 1,500,000 as level C, and so on, with one level added for every further 500,000 of asset value until all users are covered. This supports fine-grained management of user assets and improves the efficiency of asset management.
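The grading rule in this example can be sketched as a simple bucketing function. The 500,000 step comes from the example; the function name `asset_level`, the handling of a value that falls exactly on a boundary (assigned to the higher level) and the naming of levels beyond Z are assumptions.

```python
import string


def asset_level(asset_value, step=500_000):
    """Map an asset value to a level: A for [0, step), B for [step, 2*step), ..."""
    if asset_value < 0:
        raise ValueError("asset value must be non-negative")
    index = int(asset_value // step)
    letters = string.ascii_uppercase
    # Levels beyond Z get a numeric name (an arbitrary choice for this sketch).
    return letters[index] if index < len(letters) else f"L{index + 1}"


levels = [asset_level(v) for v in (499_999, 500_000, 1_200_000)]
```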

Specifically, step S2000 screens users on the basis of their absolute asset reduction to determine the users whose assets decreased between month i-1 and month i. It aims to identify users who may be facing financial problems by detecting a decline in their assets. The operation calculates each user's asset difference between two consecutive months and converts it into a percentage. For example, if the asset value of user A is 1,000,000 in June and 900,000 in July, the asset reduction is 100,000, i.e. 10%.

Specifically, in step S3000, the asset reduction percentage of the users screened in step S2000 is calculated, and the users are sorted by reduction from largest to smallest to obtain the user reduction list. This further distinguishes the users with larger asset reductions and provides a basis for subsequent risk assessment and user management. For example, suppose user A, in asset scale level A, has an asset reduction of 50%; user B, in asset scale level A, has an asset reduction of 70%; and user C, in asset scale level A, has an asset reduction of 60%. Then the user reduction list for asset scale level A is, in order: user B, user C, user A.
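Steps S2000 and S3000 can be sketched together for one asset scale level: keep only users whose assets fell, compute each percentage reduction, and sort from largest to smallest. The function name `reduction_list` and the dictionary-based input format are assumptions for illustration.

```python
def reduction_list(prev_assets, curr_assets):
    """Return [(user, reduction_pct)] for users whose assets fell, largest first.

    prev_assets and curr_assets map a user id to the asset value in month i-1
    and month i respectively.
    """
    rows = []
    for user, prev in prev_assets.items():
        curr = curr_assets.get(user)
        # Skip users with no current data, no prior assets, or no decrease.
        if curr is None or prev <= 0 or curr >= prev:
            continue
        rows.append((user, (prev - curr) * 100 / prev))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Users A, B and C fall by 50%, 70% and 60%; user D's assets grew.
prev = {"A": 100, "B": 100, "C": 100, "D": 100}
curr = {"A": 50, "B": 30, "C": 40, "D": 120}
ranked = reduction_list(prev, curr)
```

For the example data the resulting order is user B, user C, user A, and user D is excluded because the assets did not decrease.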

Specifically, in step S4000, the average asset reduction of the top X% of users in the user reduction list of each asset scale level is calculated, and the asset amplitude reduction distribution table of the users whose assets decreased in month i compared with month i-1 is obtained. This step averages the reductions and builds a distribution table in order to describe the overall pattern of user asset reduction. It shows how users at different asset scale levels differ in asset reduction, which helps in formulating targeted risk control strategies. For example, suppose that among the users in asset scale level A, the top 30% of users by reduction have asset reductions of 70%, 60% and 50%; the average asset reduction of the top 30% is then (70% + 60% + 50%) / 3 = 60%. If the top 40% of users by reduction have asset reductions of 70%, 60%, 50% and 40%, the average asset reduction of the top 40% is (70% + 60% + 50% + 40%) / 4 = 55%.
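The distribution table of step S4000 can be sketched as a mapping from each user percentage to the average reduction of that top share of the reduction list. The function name `distribution_table`, the 10% steps and the rounding up of the user count are assumptions; the sample values extend the example in the text to ten users.

```python
import math


def distribution_table(reductions, percents=range(10, 101, 10)):
    """Map each top-X% share to the average reduction of those users."""
    ordered = sorted(reductions, reverse=True)
    table = {}
    for p in percents:
        # Number of users in the top p%, rounded up, at least one.
        k = max(1, math.ceil(len(ordered) * p / 100))
        table[p] = sum(ordered[:k]) / k
    return table


# Ten users in one asset scale level; reductions in percent.
table = distribution_table([70, 60, 50, 40, 30, 20, 10, 5, 3, 2])
```

With ten users, the top 30% is three users with an average of 60 and the top 40% is four users with an average of 55, matching the worked figures above.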

Specifically, in step S5000, the asset amplitude reduction distribution table of the users whose assets decreased in month i-1 compared with month i-2 is obtained. This step obtains historical data for comparison with the current month's data, so that the trend in asset reduction can be identified. For example, suppose that in month i-1, among the users in asset scale level A, the average asset reduction of the top 10% of users by reduction is 70%, that of the top 20% is 60%, and that of the top 50% is 50%. These three data points (top 10%: 70%; top 20%: 60%; top 50%: 50%) then serve as the reference against which the current month's reduction distribution is compared.

Specifically, in step S6000, the asset amplitude reduction distribution table obtained in step S4000 is compared with that obtained in step S5000. If the difference is greater than the set threshold, i is updated to i-1 and steps S2000 to S6000 are repeated; if the difference is less than or equal to the set threshold, the method proceeds to the next step. For example, with a set threshold of 50%, if the asset reduction obtained in step S4000 is 60% and that obtained in step S5000 is 50%, the difference is 60% - 50% = 10%, which is smaller than the set threshold, so the method proceeds to the next step. In this way the strategy is adjusted dynamically according to the trend in asset reduction. This step mainly checks whether an abrupt change has occurred; if so, the assets must be re-graded and the users re-screened.
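The backward search over months in step S6000 can be sketched as a loop. The comparison metric used here (the gap between the table sums relative to the earlier table's sum) is an assumption, as are the function names and the sample tables; the text only requires that some difference be compared with the threshold.

```python
def relative_difference(later, earlier):
    """Assumed metric: gap between table sums, as a percentage of the earlier sum."""
    return abs(sum(later) - sum(earlier)) / sum(earlier) * 100


def find_stable_month(tables, i, threshold):
    """Step back from month i until two consecutive tables are close enough.

    tables maps a month index to that month's distribution table (average
    reductions compared with the month before). Returns the month index at
    which the comparison passes, or None if the history runs out.
    """
    while (i - 1) in tables:
        if relative_difference(tables[i], tables[i - 1]) <= threshold:
            return i
        i -= 1  # abrupt change detected: update i to i-1 and compare again
    return None


# Hypothetical tables: month 7 differs sharply from month 6,
# while months 6 and 5 are close.
tables = {7: [90, 80, 70], 6: [40, 30, 20], 5: [42, 30, 20]}
stable = find_stable_month(tables, 7, threshold=20.0)
```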

Specifically, in step S7000, the optimal retention rate X% is found from the user asset amplitude reduction distribution table obtained in step S4000. For example, if the search over the users in asset scale level A finds that a value of 60% gives the best retention result, then 60% is the optimal retention rate X%.

Specifically, the first step is to initialize a population of values. The population contains different candidate values of the retention rate X%, each of which is regarded as an individual. For example, 50 candidate values can be selected at random from the range [0%, 100%] as the initial population. Next, a fitness function is defined to evaluate the fitness of each candidate X%. In this case, the fitness function may be the success rate achieved by each candidate X% when retaining users, so that a candidate with a higher success rate has a higher fitness. In each iteration, a selection operation is first performed according to the fitness of each individual, i.e. individuals with high fitness have a higher chance of being selected. A crossover operation is then performed, simulating biological inheritance: two selected individuals produce two new individuals. Finally, a mutation operation is performed, simulating biological mutation: part of an individual is changed at random. The iteration continues for multiple rounds until a stopping criterion is met, e.g. the number of iterations reaches a set value or the best solution has not improved within a set number of rounds. The individual with the highest fitness in the final population, i.e. the corresponding retention rate X%, is then selected as the optimal solution. In this example the optimal retention rate X% is found to be 60%, meaning that the best retention result is achieved when X% is 60%.

Specifically, in step S8000, the found optimal retention rate X% is looked up in the user asset amplitude reduction distribution table obtained the first time step S4000 was executed, and the average reduction corresponding to X% is found; this reduction is the target asset reduction. For example, if the optimal retention rate found for the class A users is 60% and the average reduction corresponding to 60% in the table first obtained in step S4000 is 50%, then 50% is the target asset reduction.

Specifically, in step S9000, in the user reduction list of each asset scale level for the initial month i, a user is judged to be a churn user if the user's actual reduction is greater than the target asset reduction, and a non-churn user if it is less than the target asset reduction. For example, assuming the target reduction for the class A users is 50%, user A, whose actual reduction is 60%, is judged to be a churn user, and user B, whose actual reduction is 40%, is judged to be a non-churn user.
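Steps S8000 and S9000 amount to a table lookup followed by a comparison. In the sketch below the function name `classify_users` and the data layout are assumptions, and the 1/0 labels follow the labeling convention mentioned for model training. The text does not say how a user whose reduction exactly equals the target is treated; here such a user is labeled non-churn, which is also an assumption.

```python
def classify_users(reduction_list, table, optimal_x):
    """Label each user 1 (churn) or 0 (non-churn) against the target reduction.

    reduction_list: [(user, reduction_pct)] for one asset scale level.
    table: distribution table mapping a user percentage to an average reduction.
    optimal_x: the optimal retention rate X% found in step S7000.
    """
    target = table[optimal_x]  # step S8000: target asset reduction
    labels = {
        user: 1 if reduction > target else 0  # step S9000
        for user, reduction in reduction_list
    }
    return target, labels


# Example from the text: X% = 60 maps to a 50% target reduction.
target, labels = classify_users([("A", 60.0), ("B", 40.0)], {60: 50.0}, 60)
```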

Specifically, in step S10000, the churn and non-churn users divided in step S9000 are used as the training set, which is input into a deep learning model for training; the trained model can then predict, from existing user data, which users may churn in the future. For example, after the model has been trained on the data of user A and user B, it can predict whether user C is likely to become a churn user in the future.

Referring to fig. 3, a user screening system according to an embodiment of the present invention includes: a user asset classifying module 10, an asset negative growth screening module 20, an average asset amplitude reduction calculation module 30, a distribution table comparison module 40, an execution module 50, a churn user identification module 60 and a future churn user prediction module 70.

The user asset classifying module 10 is configured to classify users according to the user asset values in the database to form a plurality of asset scale levels;

The asset negative growth screening module 20 is configured to select, among the users of each asset scale level, the users whose assets grew negatively in a specific month, and to sort them by the magnitude of the reduction to generate the user reduction list of each asset scale level;

The average asset amplitude reduction calculation module 30 is configured to calculate, from the user reduction list, the average asset reduction of a specific top percentage of users, and to generate the user asset amplitude reduction distribution table of each asset scale level;

The distribution table comparison module 40 is configured to compare, for each asset scale level, the user asset amplitude reduction distribution tables of two consecutive months, and, if the difference is greater than a set threshold, to update the month and perform the screening and calculation again;

The execution module 50 is configured to find, for each asset scale level, the optimal retention rate and the corresponding target asset reduction once the difference satisfies the threshold condition;

The churn user identification module 60 is configured to judge, in the user reduction list of each asset scale level and according to the obtained target asset reduction, that a user whose actual reduction is greater than the target asset reduction is a churn user and that a user whose actual reduction is less than the target asset reduction is a non-churn user;

The future churn user prediction module 70 is configured to predict, based on the identified churn and non-churn users, which users may churn in the future.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of screening a user, comprising:

screening lost users and non-lost users in the ith month;

setting a threshold value;

Predicting a future churn user based on the churn user;

determining a churn user within the current asset scale level from the relationship of the difference of the first user asset degradation profile and the second user asset degradation profile to the threshold comprises:

finding an optimal saving rate X% from the first user asset amplitude reduction distribution table;

2. The user screening method of claim 1, wherein obtaining a first user asset reduction profile comprises: the average asset amplitude reduction of the first X% of users in each user amplitude reduction list is calculated respectively, and a first user asset amplitude reduction distribution table under each asset scale level is obtained, wherein the first user asset amplitude reduction distribution table comprises a plurality of average asset amplitude reduction and user percentages corresponding to the average asset amplitude reduction, the first user asset amplitude reduction distribution table is obtained according to user asset data of the ith month and the ith-1 month, and a certain average asset amplitude reduction in the first user asset amplitude reduction distribution table is obtained by screening target asset loss of the first ith month user and non-user in the asset scale level.

3. The user screening method of claim 2, wherein obtaining a second user asset reduction profile comprises: and selecting users with negative asset growth in the i-1 month and the i-2 month in each asset scale level, sequencing the users according to the amplitude reduction amplitude from large to small, acquiring a user amplitude reduction list in each asset scale level, and calculating the average asset amplitude reduction of X% of users before amplitude reduction in the user amplitude reduction list.

4. A user screening method according to claim 3, further comprising, after determining a attrition user: and if the actual asset amplitude reduction of the user is smaller than the target asset amplitude reduction, judging that the user is a non-loss user in the current asset scale level.

5. A user screening method as recited in claim 4, wherein, in determining a relationship between a difference of the first user asset reduction profile and the second user asset reduction profile and the threshold value,

The threshold is initially set to 50% and can be adjusted according to accuracy and coverage.

6. A user screening method according to claim 5, characterized in that in finding the optimal retention x%, a set of values of different retention x% is initially generated, then a new set of values is generated using selection, crossover and mutation operations, the fitness of each individual retention X is evaluated by a fitness function based on the user asset degradation profile and a predetermined target asset degradation design, the fitness evaluation criterion being the accuracy of predicting future loss users.

7. The user screening method according to claim 6, wherein the future loss user is predicted by a trained deep learning model, the trained deep learning model is obtained by training based on the screened loss user as a loss label, the data of the screened loss user and the non-loss user are used as a training set of the deep learning model, the selected deep learning model is input for training, and the trained deep learning model predicts the loss user which possibly appears in the future according to the existing user data.

8. A user screening method according to claim 7, wherein the deep learning model is optimized during training using a gradient descent method, and during training an overfitting prevention technique is applied, the overfitting prevention technique comprising early stopping and regularization.

9. The user screening method of claim 8, wherein the fitness function comprises a labor cost constraint function and a convergence speed, wherein the labor cost constraint function is calculated using formula (1) and the convergence speed is calculated using formula (2):

$$C(t) = N(t) + D(t) \tag{1}$$

$$V = \frac{N_1 - N_2}{t_1 - t_2} \tag{2}$$

10. A system for applying the user screening method of any one of claims 1 to 9, comprising:

The user asset classification module is used for classifying the users into a plurality of asset scale grades according to the user asset values stored in the database;