CN114331696A - Risk assessment method, device and storage medium - Google Patents

Info

Publication number
CN114331696A
Authority
CN
China
Prior art keywords
sample
rejection
samples
rejected
backtracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111681270.XA
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd
Priority to CN202111681270.XA
Publication of CN114331696A

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

Embodiments of the present application relate to the field of risk control and provide a risk assessment method, apparatus, and storage medium. The method comprises: acquiring user data of a new application user to be evaluated; extracting a first backtracking feature of the user data; comparing the first backtracking feature with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity between the first backtracking feature and each second backtracking feature and each third backtracking feature; and determining a risk assessment result of the new application user according to the similarity. In the present application, an unbiased model obtained by fusion training on the screened target reject samples and the pass samples serves as the risk assessment model, and the default probability of the new application user is calculated through this model, which effectively improves the accuracy of default risk assessment for the new application user.

Description

Risk assessment method, device and storage medium
Technical Field
The embodiments of the present application relate to the field of risk control, and in particular to a risk assessment method, a risk assessment device, and a storage medium.
Background
In risk control modeling, a model is constructed to predict the default risk of an applicant; the resulting default probability helps decide whether to grant a loan.
In the related art, evaluating a model in a risk control modeling scenario requires considering how the model actually performs after it goes online. However, the model is often overestimated because of sample bias: it performs well on the test data set, but its actual online effect falls short of expectations. Sample bias arises when, in statistical analysis, a local sample is substituted for the global sample without checking whether the local sample is sufficiently representative, so that a model trained on the local sample gives a biased analysis of the global sample.
In risk control, sample bias usually arises because the population with credit performance, that is, applicants who were granted credit, is selected as the training sample, while the trained model is applied to all new applicants. Reject inference brings some gain on the sample bias problem, but it relies on the assumption that a model built on the approved customer group is also effective on reject samples; when the model performs poorly on reject samples, reject inference fails to deliver the expected effect.
Disclosure of Invention
The embodiments of the present application provide a risk assessment method, a risk assessment device, and a storage medium in which screened reject samples are fused with pass samples to serve as the basis for risk assessment. Because the assessment no longer relies on pass samples alone, the accuracy of default risk assessment for a new user is effectively improved, and users who would be judged too risky when assessed on pass samples alone can be salvaged.
In a first aspect of the present application, there is provided a risk assessment method comprising:
acquiring user data of a new application user to be evaluated;
extracting a first backtracking feature of the user data;
comparing the first backtracking feature with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity between the first backtracking feature and each second backtracking feature and each third backtracking feature, wherein the owner of each second backtracking feature is a corresponding pass sample, and the owner of each third backtracking feature is a corresponding reject sample; wherein the backtracking features are used for representing the pre-loan performance of the corresponding owner;
and determining a risk evaluation result of the new application user according to the similarity.
In a second aspect of the present application, there is provided a risk assessment apparatus comprising:
the input and output module is used for acquiring a sample to be evaluated of a new application user;
the processing module is used for extracting a first backtracking characteristic of the sample to be evaluated, and comparing the first backtracking characteristic of the sample to be evaluated with a preset second backtracking characteristic of a passing sample group and a preset third backtracking characteristic of a first rejected sample group to obtain a risk evaluation result of the new application user;
and the input and output module is also used for outputting the risk assessment result of the new application user.
In a third aspect of the present application, a computer-readable storage medium is provided, comprising instructions which, when run on a computer, cause the computer to perform the method according to the first aspect.
In a fourth aspect of the present application, there is provided a computing device comprising: at least one processor, a memory, and an input-output unit; wherein the memory is adapted to store a computer program and the processor is adapted to invoke the computer program stored in the memory to perform the method according to the first aspect.
In the prior art, risk assessment of a new user proceeds by obtaining the new application user sample, extracting its backtracking feature, and comparing that feature with the backtracking features of previously passed user samples and of rejected samples. In effect, the prior pass samples and reject samples are used directly for binary classification to judge whether the current application passes. By contrast, in the embodiments of the present application, a portion of the reject samples is first screened out and combined with the pass samples to form an unbiased sample group, and the newly applied user data is then evaluated based on this unbiased sample group.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a schematic view of an application scenario of a risk assessment method according to some embodiments of the present application;
FIG. 2 is a schematic flow chart of a risk assessment method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of screening reject samples in a risk assessment method according to an embodiment of the present application;
FIG. 4 is a schematic illustration of a screening reject sample of a risk assessment method according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of a risk assessment device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a computing device according to an embodiment of the present application.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It is understood that these examples are given solely to enable those skilled in the art to better understand and to practice the present application, and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present application may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
Moreover, any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The embodiment of the application aims to obtain the unbiased model by screening the rejection sample and combining the screened rejection sample with the pass sample, so that a more accurate risk assessment result is output through the unbiased model, and the assessment capability of the default risk of an applicant is improved.
An unbiased model is one whose output predicted value is an unbiased estimate of the actual situation. Unbiased estimation means inferring a population parameter from a sample statistic without systematic error: if the mathematical expectation of an estimator equals the true value of the estimated parameter, the estimator is called an unbiased estimate of that parameter. Unbiasedness is one criterion for judging the quality of an estimator. Intuitively, over many repeated samplings the average of the estimates approaches the true value of the parameter. Unbiased estimation is often applied to test score statistics.
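The definition of unbiasedness given above can be written compactly. The notation below is the standard textbook form and is not taken from the application itself: \theta denotes the parameter being estimated, \hat{\theta} its estimator, and the sample mean \bar{X} of n independent observations with common mean \mu is used as a familiar example.

```latex
\mathbb{E}\big[\hat{\theta}\big] = \theta
\quad\Longleftrightarrow\quad
\operatorname{Bias}\big(\hat{\theta}\big) = \mathbb{E}\big[\hat{\theta}\big] - \theta = 0,
\qquad
\text{e.g. } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\;\Rightarrow\;
\mathbb{E}\big[\bar{X}\big] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] = \mu .
```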
The present application addresses the problems that arise in risk assessment scenarios in the field of risk control by adopting an unbiased model together with artificial intelligence and machine learning technology.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, data mining, risk assessment, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The technical improvement principle of the present application will be described first, and then the technical solution of the present application will be described in detail with reference to several embodiments.
Summary of The Invention
The inventor found that, in the related art, sample bias causes a model in a risk control modeling scenario to perform well on the test data set while its actual online effect falls short of expectations. Sample bias arises when, in statistical analysis, a local sample is substituted for the global sample without checking whether the local sample is sufficiently representative, so that a model trained on the local sample gives a biased analysis of the global sample.
In the field of risk control, sample bias usually arises because the population with credit performance is selected as the training sample, while the trained model is applied to all new applicants. In the related art, reject inference brings some gain on the sample bias problem, but it relies on the assumption that a model built on the approved customer group is also effective on reject samples; when the model performs poorly on reject samples, reject inference fails to deliver the expected effect.
In other related technologies, reject samples are labeled and used together with pass samples in modeling, but the model design starts from the feature dimensions common to pass samples and reject samples. In practical risk control scenarios, however, customers rejected by risk control (reject samples) differ greatly from customers approved by risk control (pass samples), so the common information dimensions are few, and the credit risk of rejected customers cannot be accurately evaluated.
In searching for a risk assessment model that can solve the sample bias problem, the inventor found that representative user data can be salvaged from the user data of rejected applicants and fused with the user data of approved applicants. The resulting training samples are more complete, so a complete unbiased model can be trained on them to serve as the risk assessment model, improving its accuracy. The default probability of a new application user is then calculated through the risk assessment model, which effectively improves the accuracy of default risk assessment for new users.
Having described the basic principles of the present application, various non-limiting embodiments of the present application are described in detail below.
Application scene overview
Please refer to fig. 1, which illustrates a schematic structural diagram of an application environment related to a risk assessment method according to an embodiment of the present application. The application environment may include a terminal 01 and a server 02. The terminal 01 can be a computer, a tablet computer, a smart phone and the like. The server 02 may be a server, a server cluster composed of several servers, or a cloud computing service center. And a connection between the terminal 01 and the server 02 can be established through a wired or wireless network.
The server 02 may be deployed with risk assessment models, such as an Artificial Intelligence (AI) model trained by a machine learning-based method, such as a financial risk assessment model and a credit risk assessment model.
The terminal 01 may send original user data to the server 02, where the original user data includes user data of users whose applications passed and user data of users whose applications were rejected. The server 02 may use the former as pass samples and the latter as reject samples, and then perform sample screening using the respective real-time features of the reject samples and the pass samples. The screening result is used to select, from the reject samples, the target reject samples to be salvaged; the default risk of a new application user is then predicted using a risk assessment model trained on the target reject samples and the pass samples, and the risk assessment result of the new application user is fed back to the terminal 01.
It should be noted that the implementation environment may also include only the terminal 01 without the server 02, in which case the risk assessment model may be deployed directly in the terminal 01. In this implementation, the terminal 01 may itself be a server, a server cluster composed of several servers, or a cloud computing service center.
The risk assessment method provided by the embodiments of the present application can be applied to default risk assessment models in the field of risk control, such as credit risk assessment models.
Exemplary method
The risk assessment method according to the exemplary embodiment of the present application is described below with reference to FIG. 2 in conjunction with the application scenario of FIG. 1. The method may be applied to a computing device, which may be the terminal 01 or the server 02 in the application scenario described above; the present application does not limit the product form or structure of the computing device executing the risk assessment method. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
In one embodiment of the present application, there is provided a risk assessment method including:
Step S110: acquire user data of the new application user to be evaluated.
In this step, user data of the new application user is obtained. The user data includes, for example, the user's education, employment, credit records, tax payment records, credit bills, online shopping records, and repayment records.
Step S120: extract a first backtracking feature of the user data.
In this step, a first backtracking feature is extracted from the user data obtained in step S110. The first backtracking feature reflects characteristics of the applicant over a past period, for example past credit bills, tax payment records, credit records, and repayment records, and can be used to indicate the applicant's pre-loan performance.
Step S130: compare the first backtracking feature with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity between the first backtracking feature and each second backtracking feature and each third backtracking feature.
The owner of each second backtracking feature is a corresponding pass sample, the owner of each third backtracking feature is a corresponding reject sample, and the backtracking features are used for representing the pre-loan performance of the corresponding owner. That is, a second backtracking feature may be extracted from each pass sample and a third backtracking feature from each reject sample. After the first, second, and third backtracking features are obtained, the similarity between the first backtracking feature and each second backtracking feature and each third backtracking feature can be calculated, and a risk assessment result is then obtained according to the similarity. That is, step S140: determine the risk assessment result of the new application user according to the similarity.
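Steps S130 and S140 can be illustrated with a minimal sketch. The application does not specify the similarity measure or how the similarities are turned into a risk result, so everything concrete here is an assumption made for illustration: backtracking features are numeric vectors, similarity is cosine similarity, each sample in the preset group carries a hypothetical 0/1 default label, and the risk score is the share of defaulters among the `k` most similar samples. The names `cosine_similarity` and `assess_risk` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assess_risk(first_feature, sample_group, k=3):
    """Assumed realization of S130-S140.

    sample_group: list of (backtracking_feature, is_default) pairs covering
    pass samples and screened reject samples (labels are an assumption).
    Returns the fraction of defaulters among the k most similar samples.
    """
    scored = [
        (cosine_similarity(first_feature, feature), is_default)
        for feature, is_default in sample_group
    ]
    # S130: rank every sample in the group by similarity to the applicant.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    # S140: derive a risk score from the most similar samples.
    return sum(label for _, label in top) / len(top)

# Hypothetical backtracking feature vectors with 0 = repaid, 1 = defaulted.
sample_group = [
    ([0.9, 0.1, 0.0], 0),
    ([0.8, 0.2, 0.1], 0),
    ([0.1, 0.9, 0.8], 1),
    ([0.2, 0.8, 0.9], 1),
    ([0.7, 0.3, 0.2], 0),
]
low_risk = assess_risk([0.85, 0.15, 0.05], sample_group, k=3)
high_risk = assess_risk([0.10, 0.90, 0.85], sample_group, k=3)
```

A nearest-neighbor vote is only one way to use the similarities; a weighted average or a trained classifier over the same sample group would fit the description equally well.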
For example, in an embodiment, the user data of a plurality of past users who have applied for a pass may be obtained in advance as a pass sample, and the user data of a plurality of past users who have been denied for an application may be obtained as a deny sample. The plurality of pass samples form a pass sample group, and the plurality of reject samples form a reject sample group.
In the embodiment of the application, the user data of users whose applications passed refers to the user data, among the historical user data, that passed the application for the service to be evaluated, for example user data of users who were granted a loan in a credit service. The user data of users whose applications were rejected refers to the user data, among the historical user data, that did not pass the application for the service to be evaluated, for example user data of users who were not granted a loan in the credit business.
In this embodiment, the user data of each passing sample and each rejecting sample may be obtained from an open source data set, or may be obtained from historical user data provided by a user, which is not limited in this embodiment.
In practical applications, the user data includes, but is not limited to, real-time feature data and backtracking feature data. The real-time feature data reflects the user's real-time information, such as real-time behavior information and real-time attribute information. Specifically, the real-time feature data may be real-time data about the applicant provided by a third party, such as the number of collection calls received by the applicant within a preset time period (e.g., within one month or one quarter), the credit card transaction status within the preset time period, and the current employment status. The backtracking feature data is, for example, pre-loan feature data of the user, including but not limited to one or more of the applicant's financial situation, loan purpose, history of default, business situation, and the industry of the applicant's company. The applicant type is not limited in this embodiment and may be an individual, an enterprise, or an organization. The type of user data actually used may be adjusted according to the applicant type.
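As a minimal sketch of how such user data might be held in memory, the record below separates real-time features from backtracking features. The class name `UserSample`, its field names, and the example feature keys are all hypothetical; the application does not prescribe any data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class UserSample:
    """One applicant record (hypothetical layout, for illustration only)."""
    user_id: str
    # Real-time features, e.g. third-party data for a recent time window.
    realtime: Dict[str, float] = field(default_factory=dict)
    # Backtracking (pre-loan) features, e.g. history of default.
    backtracking: Dict[str, float] = field(default_factory=dict)
    # True = pass sample, False = reject sample, None = new applicant.
    approved: Optional[bool] = None

    def backtracking_vector(self, keys: List[str]) -> List[float]:
        """Return backtracking features in a fixed key order, 0.0 if missing."""
        return [self.backtracking.get(key, 0.0) for key in keys]

sample = UserSample(
    user_id="u1",
    realtime={"collection_calls_last_month": 2.0},
    backtracking={"past_defaults": 1.0, "monthly_income": 5.0},
    approved=True,
)
vector = sample.backtracking_vector(["monthly_income", "past_defaults", "loan_amount"])
```

Fixing the key order in `backtracking_vector` keeps vectors from different samples comparable when a similarity is computed between them.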
After the pass sample group and the reject sample group are acquired, the second backtracking features and the third backtracking features can be obtained.
For the pass sample group, the second backtracking feature of each pass sample can be obtained directly. For the reject sample group, however, the reject samples need to be screened after the third backtracking feature of each reject sample is obtained.
The reject samples may be screened in two ways: by similarity or by a preset prediction model.
Method 1: screening by similarity
As shown in fig. 3, in the present embodiment, first real-time characteristics of each passing sample in the group of passing samples are obtained to form a first real-time characteristic set, where the first real-time characteristics are used to represent the post-credit performance of the corresponding passing sample;
acquiring second real-time characteristics of each rejected sample in the rejected sample group to form a second real-time characteristic set, wherein the second real-time characteristics are used for representing the post-credit performance of the corresponding rejected sample;
respectively carrying out similarity calculation on each second real-time feature in the second real-time feature set and each first real-time feature in the first real-time feature set;
and screening the rejection samples in the rejection sample group according to the similarity calculation result to obtain the rejection samples meeting the preset conditions in the rejection sample group.
In this embodiment of the application, the reject samples in the reject sample group may be screened according to the similarity calculation result in the following manner, so as to obtain reject samples in the reject sample group that meet the preset condition, specifically as follows:
according to the similarity calculation result, acquiring first reject samples from the reject sample group, wherein the similarity between each first reject sample and each pass sample of the pass sample group is greater than a first preset threshold;
acquiring second reject samples from the reject sample group, wherein the similarity of the second reject samples and each pass sample of the pass sample group is smaller than a second preset threshold;
and taking the first rejection sample and the second rejection sample as the rejection samples meeting the preset conditions.
For example, in this embodiment, the similarity between each reject sample in the reject sample group and each pass sample is calculated, the first preset threshold is set to 99%, and the second preset threshold is set to 1%, so that reject samples with a similarity above 99% and reject samples with a similarity below 1% are selected as reject samples meeting the preset conditions.
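The threshold-based screening described above might look like the following sketch. Several choices are assumptions rather than details from the application: real-time features are numeric vectors, similarity is cosine similarity, and a reject sample's similarity to the pass sample group is summarized as its mean similarity over all pass samples. The function name `screen_by_threshold` is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def screen_by_threshold(reject_features, pass_features, high=0.99, low=0.01):
    """Return (first, second): indices of reject samples above the first
    threshold and below the second threshold, respectively."""
    first, second = [], []
    for index, reject in enumerate(reject_features):
        sims = [cosine_similarity(reject, passed) for passed in pass_features]
        score = sum(sims) / len(sims)  # assumed aggregation: mean similarity
        if score > high:
            first.append(index)
        elif score < low:
            second.append(index)
    return first, second

# Hypothetical two-dimensional real-time feature vectors.
pass_features = [[1.0, 0.0], [2.0, 0.0]]
reject_features = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
first_rejects, second_rejects = screen_by_threshold(reject_features, pass_features)
```

Here the third reject sample, with a mean similarity of about 0.71, falls between the two thresholds and is not selected.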
In another implementation of the present application, the screening of the rejection sample may also be performed by:
sorting each rejection sample in the rejection sample group according to the similarity calculation result;
obtaining, from the sorted reject sample group, a plurality of third reject samples ranked first according to a first preset proportion;
obtaining, from the sorted reject sample group, a plurality of fourth reject samples ranked last according to a second preset proportion;
and taking the third rejection sample and the fourth rejection sample as the rejection samples meeting the preset conditions.
For example, the similarity of the rejected samples is sorted from high to low, and the rejected samples of the top 10 percent (a first preset proportion) and the rejected samples of the bottom 10 percent (a second preset proportion) are taken as rejected samples meeting the preset condition.
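The ranking-based variant can be sketched as follows, assuming each reject sample has already been reduced to a single similarity score against the pass sample group (how that score is aggregated is not specified in the application). The function name `screen_by_rank` and the example scores are hypothetical.

```python
def screen_by_rank(similarities, top_ratio=0.10, bottom_ratio=0.10):
    """Return (third, fourth): indices of the reject samples ranked first
    and ranked last after sorting by similarity from high to low."""
    order = sorted(
        range(len(similarities)),
        key=lambda index: similarities[index],
        reverse=True,
    )
    n_top = int(len(order) * top_ratio)
    n_bottom = int(len(order) * bottom_ratio)
    third = order[:n_top]
    fourth = order[len(order) - n_bottom:]
    return third, fourth

# Hypothetical similarity scores for ten reject samples.
similarities = [0.05, 0.95, 0.50, 0.40, 0.60, 0.30, 0.70, 0.20, 0.80, 0.10]
third_rejects, fourth_rejects = screen_by_rank(similarities)
```

With ten samples and 10 percent proportions, exactly one sample is taken from each end of the ranking. Unlike fixed thresholds, this variant always selects a predictable number of samples regardless of how the scores are distributed.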
Method 2: screening by a prediction model
In another embodiment of the present application, as shown in FIG. 4, the reject sample group can be screened through a pre-established prediction model.
In the embodiment of the present application, the prediction model may be constructed as follows:
first real-time characteristics of each passing sample in the passing sample group are obtained, each first real-time characteristic constitutes a first real-time characteristic set, and each first real-time characteristic is used for representing the post-credit performance of each passing sample.
Each pass sample in the set of pass samples is labeled.
Then, the first set of real-time features is fitted to the labels of each pass sample in the group of pass samples to build the predictive model.
In another embodiment of the present application, establishing the sample screening model from the first real-time feature data and the labeled pass samples may be implemented as follows: the first real-time feature data and the labels of the pass samples are taken as input, and a real-time prediction model is obtained by training.
For example, suppose the real-time feature data of a pass sample is third-party data about the approved user, such as the number of collection calls the user received in the last month. The number of collection calls of each pass sample is combined with the label of that pass sample, and a real-time prediction model for the number of collection calls (that is, the real-time feature) is obtained by training.
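To make the fitting step concrete, here is a minimal sketch that fits a one-feature logistic regression by gradient descent on the collection-call counts of labeled pass samples. The application does not name a model type, so logistic regression, the learning rate, the epoch count, the toy data, and the function names `fit_logistic` and `predict_default_probability` are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(default | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

def predict_default_probability(model, x):
    w, b = model
    return sigmoid(w * x + b)

# Hypothetical pass samples: collection calls received in the last month
# (first real-time feature) and a label with 1 = defaulted, 0 = repaid.
collection_calls = [0, 1, 0, 2, 6, 7, 5, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_logistic(collection_calls, labels)
```

The fitted model can then score a reject sample from its own collection-call count, for example `predict_default_probability(model, 7)`, which is how it would be used in the screening stage.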
Once the sample screening model for screening reject samples has been obtained, the real-time feature data of the reject samples can be input into it for screening, so that reject samples can be salvaged. In the embodiment of the present application, each reject sample in the reject sample group is screened with the prediction model as follows:
firstly, obtaining second real-time characteristics of each rejected sample of the rejected sample group to obtain a second real-time characteristic set;
then inputting the second real-time feature of each reject sample in the reject sample group into the prediction model, that is, inputting each second real-time feature in the second real-time feature set into the prediction model, to obtain a prediction result corresponding to each reject sample;
at this time, the rejection samples in the rejection sample group can be screened according to the prediction results of the rejection samples, so as to obtain the rejection samples meeting the preset conditions in the rejection sample group.
Specifically, in the foregoing step, predicting on the second real-time feature data with the sample screening model to obtain the prediction evaluation result of a reject sample may be implemented as follows: the second real-time feature data is taken as input, the real-time prediction model performs prediction on it, and a label for the reject sample is output.
For example, suppose the real-time feature data of a pass sample is third-party data about the approved user, such as the number of collection calls the user received in the last month, and the real-time feature data of a reject sample is third-party data about the rejected user, such as the number of collection calls that user received in the last month.
The number of collection calls of the reject sample is taken as input, the real-time prediction model trained on the number of collection calls (that is, the real-time feature) in the steps above performs prediction on it, and a label for the reject sample is output. In practical applications, the output may be the default risk probability corresponding to the reject sample.
In the embodiment of the present application, the rejected samples in the rejected sample group may be screened according to the prediction result of each rejected sample in the following manner:
firstly, sorting all rejection samples in the rejection sample group according to the prediction results of all rejection samples;
obtaining a plurality of fifth rejection samples ranked in the front according to a third preset proportion from the sorted rejection sample group;
obtaining a plurality of sixth reject samples ranked last according to a fourth preset proportion from the sorted reject sample group;
and taking the fifth rejection sample and the sixth rejection sample as the rejection samples meeting the preset condition.
For example, in this embodiment, the prediction evaluation result corresponding to a rejection sample includes, but is not limited to, the default probability of the rejection sample and the category of the rejection sample. For example, the rejection samples are sorted from high to low by default probability, and the first 10 percent (the third preset proportion) and the last 10 percent (the fourth preset proportion) of the rejection samples are taken as the rejection samples meeting the preset condition. In practical application, the predicted default probability of each rejected user can be obtained. The first 10 percent of the rejection samples, whose default probabilities are closest to 1, are taken as bad samples, and the last 10 percent, whose default probabilities are closest to 0, are taken as good samples. These two groups are the qualified rejection samples to be salvaged.
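The ranking-based screening described above can be sketched as follows. This is a minimal illustration only, not the implementation of the application: the function name `screen_by_rank`, the sample identifiers, and the probability values are hypothetical, and the 10 percent proportions are taken from the example above.

```python
from typing import Dict, List, Tuple


def screen_by_rank(
    default_probs: Dict[str, float],
    top_ratio: float = 0.10,      # assumed "third preset proportion"
    bottom_ratio: float = 0.10,   # assumed "fourth preset proportion"
) -> Tuple[List[str], List[str]]:
    """Return (bad_ids, good_ids) taken from the two ends of the ranking."""
    if top_ratio < 0 or bottom_ratio < 0 or top_ratio + bottom_ratio > 1:
        raise ValueError("proportions must be non-negative and sum to at most 1")
    # Sort rejected samples from highest to lowest predicted default probability.
    ranked = sorted(default_probs, key=default_probs.get, reverse=True)
    n = len(ranked)
    n_top = int(n * top_ratio)
    n_bottom = int(n * bottom_ratio)
    bad_ids = ranked[:n_top]                                # closest to 1
    good_ids = ranked[n - n_bottom:] if n_bottom else []    # closest to 0
    return bad_ids, good_ids


# Hypothetical predicted default probabilities for ten rejected samples.
rejected = {
    f"r{i}": p
    for i, p in enumerate([0.95, 0.10, 0.55, 0.80, 0.05, 0.40, 0.70, 0.30, 0.60, 0.20])
}
bad, good = screen_by_rank(rejected)
print(bad, good)
```

With ten rejected samples and 10 percent at each end, one sample is kept as a bad sample and one as a good sample; the remaining eight are not salvaged.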
For another example, assuming that the prediction result corresponding to a rejection sample is its default probability, the preset condition for screening rejected users may also be that the default probability of the rejection sample is higher than a third preset threshold or lower than a fourth preset threshold. For example, the rejection samples whose default probability is greater than a (the third preset threshold) or less than b (the fourth preset threshold) are taken as the target rejection samples to be retrieved, where a is greater than b.
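The threshold-based variant can be sketched in the same way. The function name `screen_by_threshold` and the values a = 0.8 and b = 0.2 are assumptions chosen for illustration; the application only requires that a be greater than b.

```python
from typing import Dict, List, Tuple


def screen_by_threshold(
    default_probs: Dict[str, float],
    a: float = 0.8,   # hypothetical upper threshold
    b: float = 0.2,   # hypothetical lower threshold
) -> Tuple[List[str], List[str]]:
    """Keep rejected samples whose default probability is above a or below b."""
    if not a > b:
        raise ValueError("a must be greater than b")
    bad_ids = [sid for sid, p in default_probs.items() if p > a]
    good_ids = [sid for sid, p in default_probs.items() if p < b]
    return bad_ids, good_ids


# Hypothetical predicted default probabilities of rejected samples.
probs = {"r0": 0.95, "r1": 0.10, "r2": 0.55, "r3": 0.80}
bad_ids, good_ids = screen_by_threshold(probs)
print(bad_ids, good_ids)
```

Unlike the ranking-based variant, the number of retrieved samples here depends on the distribution of the predicted probabilities rather than on a fixed share of the rejected sample group.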
Whether the screening is based on the similarity between the real-time features of the rejected samples and the real-time features of the passed samples, or on a preset prediction model, a representative portion of samples can be screened out of the rejected samples, so that the problem of sample bias is avoided.
In another embodiment of the present application, after obtaining a rejected sample that meets a preset condition in the rejected sample group (i.e. after screening the rejected sample group according to the preset condition), the method further includes:
obtaining a third backtracking feature of each rejected sample;
obtaining a fourth backtracking feature of each passing sample;
and pre-establishing a rejection deduction model for risk assessment according to the third backtracking characteristics and the corresponding labels of each rejection sample and the fourth backtracking characteristics and the corresponding labels of each passing sample.
Having determined how to screen out the rejection samples meeting the conditions from the rejection sample group, in this embodiment the default risk of the new application user is predicted with a risk assessment model trained on the passed samples and the rejection samples meeting the preset conditions, so as to obtain a risk assessment result of the new application user.
Specifically, a data training set can be constructed from the passed samples and the screened rejection samples, a risk assessment model is obtained by training on this data training set, and the risk assessment model is used to predict the default risk of the new application user to obtain a risk assessment result.
At this point, the third backtracking feature of each screened rejected sample and the fourth backtracking feature of each passed sample have been obtained. On this basis, the third backtracking features and the fourth backtracking features are fused to construct the data training set for establishing the rejection deduction model. Then, with the passed samples and the screened rejected samples as input, the third backtracking feature data and the fourth backtracking feature data are modeled to obtain a rejection deduction model capable of outputting an unbiased estimate. The risk assessment model (i.e., the rejection deduction model) is then used to predict the default risk of the new application user and obtain a risk assessment result, for example: the first backtracking feature data of the new application user is taken as input, and the default probability of the new application user is output by the rejection deduction model.
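The fusion of the passed samples and the screened rejected samples into one data training set can be sketched as follows. The `Sample` structure, its field names, and the example values are hypothetical; the labels of the rejected samples are assumed to be the pseudo-labels assigned during screening (1 for bad, 0 for good).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    sample_id: str
    backtracking: Sequence[float]  # backtracking (pre-loan) feature vector
    label: int                     # 1 = bad (default), 0 = good


def build_training_set(
    passed: Sequence[Sample],
    screened_rejected: Sequence[Sample],
) -> Tuple[List[List[float]], List[int]]:
    """Fuse passed and screened rejected samples into features X and labels y."""
    fused = list(passed) + list(screened_rejected)
    X = [list(s.backtracking) for s in fused]
    y = [s.label for s in fused]
    return X, y


# Passed samples carry observed labels (fourth backtracking features).
passed = [Sample("p1", [0.2, 0.1], 0), Sample("p2", [0.7, 0.9], 1)]
# Screened rejected samples carry pseudo-labels (third backtracking features).
screened = [Sample("r0", [0.9, 0.8], 1), Sample("r4", [0.1, 0.2], 0)]

X, y = build_training_set(passed, screened)
print(X, y)
```

Any binary classifier that outputs a probability could then be trained on `X` and `y` to serve as the rejection deduction model; the choice of classifier is not fixed by this sketch.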
When performing risk assessment on a new user, the prior art obtains a sample of the new application user, extracts its backtracking feature, and compares that feature with the backtracking features of user samples whose applications were passed and with the backtracking features of rejected samples, in order to assess the current applicant. In other words, the prior art directly performs binary classification on the historical passed samples and rejected samples to judge whether the current application passes. However, some of the rejected samples, although rejected, actually belong to the passable type, so this assessment method is inaccurate. In the embodiment of the application, a portion of samples is screened from the rejected samples and combined with the passed samples to form an unbiased sample group, and the new application user sample is then assessed based on this unbiased sample group, so the assessment is more accurate.
Exemplary devices
Having described the risk assessment method of the exemplary embodiment of the present application, an apparatus for generating a risk assessment result with higher assessment accuracy according to the exemplary embodiment is described next with reference to fig. 5. The apparatus, which may also be applied to the computing device shown in the application scenario, includes:
an input/output module 310 configured to acquire user data of a new application user to be evaluated;
a processing module 320 configured to extract a first backtracking feature of the user data; respectively comparing the first backtracking features with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity of the first backtracking features with each second backtracking feature and each third backtracking feature respectively; determining a risk evaluation result of the new application user according to the similarity;
the input-output module 310 is further configured to output the risk assessment results.
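The comparison performed by the processing module can be sketched as follows. The application does not fix the similarity measure or the decision rule, so both are assumptions here: cosine similarity is used, and the result is "low risk" when the mean similarity to the passed samples is at least the mean similarity to the rejected samples. All names and values are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assess_by_similarity(
    first: Sequence[float],               # first backtracking feature (new applicant)
    second_feats: List[Sequence[float]],  # second backtracking features (passed samples)
    third_feats: List[Sequence[float]],   # third backtracking features (rejected samples)
) -> str:
    if not second_feats or not third_feats:
        raise ValueError("both sample groups must be non-empty")
    sim_pass = [cosine(first, f) for f in second_feats]
    sim_rej = [cosine(first, f) for f in third_feats]
    mean_pass = sum(sim_pass) / len(sim_pass)
    mean_rej = sum(sim_rej) / len(sim_rej)
    # Assumed decision rule: closer to the passed group means lower risk.
    return "low risk" if mean_pass >= mean_rej else "high risk"


passed_feats = [[1.0, 0.0], [0.9, 0.2]]
rejected_feats = [[0.0, 1.0], [0.1, 0.9]]
result_a = assess_by_similarity([1.0, 0.1], passed_feats, rejected_feats)
result_b = assess_by_similarity([0.1, 1.0], passed_feats, rejected_feats)
print(result_a, result_b)
```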
In an embodiment of the present application, the preset sample group includes a plurality of reject samples, and the processing module is configured to screen the reject samples in advance by:
acquiring a first real-time feature set of each passing sample, wherein the first real-time feature set comprises real-time features of each passing sample, and the real-time features are used for representing the post-credit performance of the corresponding passing samples;
acquiring a second real-time feature set of each rejected sample, wherein the second real-time feature set comprises the real-time features of each rejected sample;
respectively carrying out similarity calculation on each second real-time feature in the second real-time feature set and each first real-time feature in the first real-time feature set;
and screening the rejection samples in the rejection sample group according to the similarity calculation result to obtain the rejection samples meeting the preset conditions in the rejection sample group.
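The pairwise similarity calculation between the two real-time feature sets can be sketched as follows. The similarity measure is an assumption: a distance-based similarity 1 / (1 + Euclidean distance) is used here, and the function names and feature values are hypothetical.

```python
import math
from typing import List, Sequence


def similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance-based similarity in (0, 1]; identical vectors give 1.0."""
    return 1.0 / (1.0 + math.dist(u, v))


def similarity_matrix(
    second_feats: List[Sequence[float]],  # real-time features of rejected samples
    first_feats: List[Sequence[float]],   # real-time features of passed samples
) -> List[List[float]]:
    """Row i holds the similarities of rejected sample i to every passed sample."""
    return [[similarity(r, p) for p in first_feats] for r in second_feats]


first_set = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]   # passed samples
second_set = [[0.0, 0.0], [3.0, 4.0]]              # rejected samples
sims = similarity_matrix(second_set, first_set)
print(sims)
```

Each row of the resulting matrix can then be compared against thresholds or aggregated into a single score for ranking.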
In this embodiment of the application, the processing module 320 is further configured to screen the rejection samples in the rejection sample group according to the similarity calculation result to obtain rejection samples meeting a preset condition in the rejection sample group by:
according to the similarity calculation result, acquiring, from the reject sample group, first reject samples whose similarity with each pass sample of the pass sample group is greater than a first preset threshold;
according to the similarity calculation result, acquiring second reject samples, of which the similarity with each pass sample of the pass sample group is smaller than a second preset threshold value, from the reject sample group;
and taking the first rejection sample and the second rejection sample as the rejection samples meeting the preset conditions.
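The threshold-based screening on the similarity calculation result can be sketched as follows. The input is assumed to be a precomputed mapping from each rejected sample to its similarities with every passed sample; the function name and the threshold values 0.8 and 0.2 are hypothetical.

```python
from typing import Dict, List, Tuple


def screen_by_similarity(
    sims: Dict[str, List[float]],
    first_threshold: float = 0.8,    # hypothetical first preset threshold
    second_threshold: float = 0.2,   # hypothetical second preset threshold
) -> Tuple[List[str], List[str]]:
    """Return (first_rejects, second_rejects) according to the two thresholds."""
    # First reject samples: similar to every passed sample.
    first_rejects = [
        rid for rid, row in sims.items()
        if row and all(s > first_threshold for s in row)
    ]
    # Second reject samples: dissimilar to every passed sample.
    second_rejects = [
        rid for rid, row in sims.items()
        if row and all(s < second_threshold for s in row)
    ]
    return first_rejects, second_rejects


# Hypothetical similarities of three rejected samples to three passed samples.
sim_result = {
    "r0": [0.90, 0.85, 0.95],
    "r1": [0.10, 0.05, 0.15],
    "r2": [0.90, 0.10, 0.50],
}
first_rejects, second_rejects = screen_by_similarity(sim_result)
print(first_rejects, second_rejects)
```

Sample `r2` is similar to some passed samples and dissimilar to others, so it falls into neither group and is not retained.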
In this embodiment of the application, the processing module 320 is further configured to screen the rejection samples in the rejection sample group according to the similarity calculation result, so as to obtain rejection samples meeting a preset condition in the rejection sample group by:
sorting each rejection sample in the rejection sample group according to the similarity calculation result;
obtaining a plurality of third rejection samples which are sorted in the front according to a first preset proportion from the sorted rejection sample group;
obtaining a plurality of fourth rejected samples ranked last according to a second preset proportion from the sorted rejected sample group;
and taking the third rejection sample and the fourth rejection sample as the rejection samples meeting the preset conditions.
In an embodiment of the present application, the processing module 320 is further configured to build the prediction model by:
acquiring a first real-time feature set of a passing sample group, wherein the first real-time feature set comprises real-time features of each passing sample, and the real-time features are used for representing the post-credit performance of the corresponding passing samples;
fitting the first set of real-time features to labels of individual pass samples in the group of pass samples to build the predictive model;
the processing module 320 is further configured to: pre-screening the plurality of reject samples based on a reject sample group using the predictive model by:
obtaining a second real-time feature set of the rejected sample groups, wherein the second real-time feature set comprises real-time features of each rejected sample;
inputting the second real-time characteristics of each rejected sample in the rejected sample group into the prediction model to obtain a corresponding prediction result;
and screening the rejection samples in the rejection sample group according to the prediction result to obtain the rejection samples meeting preset conditions in the rejection sample group.
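Building the prediction model on the passed samples and applying it to the rejected samples can be sketched as follows. The application does not name a model type; logistic regression trained by batch gradient descent is an assumption made here for illustration, and the single real-time feature (a scaled call count), the learning rate, and the number of epochs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(
    X: List[Sequence[float]], y: List[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted default probability for one real-time feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# First real-time feature set (passed samples) with observed labels.
passed_X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
passed_y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(passed_X, passed_y)

# Second real-time feature set (rejected samples), scored by the fitted model.
rejected_X = [[0.05], [0.95]]
scores = [predict_proba(w, b, x) for x in rejected_X]
print(scores)
```

The resulting scores are the prediction results on which the rejected samples are subsequently screened, for example by ranking or by thresholds.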
In an embodiment of the present application, after obtaining the rejection samples meeting the preset condition in the rejection sample group, the processing module is further configured to establish a rejection deduction model for risk assessment by:
obtaining a third backtracking feature of each rejected sample;
obtaining a fourth backtracking feature of each passing sample;
and pre-establishing a rejection deduction model for risk assessment according to the third backtracking characteristics and the corresponding labels of each rejection sample and the fourth backtracking characteristics and the corresponding labels of each passing sample.
When performing risk assessment on a new user, the prior art obtains a sample of the new application user, extracts its backtracking feature, and compares that feature with the backtracking features of user samples whose applications were passed and with the backtracking features of rejected samples, in order to assess the current applicant. In other words, the prior art directly performs binary classification on the historical passed samples and rejected samples to judge whether the current application passes. In the embodiment of the application, a portion of samples is screened from the rejected samples and combined with the passed samples to form an unbiased sample group, and the new application user sample is then assessed based on this unbiased sample group, so the assessment is more accurate.
Exemplary Medium
Having introduced the risk assessment method and apparatus of the exemplary embodiment of the present application, a computer-readable storage medium of the exemplary embodiment is described next with reference to fig. 6. Fig. 6 illustrates a computer-readable storage medium in the form of an optical disc 40, on which a computer program (i.e., a program product) is stored. When executed by a processor, the computer program implements the steps described in the above method embodiments, for example: obtaining user data of a new application user to be assessed; extracting a first backtracking feature of the user data; respectively comparing the first backtracking feature with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity of the first backtracking feature with each second backtracking feature and each third backtracking feature; and determining a risk evaluation result of the new application user according to the similarity. The specific implementation of each step is not repeated here.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
Exemplary computing device
Having described the risk assessment methods, apparatus, and media of the exemplary embodiments of the present application, a computing device for risk assessment of the exemplary embodiments of the present application is next described with reference to fig. 7.
FIG. 7 illustrates a block diagram of an exemplary computing device 50 suitable for use in implementing the present application. The computing device 50 may be a computer system or server. The computing device 50 shown in fig. 7 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present application.
As shown in fig. 7, components of computing device 50 may include, but are not limited to: one or more processors or processing units 501, a system memory 502, and a bus 503 that couples the various system components (including the system memory 502 and the processing unit 501).
Computing device 50 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computing device 50 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 502 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 5021 and/or cache memory 5022. Computing device 50 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, the ROM 5023 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7 and commonly referred to as a "hard drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 503 by one or more data media interfaces. At least one program product may be included in system memory 502 having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the application.
A program/utility 5025 having a set (at least one) of program modules 5024 may be stored in, for example, system memory 502, and such program modules 5024 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment. The program modules 5024 generally perform the functions and/or methodologies of the embodiments described herein.
Computing device 50 may also communicate with one or more external devices 504 (e.g., keyboard, pointing device, display, etc.). Such communication may be through input/output (I/O) interfaces 505. Moreover, computing device 50 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via network adapter 506. As shown in FIG. 7, network adapter 506 communicates with other modules of computing device 50 (e.g., processing unit 501, etc.) via bus 503. It should be appreciated that although not shown in FIG. 7, other hardware and/or software modules may be used in conjunction with computing device 50.
The processing unit 501 executes various functional applications and data processing by running a program stored in the system memory 502, for example, acquiring user data of a new application user to be evaluated; extracting a first backtracking feature of the user data; respectively comparing the first backtracking features with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity of the first backtracking features with each second backtracking feature and each third backtracking feature respectively; and determining a risk evaluation result of the new application user according to the similarity. The specific implementation of each step is not repeated here.
It should be noted that although several units/modules or sub-units/modules of the risk assessment apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the specific embodiments disclosed, nor does the division into aspects, which is for convenience of description only, imply that features in those aspects cannot be combined to advantage. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method of risk assessment, comprising:
acquiring user data of a new application user to be evaluated;
extracting a first backtracking feature of the user data;
respectively comparing the first backtracking features with each second backtracking feature and each third backtracking feature of a preset sample group to obtain the similarity of the first backtracking features with each second backtracking feature and each third backtracking feature, wherein the owner of each second backtracking feature is a corresponding passing sample, and the owner of each third backtracking feature is a corresponding rejecting sample; wherein the backtracking characteristics are used for representing the pre-loan performance of the corresponding owner;
and determining a risk evaluation result of the new application user according to the similarity.
2. The risk assessment method of claim 1, wherein the preset sample group comprises a plurality of reject samples, which are pre-screened based on the reject sample group by:
acquiring a first real-time feature set of each passing sample, wherein the first real-time feature set comprises real-time features of each passing sample, and the real-time features are used for representing the post-credit performance of the corresponding passing samples;
acquiring a second real-time feature set of each rejected sample, wherein the second real-time feature set comprises the real-time features of each rejected sample;
respectively carrying out similarity calculation on each second real-time feature in the second real-time feature set and each first real-time feature in the first real-time feature set;
and screening the rejection samples in the rejection sample group according to the similarity calculation result to obtain the rejection samples meeting the preset conditions in the rejection sample group.
3. The risk assessment method according to claim 2, wherein the screening the rejected samples in the rejected sample group according to the similarity calculation result to obtain rejected samples meeting a preset condition in the rejected sample group comprises:
according to the similarity calculation result, acquiring, from the reject sample group, first reject samples whose similarity with each pass sample of the pass sample group is greater than a first preset threshold;
according to the similarity calculation result, acquiring second reject samples, of which the similarity with each pass sample of the pass sample group is smaller than a second preset threshold value, from the reject sample group;
and taking the first rejection sample and the second rejection sample as the rejection samples meeting the preset conditions.
4. The risk assessment method according to claim 2, wherein the screening the rejected samples in the rejected sample group according to the similarity calculation result to obtain rejected samples meeting a preset condition in the rejected sample group comprises:
sorting each rejection sample in the rejection sample group according to the similarity calculation result;
obtaining a plurality of third rejection samples which are sorted in the front according to a first preset proportion from the sorted rejection sample group;
obtaining a plurality of fourth rejected samples ranked last according to a second preset proportion from the sorted rejected sample group;
and taking the third rejection sample and the fourth rejection sample as the rejection samples meeting the preset conditions.
5. The risk assessment method of claim 2, wherein the plurality of reject samples are pre-screened based on a group of reject samples by a pre-established predictive model;
the prediction model is built in the following way:
acquiring a first real-time feature set of a passing sample group, wherein the first real-time feature set comprises real-time features of each passing sample, and the real-time features are used for representing the post-credit performance of the corresponding passing samples;
fitting the first set of real-time features to labels of individual pass samples in the group of pass samples to build the predictive model;
pre-screening the rejected samples based on rejected sample groups by the pre-established prediction model, including:
obtaining a second real-time feature set of the rejected sample groups, wherein the second real-time feature set comprises real-time features of each rejected sample;
inputting the second real-time characteristics of each rejected sample in the rejected sample group into the prediction model to obtain a corresponding prediction result;
and screening the rejection samples in the rejection sample group according to the prediction result to obtain the rejection samples meeting preset conditions in the rejection sample group.
6. The risk assessment method of claim 5, wherein screening the rejected samples in the rejected sample group according to the prediction result to obtain rejected samples meeting a preset condition in the rejected sample group comprises:
sorting each rejection sample in the rejection sample group according to the prediction result;
obtaining a plurality of fifth rejection samples ranked in the front according to a third preset proportion from the sorted rejection sample group;
obtaining a plurality of sixth reject samples ranked last according to a fourth preset proportion from the sorted reject sample group;
and taking the fifth rejection sample and the sixth rejection sample as the rejection samples meeting the preset condition.
7. The risk assessment method of claim 5, wherein after obtaining a reject sample meeting a predetermined condition in the reject sample group, the method further comprises:
obtaining a third backtracking feature of each rejected sample;
obtaining a fourth backtracking feature of each passing sample;
and pre-establishing a rejection deduction model for risk assessment according to the third backtracking characteristics and the corresponding labels of each rejection sample and the fourth backtracking characteristics and the corresponding labels of each passing sample.
8. A risk assessment device comprising:
the input and output module is used for acquiring a sample to be evaluated of a new application user;
the processing module is used for extracting a first backtracking characteristic of the sample to be evaluated, and comparing the first backtracking characteristic of the sample to be evaluated with a preset second backtracking characteristic of a passing sample group and a preset third backtracking characteristic of a first rejected sample group to obtain a risk evaluation result of the new application user;
and the input and output module is also used for outputting the risk assessment result of the new application user.
9. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1-7.
10. A computing device, comprising:
at least one processor, a memory, and an input-output unit;
wherein the memory is for storing a computer program and the processor is for calling the computer program stored in the memory to perform the method of any one of claims 1-7.
CN202111681270.XA 2021-12-31 2021-12-31 Risk assessment method, device and storage medium Pending CN114331696A (en)

Publications (1)

Publication Number Publication Date
CN114331696A true CN114331696A (en) 2022-04-12

Family

ID=81023223

Similar Documents

Publication Publication Date Title
EP3985578A1 (en) Method and system for automatically training machine learning model
US10943186B2 (en) Machine learning model training method and device, and electronic device
CN110378786B (en) Model training method, default transmission risk identification method, device and storage medium
CN110111198A (en) User's financial risks predictor method, device, electronic equipment and readable medium
Hooman et al. Statistical and data mining methods in credit scoring
Van Thiel et al. Artificial intelligence credit risk prediction: An empirical study of analytical artificial intelligence tools for credit risk prediction in a digital era
CN111325248A (en) Method and system for reducing pre-loan business risk
CN112417176B (en) Method, equipment and medium for mining implicit association relation between enterprises based on graph characteristics
CN110930038A (en) Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium
Aphale et al. Predict loan approval in banking system machine learning approach for cooperative banks loan approval
CN110634060A (en) User credit risk assessment method, system, device and storage medium
CN111563187A (en) Relationship determination method, device and system and electronic equipment
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
Berrada et al. A review of Artificial Intelligence approach for credit risk assessment
CN115293336A (en) Risk assessment model training method and device and server
CN112750038B (en) Transaction risk determination method, device and server
CN114049204A (en) Suspicious transaction data entry method, device, computer equipment and computer-readable storage medium
CN117575773A (en) Method, device, computer equipment and storage medium for determining service data
CN112836742A (en) System resource adjusting method, device and equipment
CN117196630A (en) Transaction risk prediction method, device, terminal equipment and storage medium
CN111738824A (en) Method, device and system for screening financial data processing modes
CN114331696A (en) Risk assessment method, device and storage medium
Zang Construction of Mobile Internet Financial Risk Cautioning Framework Based on BP Neural Network
CN113724064A (en) Parameter determination method and device based on artificial intelligence and electronic equipment
CN113256351A (en) User service demand identification method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination