CN112767129A - Model training method, risk prediction method and device - Google Patents

Publication number
CN112767129A
Authority
CN
China
Prior art keywords
data
target
risk
customer
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110085351.7A
Other languages
Chinese (zh)
Inventor
王雪
黄昶君
陈惊雷
付荣辉
徐少迪
林晨
罗晔
太明珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202110085351.7A
Publication of CN112767129A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a model training method, a risk prediction method and a device, applied to the technical field of financial technology. A plurality of sample data sets are determined based on the coincidence (overlap) of the customer data of sample customers, and a micro risk model is then trained on each sample data set, so that data of all available dimensions can be fully utilized and the problem of missing data blocks or missing dimensions during model training is solved. In addition, the method and the device improve the data utilization rate: the default risk of a customer is evaluated using, as far as possible, all the data obtained for that customer, which can improve the accuracy of the evaluation.

Description

Model training method, risk prediction method and device
Technical Field
The application relates to the technical field of financial technology, and in particular to a model training method, a risk prediction method and a device.
Background
In the new financial landscape, the Internet and big data are driving the upgrading of the consumption structure, spurring technological and financial innovation, strengthening innovation in credit products, improving the financial product system, and accelerating the development of online products and offline services; at the same time, they give rise to new credit management problems.
First, different scenarios or sub-products of a bank often provide enterprise or personal information in different dimensions, and the different data sources are closely tied to the regions, customer groups and periods to which a specific scenario applies, so data is often missing when a risk model is built. Second, a traditional financial institution usually either develops a separate risk assessment model for each sub-product, or builds a unified risk assessment model by discarding features with low coverage.
However, the former approach has the problem that, for products with low business volume, the number of training samples is small and a data-driven model is not stable; in the latter approach, a unified traditional scorecard model built for multiple business scenarios requires its input features to cover most customer groups, and therefore cannot fully and effectively use the customer data specific to each business scenario.
Disclosure of Invention
The application provides a model training method, a risk prediction method and a risk prediction device, which are used for fully utilizing client data, improving the utilization rate of the data and improving the accuracy of risk prediction. The technical scheme adopted by the application is as follows:
In a first aspect, a model training method is provided, including:
obtaining customer data and default information of a plurality of sample customers, wherein the customer data dimensions of different sample customers are not completely the same;
determining a plurality of sample data sets based on a coincidence of customer data of the sample customers;
and respectively training a microscopic risk model based on each sample data set to obtain a plurality of microscopic risk models.
Optionally, said determining a plurality of sample data sets based on the coincidence of the customer data of the sample customer comprises:
determining dimensions of data contained in the customer data based on the customer data of all the sample customers;
determining a sample data set based on any at least two data dimensions.
Optionally, determining a sample data set based on any at least two data dimensions includes:
traversing customer data of the sample customer;
and when the customer data of the sample customer comprises the data of the at least two data dimensions, adding the customer data of the sample customer and the corresponding default information to the sample data set corresponding to the at least two data dimensions.
Optionally, the training of the microscopic risk models based on the respective sample data sets to obtain a plurality of microscopic risk models respectively includes:
judging whether the customer data in each sample data set meets a predetermined quantity threshold condition;
and, if a sample data set meets the predetermined quantity threshold condition, training a micro risk model based on the sample data in that sample data set.
Optionally, the method further comprises:
and training and determining the weight of each microscopic risk model based on the customer data and default information of the plurality of sample customers.
In a second aspect, a risk prediction method is provided, including:
acquiring client data of a target client;
determining at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to the first aspect based on customer data of the target customer;
determining a target micro default risk of the target customer based on the determined at least one target micro risk model and the customer data of the target customer.
Optionally, determining at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to the first aspect based on customer data of the target customer comprises:
determining a data dimension contained by the target customer based on customer data of the target customer;
and determining a target micro risk model based on the data dimension contained in the target client and the data dimension of training sample data corresponding to the pre-trained micro risk model.
Optionally, the method further comprises:
and when the target micro risk models comprise a plurality of target micro risk models, determining the target micro default risk based on the risk prediction result of each target micro risk model and the corresponding weight of each target micro risk model.
Optionally, the method further comprises:
normalizing the weight of each target micro risk model to obtain the normalized weight of each target micro risk model;
the determining the target micro default risk based on the risk prediction result of each target micro risk model and the corresponding weight of each target micro risk model includes:
determining the target micro default risk based on the risk prediction result of each target micro risk model and the normalized weight of each target micro risk model.
Optionally, the normalizing the weight of each target micro risk model to obtain the normalized weight of each target micro risk model includes:
obtaining the weight value of each target micro risk model, and determining the normalized weight value of each target micro risk model based on the formula shown in
Figure RE-GDA0002984311590000041
where X_k denotes the weight of the k-th target micro risk model, K denotes the number of target micro risk models, and ω_k denotes the normalized weight value of the k-th target micro risk model.
Optionally, the method further comprises:
determining a target macroscopic default risk based on a pre-trained macroscopic risk model;
determining a target default risk of the target customer based on the determined target macroscopic default risk and the target microscopic default risk.
Optionally, determining the target macroscopic default risk based on the pre-trained macroscopic risk model comprises:
acquiring current macro mesoscopic economic index data;
and determining the target macro default risk through a pre-trained macro risk model based on the acquired current macro mesoscopic economic index data.
In a third aspect, a model training apparatus is provided, which includes:
a first acquisition module, configured to acquire customer data and default information of a plurality of sample customers, wherein the customer data dimensions of different sample customers are not completely the same;
a first determining module, configured to determine a plurality of sample data sets based on the coincidence of the client data of the sample clients acquired by the first acquiring module;
and the first training module is used for respectively training a microscopic risk model based on each sample data set determined by the first determining module to obtain a plurality of microscopic risk models.
Optionally, the first determining module includes:
a first determination unit configured to determine a dimension of data included in the customer data based on the customer data of all the sample customers;
and the second determination unit is used for determining a sample data set based on any at least two data dimensions.
Optionally, the second determining unit is specifically configured to traverse the customer data of the sample customer;
and when the customer data of the sample customer comprises the data of the at least two data dimensions, adding the customer data of the sample customer and the corresponding default information to the sample data set corresponding to the at least two data dimensions.
Optionally, the apparatus further comprises:
the judging module is used for judging whether the client data in each sample data set meets a preset quantity threshold condition;
the first training module is specifically configured to, if a sample data set meets the predetermined quantity threshold condition, train a micro risk model based on the sample data in that sample data set.
Optionally, the apparatus further comprises:
and the second training module is specifically used for training and determining the weight of each microscopic risk model based on the customer data and default information of the plurality of sample customers.
In a fourth aspect, a risk prediction apparatus is provided, the apparatus comprising:
the second acquisition module is used for acquiring the client data of the target client;
a second determination module for determining at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to the first aspect based on customer data of the target customer;
a third determination module for determining a target micro default risk of the target customer based on the at least one target micro risk model determined by the second determination module and the customer data of the target customer.
Optionally, the second determining module includes:
a third determining unit, configured to determine, based on the client data of the target client, a data dimension included in the target client;
and the fourth determining unit is used for determining the target micro risk model based on the data dimension contained in the target client and the data dimension of training sample data corresponding to the pre-trained micro risk model.
Optionally, the apparatus further comprises:
and the fourth determining module is used for determining the target micro default risk based on the risk prediction result of each target micro risk model and the weight of each corresponding target micro risk model when the target micro risk models comprise a plurality of models.
Optionally, the apparatus further comprises:
the normalization module is used for performing normalization processing on the weight of each target micro risk model to obtain the normalized weight of each target micro risk model;
and the fourth determining module is specifically configured to determine the target micro default risk based on the risk prediction result of each target micro risk model and the normalized weight of each target micro risk model.
Optionally, the normalization module is specifically configured to obtain the weight value of each target micro risk model, and to determine the normalized weight value of each target micro risk model based on the formula shown in
Figure RE-GDA0002984311590000061
where X_k denotes the weight of the k-th target micro risk model, K denotes the number of target micro risk models, and ω_k denotes the normalized weight value of the k-th target micro risk model.
Optionally, the apparatus further comprises:
a fifth determining module for determining a target macro default risk based on the pre-trained macro risk model;
a sixth determining module for determining a target default risk of the target customer based on the determined target macroscopic default risk and the target microscopic default risk.
Optionally, the apparatus further comprises:
a third acquisition module, configured to acquire current macro mesoscopic economic index data;
and a seventh determining module, configured to determine the target macro default risk through a pre-trained macro risk model based on the acquired current macro mesoscopic economic index data.
In a fifth aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the method of the first aspect and/or the second aspect.
In a sixth aspect, a computer-readable storage medium is provided for storing computer instructions that, when executed on a computer, cause the computer to perform the method of the first aspect and/or the second aspect.
The application provides a model training method, a risk prediction method and a device. Customer data and default information of a plurality of sample customers are acquired, wherein the customer data dimensions of different sample customers are not completely the same; a plurality of sample data sets are determined based on the coincidence of the customer data of the sample customers; and a micro risk model is trained on each sample data set to obtain a plurality of micro risk models. Because the sample data sets are determined from the coincidence of the customer data and a micro risk model is trained on each of them, data of all available dimensions can be fully utilized, and the problem of missing data blocks or missing dimensions during model training is solved. In addition, the data utilization rate is improved: the default risk of a customer is evaluated using, as far as possible, all the data obtained for that customer, which improves the accuracy of the evaluation.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a risk prediction method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a model training apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a risk prediction device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of debt item samples according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Example one
An embodiment of the present application provides a model training method, as shown in fig. 1, the method may include the following steps:
Step S101, acquiring customer data and default information of a plurality of sample customers, wherein the customer data dimensions of different sample customers are not completely the same.
Specifically, a sample customer may be an individual customer or an enterprise customer, and the corresponding customer data may be basic information of the customer, such as a name or enterprise name, transaction information, business information, enterprise historical settlement data, enterprise credit data, enterprise tax data, and the settlement data, credit data and consumption data of the enterprise's actual controller. Data dimensions may differ by data type or by data source, for example consumption data of a user or enterprise, or business data acquired through a particular platform. The default information indicates whether the customer has defaulted.
The acquired customer data of the sample customers can be cleaned: the range of abnormal values can be determined based on the distribution of historical sample data, and missing-value handling, abnormal-value handling and standardization can then be performed.
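The cleaning step can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of the application: the abnormal-value range of the historical mean plus or minus k standard deviations, the median imputation, the z-score standardization, and the name clean_column are all hypothetical choices.

```python
from statistics import mean, median, pstdev

def clean_column(values, history, k=3.0):
    """Clean one numeric feature column using statistics of historical samples.

    values  : new observations, with None marking a missing value
    history : historical observations of the same feature (no missing values)
    k       : hypothetical width of the normal range, in standard deviations
    """
    mu, sigma, med = mean(history), pstdev(history), median(history)
    low, high = mu - k * sigma, mu + k * sigma  # assumed abnormal-value bounds
    cleaned = []
    for v in values:
        if v is None:
            v = med                      # missing value: impute historical median
        v = min(max(v, low), high)       # abnormal value: clip into the range
        # standardize; a constant historical feature maps to 0.0
        cleaned.append((v - mu) / sigma if sigma > 0 else 0.0)
    return cleaned
```

Computing the statistics from historical samples only, rather than from the batch being cleaned, keeps the same transformation applicable at prediction time.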
Step S102, determining a plurality of sample data sets based on the coincidence of the customer data of the sample customers.
Illustratively, a first sample customer has data in the four dimensions A, B, C and D; a second sample customer has data in the four dimensions A, B, D and E; and a third sample customer has data in the four dimensions A, B, C and E. All three sample customers have data in dimensions A and B, so one sample data set can be determined based on dimensions A and B; the first and second sample customers both have data in dimensions A, B and D, so one sample data set can be determined based on A, B and D; the second and third sample customers both have data in dimensions A, B and E, so one sample data set can be determined based on A, B and E. In the same way, the relevant customer data can be added to the corresponding sample data sets according to the coincidence among the customer data of the other sample customers, so that a plurality of sample data sets are obtained.
Specifically, the data of one dimension, or the data of several dimensions, can be treated as one data block, and the plurality of sample data sets are determined based on the coincidence of the data blocks. Data of several related dimensions can be grouped into one data block based on the relevance between the data dimensions. For each data block, derived features of different dimensions can be constructed from its data; the predictive power of each variable can then be tested on the historical sample data covered by each feature, variables with poor predictive power are deleted from the full variable list, and a new variable set is constructed for use in the final model. In particular, features with sufficient predictive power can be screened automatically by the Lasso algorithm.
Illustratively, as shown in FIG. 6, the debt item samples (i.e., customer samples) are counted and three data blocks, GS1, GS2 and GS3, are determined. Debt item sample 1 (i.e., sample customer 1) contains data blocks GS1 and GS2, and debt item sample 2 (i.e., sample customer 2) contains data blocks GS1, GS2 and GS3. Sample data sets are determined based on the coincidence between the customer data, i.e., the coincidence between data blocks: GS1 and GS2 are data blocks contained by both sample customer 1 and sample customer 2, so one sample data set can be determined based on data blocks GS1 and GS2; GS2 and GS3 are data blocks contained by both sample customer 2 and sample customer L, so one sample data set can be determined based on data blocks GS2 and GS3.
Step S103, training a microscopic risk model on each sample data set, respectively, to obtain a plurality of microscopic risk models.
Specifically, the micro risk model can be trained respectively based on each sample data set through different algorithms such as logistic regression and random forest. Preferably, the microscopic risk model may be trained by the LightGBM algorithm.
The embodiment of the present application provides a possible implementation manner, and specifically, determining a plurality of sample data sets based on the coincidence of the client data of the sample clients includes:
determining dimensions of data contained in the customer data based on the customer data of all the sample customers;
a set of sample data is determined based on any at least two data dimensions.
Illustratively, if a first customer has data in the two dimensions A and B, a second customer has data in the two dimensions A and C, and a third customer has data in the three dimensions A, B and C, the dimensions of the data contained in the customer data are determined to be the three dimensions A, B and C, and four sample data sets, AB, AC, BC and ABC, can be determined.
The embodiment of the present application provides a possible implementation manner, and specifically, determining a sample data set based on any at least two data dimensions includes:
traversing customer data of the sample customer;
and when the customer data of the sample customer comprises the data of the at least two data dimensions, adding the customer data of the sample customer and the corresponding default information to the sample data set corresponding to the at least two data dimensions.
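The construction of sample data sets described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name build_sample_sets, the dictionary layout, and the choice to drop combinations that no customer covers are assumptions.

```python
from itertools import combinations

def build_sample_sets(customers, min_dims=2):
    """Group sample customers by every combination of at least `min_dims` dimensions.

    customers: dict mapping customer id -> (set of data dimensions, default flag)
    Returns a dict mapping frozenset of dimensions -> list of (customer id, default flag).
    """
    # Union of the dimensions seen across all sample customers.
    all_dims = sorted(set().union(*(dims for dims, _ in customers.values())))
    sample_sets = {}
    for size in range(min_dims, len(all_dims) + 1):
        for combo in combinations(all_dims, size):
            combo = frozenset(combo)
            # Traverse the customers and keep those whose data covers the combination.
            members = [(cid, flag) for cid, (dims, flag) in customers.items()
                       if combo <= dims]
            if members:  # assumption: combinations nobody covers are skipped
                sample_sets[combo] = members
    return sample_sets
```

With three customers holding dimensions AB, AC and ABC, this yields the four sets AB, AC, BC and ABC, as in the example in the text. The number of combinations grows exponentially with the number of dimensions, which is one reason to group related dimensions into a small number of data blocks first.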
The embodiment of the present application provides a possible implementation manner, and specifically, the training of the microscopic risk models based on each sample data set respectively to obtain a plurality of microscopic risk models includes:
judging whether the client data in each sample data set meets a preset quantity threshold condition;
training a micro risk model based on sample data in a certain sample data set if the certain sample data set meets a predetermined number threshold condition.
Specifically, if the amount of customer data in a sample data set meets the predetermined threshold condition, for example more than 50 samples, the micro risk model is trained based on the sample data in that sample data set. In addition, the number of customer defaults in the sample data set used to train a micro risk model may also be required to exceed a predetermined threshold.
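The threshold check can be sketched as a filter applied before training. The function name eligible_sample_sets and the parameter min_defaults are hypothetical; only the example value of 50 samples comes from the text.

```python
def eligible_sample_sets(sample_sets, min_samples=50, min_defaults=0):
    """Keep only the sample data sets large enough to train a micro risk model.

    sample_sets : dict mapping a dimension key -> list of (customer id, default flag)
    min_samples : a set must contain more than this many customers
    min_defaults: a set must contain more than this many defaulting customers
                  (hypothetical parameter; the text gives no value)
    """
    kept = {}
    for key, members in sample_sets.items():
        n_defaults = sum(flag for _, flag in members)  # flags are 0 or 1
        if len(members) > min_samples and n_defaults > min_defaults:
            kept[key] = members
    return kept
```

Requiring at least some defaults in each set avoids fitting a classifier on a set that contains only one class.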
Optionally, the method further comprises:
determining, through training, the weight of each microscopic risk model based on the customer data and default information of the plurality of sample customers. Specifically, the weight of each microscopic risk model may be determined by training with a deep learning algorithm. The weights may be optimized with the goal of maximizing the model's prediction performance on the historical samples.
The application provides a model training method. Customer data and default information of a plurality of sample customers are acquired, wherein the customer data dimensions of different sample customers are not completely the same; a plurality of sample data sets are determined based on the coincidence of the customer data of the sample customers; and a micro risk model is trained on each sample data set to obtain a plurality of micro risk models. Because the sample data sets are determined from the coincidence of the customer data and a micro risk model is trained on each of them, data of all available dimensions can be fully utilized, and the problem of missing data blocks or missing dimensions during model training is solved. In addition, the data utilization rate is improved: the default risk of a customer is evaluated using, as far as possible, all the data obtained for that customer, which improves the accuracy of the evaluation.
Example two
As shown in fig. 2, an embodiment of the present application provides a risk prediction method, including:
Step S201, acquiring customer data of a target customer;
Step S202, determining, based on the customer data of the target customer, at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to the first aspect;
Specifically, determining at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to the first aspect based on the customer data of the target customer comprises:
determining a data dimension contained by the target customer based on customer data of the target customer;
and determining a target micro risk model based on the data dimension contained in the target client and the data dimension of training sample data corresponding to the pre-trained micro risk model.
Illustratively, the data of the target customer covers the two dimensions A and B; micro risk model 1 was trained on dimensions ABC, micro risk model 2 on AC, micro risk model 3 on AB, and micro risk model 4 on BC; the micro risk model matching the target customer is then determined to be micro risk model 3.
Illustratively, the data of the target customer covers the three dimensions A, B and C; micro risk model 1 was trained on dimensions ABC, micro risk model 2 on AC, micro risk model 3 on AB, micro risk model 4 on BC, and micro risk model 5 on ABCDE; the micro risk models matching the target customer are then determined to be micro risk models 1, 2, 3 and 4.
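The two examples above are consistent with a subset rule: a micro risk model matches when every dimension it was trained on is available for the target customer. The sketch below assumes that rule, which is inferred from the examples rather than stated explicitly; the name match_models is hypothetical.

```python
def match_models(customer_dims, model_dims):
    """Return the ids of micro risk models whose training dimensions the customer covers.

    customer_dims: set of data dimensions available for the target customer
    model_dims   : dict mapping model id -> set of dimensions the model was trained on
    """
    # A model matches when its training dimensions are a subset of the customer's.
    return [mid for mid, dims in model_dims.items() if dims <= customer_dims]

models = {1: set("ABC"), 2: set("AC"), 3: set("AB"), 4: set("BC"), 5: set("ABCDE")}
print(match_models(set("AB"), models))   # [3]
print(match_models(set("ABC"), models))  # [1, 2, 3, 4]
```

Model 5 is never selected here because it needs dimensions D and E, which neither target customer has.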
Step S203, determining the target micro default risk of the target customer based on the determined at least one target micro risk model and the customer data of the target customer.
Specifically, when there are a plurality of target micro risk models, the target micro default risk is determined based on the risk prediction result of each target micro risk model and the corresponding weight of each target micro risk model; the weights may be set manually from prior knowledge, or may be obtained through training, for example by a deep learning method.
The embodiment of the present application provides a possible implementation manner, and further, if the weight of each target micro risk model is obtained through training, the method further includes:
normalizing the weight of each target micro risk model to obtain the normalized weight of each target micro risk model;
the determining the target micro default risk based on the risk prediction result of each target micro risk model and the corresponding weight of each target micro risk model includes:
determining the target micro default risk based on the risk prediction result of each target micro risk model and the normalized weight of each target micro risk model.
Optionally, the normalizing the weight of each target micro risk model to obtain the normalized weight of each target micro risk model includes:
obtaining the weight value of each target micro risk model and, based on the formula
Figure RE-GDA0002984311590000141
determining the normalized weight value of each target micro risk model, where X_k denotes the weight of the k-th target micro risk model, K denotes the number of target micro risk models, and ω_k denotes the normalized weight value of the k-th target micro risk model.
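The normalization formula itself is only available as an image in this text, so its exact form cannot be confirmed here. As a minimal sketch, the Python below assumes the common sum normalization, in which each normalized weight is the model weight divided by the sum of the K weights, and assumes that the target micro default risk is the weighted sum of the individual predictions. Both choices and both function names are illustrative assumptions, not the formula of the application.

```python
from typing import List, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Assumed sum normalization: omega_k = X_k / (X_1 + ... + X_K)."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def combine_risks(predictions: Sequence[float],
                  weights: Sequence[float]) -> float:
    """Assumed combination rule: weighted sum of the per-model predictions."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    omega = normalize_weights(weights)
    return sum(p * w for p, w in zip(predictions, omega))


# Three matched models with raw weights 1, 1 and 2.
print(normalize_weights([1, 1, 2]))  # [0.25, 0.25, 0.5]
```

Normalizing over only the matched models keeps the combined result on the same scale as a single prediction, whichever subset of models a given customer happens to match.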
The embodiment of the present application provides a possible implementation manner, and further, the method further includes:
determining a target macroscopic default risk based on a pre-trained macroscopic risk model;
determining a target default risk of the target customer based on the determined target macroscopic default risk and the target micro default risk.
For the embodiment of the application, the default risk of the target customer is determined by integrating the micro risk model and the macro risk model, and the accuracy and the stability of prediction can be improved.
Optionally, determining the target macroscopic default risk based on the pre-trained macroscopic risk model comprises:
acquiring current macro and meso economic indicator data; specifically, the macro and meso indicators may be a GDP index, the CPI index of each province, the year-on-year PPI, the year-on-year industrial added value, the year-on-year electricity consumption, the exchange rate of the renminbi against the US dollar, and the like. In particular, the macro and meso indicators may be screened based on the predictive power of each indicator.
and determining the target macro default risk through the pre-trained macro risk model based on the acquired current macro and meso economic indicator data.
For the embodiment of the application, the micro risk models corresponding to the customer's data dimensions are determined based on the customer data of the target customer, and the default risk of the customer is evaluated using as much of the data available for that customer as possible, which improves both the data utilization rate and the evaluation accuracy.
Example three
The embodiment of the present application provides a model training device, and the device 30 includes:
a first obtaining module 301, configured to obtain customer data and default information of a plurality of sample customers, where the customer data dimensions of different sample customers are not completely the same;
a first determining module 302, configured to determine a plurality of sample data sets based on the coincidence of the customer data of the sample customers obtained by the first obtaining module;
a first training module 303, configured to train a micro risk model on each sample data set determined by the first determining module, so as to obtain a plurality of micro risk models.
Optionally, the first determining module includes:
a first determining unit, configured to determine the data dimensions contained in the customer data based on the customer data of all the sample customers;
a second determining unit, configured to determine a sample data set based on any at least two of the data dimensions.
Optionally, the second determining unit is specifically configured to traverse the customer data of the sample customers,
and, when the customer data of a sample customer includes data of the at least two data dimensions, to add the customer data of that sample customer and the corresponding default information to the sample data set corresponding to the at least two data dimensions.
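The work of the two determining units can be sketched in Python as follows. The sketch assumes that each sample customer is a pair of a feature dictionary and a default label, that one sample data set is created for every combination of at least two dimensions, and that only the dimensions of the combination are copied into the set. The function name `build_sample_sets` and the data layout are hypothetical.

```python
from itertools import combinations
from typing import Any, Dict, List, Tuple

Sample = Tuple[Dict[str, Any], int]  # (features by dimension, default label)


def build_sample_sets(customers: List[Sample],
                      min_dims: int = 2) -> Dict[Tuple[str, ...], List[Sample]]:
    """Group sample customers into one data set per combination of dimensions."""
    # First unit: collect every dimension that appears in any customer's data.
    all_dims = sorted(set().union(*(features.keys() for features, _ in customers)))

    # One (initially empty) sample data set per combination of >= min_dims dimensions.
    sample_sets: Dict[Tuple[str, ...], List[Sample]] = {}
    for size in range(min_dims, len(all_dims) + 1):
        for combo in combinations(all_dims, size):
            sample_sets[combo] = []

    # Second unit: traverse the customers and add each one to every set it covers.
    for features, label in customers:
        for combo, rows in sample_sets.items():
            if all(dim in features for dim in combo):
                # Assumption: keep only the dimensions of this combination.
                rows.append(({dim: features[dim] for dim in combo}, label))
    return sample_sets


customers = [
    ({"A": 1, "B": 2}, 0),
    ({"A": 3, "B": 4, "C": 5}, 1),
    ({"B": 6, "C": 7}, 0),
]
sets_by_dims = build_sample_sets(customers)
print({dims: len(rows) for dims, rows in sets_by_dims.items()})
```

The number of combinations grows exponentially with the number of dimensions, so this exhaustive enumeration is only practical when the dimensions are coarse groups of fields rather than individual columns.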
Optionally, the apparatus further comprises:
a judging module, configured to judge whether the customer data in each sample data set meets a predetermined quantity threshold condition;
the first training module is specifically configured to train a micro risk model based on the sample data in a sample data set if that sample data set meets the predetermined quantity threshold condition.
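A minimal Python sketch of this threshold check is given below. It assumes that the quantity threshold condition means "at least `min_samples` samples", and it takes the training routine as a caller-supplied function, since the text does not fix a model type. The names `train_micro_models` and `base_rate_trainer` are hypothetical, and the base-rate "model" is a stand-in for a real learner.

```python
from typing import Any, Callable, Dict, List, Tuple

Sample = Tuple[Dict[str, Any], int]  # (features by dimension, default label)


def train_micro_models(sample_sets: Dict[Tuple[str, ...], List[Sample]],
                       min_samples: int,
                       train_fn: Callable[[List[Sample]], Any]
                       ) -> Dict[Tuple[str, ...], Any]:
    """Train one model per sample data set that is large enough."""
    models = {}
    for dims, rows in sample_sets.items():
        # Assumed threshold condition: the set holds at least min_samples samples.
        if len(rows) >= min_samples:
            models[dims] = train_fn(rows)
    return models


def base_rate_trainer(rows: List[Sample]) -> float:
    """Placeholder trainer: returns the observed default rate of the set."""
    return sum(label for _, label in rows) / len(rows)


example_sets = {
    ("A", "B"): [({"A": 1, "B": 2}, 0), ({"A": 3, "B": 4}, 1)],
    ("A", "C"): [({"A": 3, "C": 5}, 1)],
}
print(train_micro_models(example_sets, min_samples=2, train_fn=base_rate_trainer))
# {('A', 'B'): 0.5}
```

Skipping undersized sets avoids fitting a model to a dimension combination that too few sample customers share.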
Optionally, the apparatus further comprises:
a second training module, specifically configured to determine, through training, the weight of each micro risk model based on the customer data and default information of the plurality of sample customers.
The apparatus of the embodiment of the present application can execute the method of the first embodiment, and the beneficial effects of the embodiment of the present application are the same as those of the first embodiment, which are not described herein again.
Example four
The embodiment of the present application provides a risk prediction apparatus, where the apparatus 40 includes:
a second obtaining module 401, configured to obtain client data of a target client;
a second determining module 402 for determining at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to the first aspect based on customer data of the target customer;
a third determining module 403, configured to determine a target micro default risk of the target customer based on the at least one target micro risk model determined by the second determining module and the customer data of the target customer.
Optionally, the second determining module includes:
a third determining unit, configured to determine, based on the customer data of the target customer, the data dimensions contained in that customer data;
a fourth determining unit, configured to determine the target micro risk model based on the data dimensions contained in the customer data of the target customer and the data dimensions of the training sample data corresponding to each pre-trained micro risk model.
Optionally, the apparatus further comprises:
a fourth determining module, configured to determine, when there are a plurality of target micro risk models, the target micro default risk based on the risk prediction result of each target micro risk model and the weight corresponding to each target micro risk model.
Optionally, the apparatus further comprises:
a normalization module, configured to normalize the weight of each target micro risk model to obtain the normalized weight of each target micro risk model;
the fourth determining module is specifically configured to determine the target micro default risk based on the risk prediction result of each target micro risk model and the normalized weight of each target micro risk model.
Optionally, the normalization module is specifically configured to obtain the weight value of each target micro risk model and, based on the formula
Figure RE-GDA0002984311590000171
determine the normalized weight value of each target micro risk model, where X_k denotes the weight of the k-th target micro risk model, K denotes the number of target micro risk models, and ω_k denotes the normalized weight value of the k-th target micro risk model.
Optionally, the apparatus further comprises:
a fifth determining module for determining a target macro default risk based on the pre-trained macro risk model;
a sixth determining module, configured to determine a target default risk of the target customer based on the determined target macroscopic default risk and the target micro default risk.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain current macro and meso economic indicator data;
a seventh determining module, configured to determine the target macro default risk through the pre-trained macro risk model based on the obtained current macro and meso economic indicator data.
The beneficial effects of the embodiment of the present application are the same as those of the second embodiment, and are not described herein again.
Example five
An embodiment of the present application provides an electronic device. As shown in fig. 5, the electronic device 50 includes a processor 501 and a memory 505, where the processor 501 is coupled to the memory 505, for example via the bus 502. Further, the electronic device 50 may also include a transceiver 504. It should be noted that, in practical applications, the transceiver 504 is not limited to one, and the structure of the electronic device 50 does not constitute a limitation on the embodiment of the present application. In the embodiment of the present application, the processor 501 is used to implement the functions of the modules shown in fig. 3 or fig. 4. The transceiver 504 includes a receiver and a transmitter.
The processor 501 may be a CPU, a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 501 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 502 may include a path that transfers information between the above components. The bus 502 may be a PCI bus or an EISA bus, etc. The bus 502 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 505 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 505 is used for storing application program codes for executing the scheme of the application, and the execution is controlled by the processor 501. The processor 501 is configured to execute application program codes stored in the memory 505 to implement the functions of the apparatus provided by the embodiment shown in fig. 3 or fig. 4.
The embodiment of the application provides an electronic device. A plurality of sample data sets are determined based on the coincidence of the customer data of the sample customers, and a micro risk model is trained on each sample data set, so that the data of all available dimensions can be fully utilized and the problem of data fragmentation or missing dimension data in the model training process is solved. In addition, the data utilization rate is improved: the default risk of a customer is evaluated using as much of the data available for that customer as possible, which improves the accuracy of the evaluation.
The electronic device provided by the embodiment of the application is applicable to the above method embodiments and will not be described in detail here.
Example six
The present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method shown in the above embodiments is implemented.
The embodiment of the application provides a computer-readable storage medium. A plurality of sample data sets are determined based on the coincidence of the customer data of the sample customers, and a micro risk model is trained on each sample data set, so that the data of all available dimensions can be fully utilized and the problem of data fragmentation or missing dimension data in the model training process is solved. In addition, the data utilization rate is improved: the default risk of a customer is evaluated using as much of the data available for that customer as possible, which improves the accuracy of the evaluation.
The computer-readable storage medium provided by the embodiment of the application is applicable to the above method embodiments and will not be described in detail here.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also fall within the protection scope of the present application.

Claims (16)

1. A method of model training, comprising:
obtaining customer data and default information of a plurality of sample customers, wherein the customer data dimensions of different sample customers are not completely the same;
determining a plurality of sample data sets based on a coincidence of customer data of the sample customers;
and respectively training a microscopic risk model based on each sample data set to obtain a plurality of microscopic risk models.
2. The method of claim 1, wherein determining a plurality of sample data sets based on a coincidence of customer data of the sample customers comprises:
determining dimensions of data contained in the customer data based on the customer data of all the sample customers;
determining a sample data set based on any at least two data dimensions.
3. The method of claim 1, wherein determining a set of sample data based on any at least two data dimensions comprises:
traversing customer data of the sample customer;
and when the customer data of the sample customer comprises the data of the at least two data dimensions, adding the customer data of the sample customer and the corresponding default information to the sample data set corresponding to the at least two data dimensions.
4. The method of claim 1, wherein, before the training a microscopic risk model based on each of the sample data sets to obtain a plurality of microscopic risk models, the method further comprises:
judging whether the customer data in each sample data set meets a predetermined quantity threshold condition;
training a micro risk model based on the sample data in a sample data set if that sample data set meets the predetermined quantity threshold condition.
5. The method of any one of claims 1-4, further comprising:
and training and determining the weight of each microscopic risk model based on the customer data and default information of the plurality of sample customers.
6. A method of risk prediction, comprising:
acquiring client data of a target client;
determining at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to any one of claims 1-5 based on customer data of the target customer;
determining a target micro default risk of the target customer based on the determined at least one target micro risk model and the customer data of the target customer.
7. The method of claim 6, wherein determining at least one target micro risk model matching the target customer from the plurality of micro risk models trained according to any of claims 1-5 based on the customer data of the target customer comprises:
determining a data dimension contained by the target customer based on customer data of the target customer;
and determining a target micro risk model based on the data dimension contained in the target client and the data dimension of training sample data corresponding to the pre-trained micro risk model.
8. The method of claim 6, further comprising:
when there are a plurality of target micro risk models, determining the target micro default risk based on the risk prediction result of each target micro risk model and the weight corresponding to each target micro risk model.
9. The method of claim 8, further comprising:
normalizing the weight of each target micro risk model to obtain the normalized weight of each target micro risk model;
the determining the target micro default risk based on the risk prediction result of each target micro risk model and the corresponding weight of each target micro risk model includes:
determining the target micro default risk based on the risk prediction result of each target micro risk model and the normalized weight of each target micro risk model.
10. The method of claim 9, wherein the normalizing the weights of the target micro risk models to obtain the normalized weights of the target micro risk models comprises:
obtaining the weight value of each target micro risk model and, based on the formula
Figure FDA0002910575070000031
determining the normalized weight value of each target micro risk model, where X_k denotes the weight of the k-th target micro risk model, K denotes the number of target micro risk models, and ω_k denotes the normalized weight value of the k-th target micro risk model.
11. The method according to any one of claims 6-10, further comprising:
determining a target macroscopic default risk based on a pre-trained macroscopic risk model;
determining a target default risk of the target customer based on the determined target macroscopic default risk and the target micro default risk.
12. The method of claim 11, wherein determining a target macroscopic default risk based on a pre-trained macroscopic risk model comprises:
acquiring current macro and meso economic indicator data;
and determining the target macroscopic default risk through the pre-trained macroscopic risk model based on the acquired current macro and meso economic indicator data.
13. A model training apparatus, comprising:
a first obtaining module, configured to obtain customer data and default information of a plurality of sample customers, wherein the customer data dimensions of different sample customers are not completely the same;
a first determining module, configured to determine a plurality of sample data sets based on the coincidence of the customer data of the sample customers obtained by the first obtaining module;
a first training module, configured to train a microscopic risk model on each sample data set determined by the first determining module to obtain a plurality of microscopic risk models.
14. A risk prediction device, comprising:
a second obtaining module, configured to obtain customer data of a target customer;
a second determining module, configured to determine at least one target micro risk model matching the target customer from a plurality of micro risk models trained according to any one of claims 1-5 based on the customer data of the target customer;
a third determining module, configured to determine a target micro default risk of the target customer based on the at least one target micro risk model determined by the second determining module and the customer data of the target customer.
15. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method according to any one of claims 1 to 12.
16. A computer-readable storage medium for storing computer instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.
CN202110085351.7A 2021-01-22 2021-01-22 Model training method, risk prediction method and device Pending CN112767129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110085351.7A CN112767129A (en) 2021-01-22 2021-01-22 Model training method, risk prediction method and device

Publications (1)

Publication Number Publication Date
CN112767129A true CN112767129A (en) 2021-05-07

Family

ID=75702599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110085351.7A Pending CN112767129A (en) 2021-01-22 2021-01-22 Model training method, risk prediction method and device

Country Status (1)

Country Link
CN (1) CN112767129A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165683A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Sample predictions method, apparatus and storage medium based on federation's training
CN110349038A (en) * 2019-06-13 2019-10-18 中国平安人寿保险股份有限公司 Risk evaluation model training method and methods of risk assessment
WO2020037942A1 (en) * 2018-08-20 2020-02-27 平安科技(深圳)有限公司 Risk prediction processing method and apparatus, computer device and medium
CN110929886A (en) * 2019-12-06 2020-03-27 支付宝(杭州)信息技术有限公司 Model training and predicting method and system
CN110955915A (en) * 2019-12-14 2020-04-03 支付宝(杭州)信息技术有限公司 Method and device for processing private data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination