CN113724060A - Credit risk assessment method and system - Google Patents

Info

Publication number
CN113724060A
Authority
CN
China
Prior art keywords
data
credit risk
projection matrix
classifier
risk assessment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110245073.7A
Other languages
Chinese (zh)
Inventor
陈秀华
宫辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Haoxiang Basic Software Research Institute Co ltd
Nanjing University of Science and Technology
Original Assignee
Nanjing Haoxiang Basic Software Research Institute Co ltd
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Haoxiang Basic Software Research Institute Co ltd, Nanjing University of Science and Technology filed Critical Nanjing Haoxiang Basic Software Research Institute Co ltd
Priority to CN202110245073.7A priority Critical patent/CN113724060A/en
Publication of CN113724060A publication Critical patent/CN113724060A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a credit risk assessment method and system. The method comprises: acquiring credit risk assessment data and a current projection matrix; determining a classifier from the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk; classifying the unlabeled credit risk data with the classifier and assigning pseudo labels to the unlabeled sample data to obtain pseudo label data; performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix; if the iteration end condition is met, outputting the classifier and the updated projection matrix; and performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result. By introducing linear discriminant analysis, the method and system help construct a robust classifier and improve the effect of credit risk assessment.

Description

Credit risk assessment method and system
Technical Field
The invention relates to the technical field of credit risk assessment, in particular to a credit risk assessment method and a credit risk assessment system.
Background
In machine learning, classification is a fundamental research task. Typically, the data set in a binary classification task contains both positively labeled and negatively labeled examples. In practice, however, labels for negative examples are often difficult to obtain. In credit risk assessment, for example, bad credits can be unambiguously treated as positive examples, while credit risk data that has not been evaluated is not necessarily a negative example (i.e., a good credit). In recent years, fraudulent credit card transactions have grown at an unprecedented rate and have become a major problem in the financial sector, causing significant losses to both merchants and financial institutions. Credit risk assessment is therefore an indispensable step in loan approval for financial departments.
Most existing credit risk assessment methods are based on supervised learning, which does not fully match the real conditions of credit risk assessment. Although existing methods can achieve good classification results, in practice negative samples are difficult and expensive to obtain and the data are often not separable, which makes it very difficult to build a robust classifier. How to improve the effect of credit risk assessment is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a credit risk assessment method and a credit risk assessment system, which are beneficial to constructing a robust classifier by introducing linear discriminant analysis and improve the credit risk assessment effect.
In order to achieve the purpose, the invention provides the following scheme:
a credit risk assessment method, comprising:
acquiring credit risk assessment data and a current projection matrix; the credit risk assessment data comprises single-class credit risk data and unlabeled credit risk data; the single class credit risk data comprises a plurality of positive sample data, and the unlabeled credit risk data comprises a plurality of unlabeled sample data; the current projection matrix is obtained by performing linear discriminant analysis on the credit risk assessment data;
determining a classifier from the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk;
classifying the unlabeled credit risk data with the classifier, and assigning pseudo labels to the unlabeled sample data to obtain pseudo label data;
performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix;
judging whether an iteration end condition is met; if yes, outputting the classifier and the updated projection matrix; if not, taking the updated projection matrix as a current projection matrix, and then returning to the step of determining a classifier by taking the minimum misclassification risk as a target according to the credit risk assessment data and the current projection matrix;
and performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result.
Optionally, after acquiring the credit risk assessment data, further comprising:
and carrying out normalization processing on the credit risk data to obtain normalized credit risk evaluation data.
Optionally, the determining a classifier based on the credit risk assessment data and the current projection matrix with a goal of minimizing the misclassification experience risk specifically includes:
determining a classifier according to the credit risk assessment data and the current projection matrix by adopting the following formula:
Figure BDA0002963799140000021
where the minimized quantity is the empirical misclassification risk, f is the classifier, f(·) is the classifier output, π is the prior probability of the positive class, $\{x_i^p\}_{i=1}^{n_p}$ is the positive sample data, $\{x_i^u\}_{i=1}^{n_u}$ is the unlabeled sample data, ℓ(·) is the loss function, λ is a trade-off parameter, $n_p$ is the number of positive samples, $n_u$ is the number of unlabeled samples, i is the sample index, and R is the projection matrix.
Optionally, performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix specifically includes:
performing linear discriminant analysis on the pseudo label data and the positive sample data by adopting the following formula to obtain an updated projection matrix:
$$\max_{R}\ \frac{R^{T}S_bR}{R^{T}S_wR}$$
wherein,
$$S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}$$
$$S_w = \sum_{x \in X_p}(x - \mu_p)(x - \mu_p)^{T} + \sum_{x \in X_n}(x - \mu_n)(x - \mu_n)^{T}$$
where R is the projection matrix, $S_b$ is the between-class scatter matrix, $S_w$ is the within-class scatter matrix, $\mu_p$ is the mean vector of the positive sample data, $\mu_n$ is the mean vector of the negative sample data, x is a sample, $X_p$ is the positive sample set, and $X_n$ is the negative sample set; the positive sample set is the data with credit risk, and the negative sample set is the data without credit risk.
Optionally, the performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result specifically includes:
and according to the updated projection matrix and the credit risk assessment data, performing credit risk classification by using the classifier to obtain a credit risk classification result.
A credit risk assessment system, comprising:
the acquisition module is used for acquiring credit risk assessment data and a current projection matrix; the credit risk assessment data comprises single-class credit risk data and unlabeled credit risk data; the single class credit risk data comprises a plurality of positive sample data, and the unlabeled credit risk data comprises a plurality of unlabeled sample data; the current projection matrix is obtained by performing linear discriminant analysis on the credit risk assessment data;
a classifier determining module, configured to determine a classifier from the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk;
a pseudo label data generating module, configured to classify the unlabeled credit risk data with the classifier and assign pseudo labels to the unlabeled sample data to obtain pseudo label data;
the linear discriminant analysis module is used for performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix;
the judging module is used for judging whether the iteration ending condition is met or not; if yes, executing an output module; if not, executing an updating module;
the updating module is used for taking the updated projection matrix as a current projection matrix and then executing the classifier determining module;
an output module for outputting the classifier and the updated projection matrix;
and the credit risk evaluation module is used for performing credit risk evaluation on the credit risk evaluation data according to the classifier and the updated projection matrix to obtain a credit risk evaluation result.
Optionally, the method further comprises:
and the processing module is used for carrying out normalization processing on the credit risk data to obtain normalized credit risk evaluation data.
Optionally, the classifier determining module specifically includes:
a classifier determining unit, configured to determine a classifier according to the credit risk assessment data and the current projection matrix by using the following formula:
Figure BDA0002963799140000041
where the minimized quantity is the empirical misclassification risk, f is the classifier, f(·) is the classifier output, π is the prior probability of the positive class, $\{x_i^p\}_{i=1}^{n_p}$ is the positive sample data, $\{x_i^u\}_{i=1}^{n_u}$ is the unlabeled sample data, ℓ(·) is the loss function, λ is a trade-off parameter, $n_p$ is the number of positive samples, $n_u$ is the number of unlabeled samples, i is the sample index, and R is the projection matrix.
Optionally, the linear discriminant analysis module specifically includes:
a linear discriminant analysis unit, configured to perform linear discriminant analysis on the pseudo label data and the positive sample data by using the following formula, so as to obtain an updated projection matrix:
$$\max_{R}\ \frac{R^{T}S_bR}{R^{T}S_wR}$$
wherein,
$$S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}$$
$$S_w = \sum_{x \in X_p}(x - \mu_p)(x - \mu_p)^{T} + \sum_{x \in X_n}(x - \mu_n)(x - \mu_n)^{T}$$
where R is the projection matrix, $S_b$ is the between-class scatter matrix, $S_w$ is the within-class scatter matrix, $\mu_p$ is the mean vector of the positive sample data, $\mu_n$ is the mean vector of the negative sample data, x is a sample, $X_p$ is the positive sample set, and $X_n$ is the negative sample set; the positive sample set is the data with credit risk, and the negative sample set is the data without credit risk.
Optionally, the credit risk assessment module specifically includes:
and the credit risk evaluation unit is used for carrying out credit risk classification by adopting the classifier according to the updated projection matrix and the credit risk evaluation data to obtain a credit risk classification result.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a credit risk assessment method and system that acquire credit risk assessment data and a current projection matrix; determine a classifier from the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk; classify the unlabeled credit risk data with the classifier and assign pseudo labels to the unlabeled sample data to obtain pseudo label data; perform linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix; output the classifier and the updated projection matrix if the iteration end condition is met; and perform credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result. The method greatly reduces the sample labeling cost and better matches the real situation in which credit risk assessment lacks negative sample data. It also takes the distribution of the data into account and uses linear discriminant analysis to increase the discriminability of the data, which helps construct a robust classifier. The assessment uses the single-class credit risk data and the unlabeled credit risk data directly, and the classification is accurate and stable.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a credit risk assessment method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a credit risk assessment system according to an embodiment of the present invention;
FIG. 3 is a graph comparing the effects of the embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a credit risk assessment method and a credit risk assessment system, which are beneficial to constructing a robust classifier by introducing linear discriminant analysis and improve the credit risk assessment effect.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Examples
Fig. 1 is a flowchart of a credit risk assessment method according to an embodiment of the present invention, and as shown in fig. 1, a credit risk assessment method includes:
step 101: acquiring credit risk assessment data and a current projection matrix; the credit risk assessment data comprises single-class credit risk data and unlabeled credit risk data; the single-class credit risk data comprises a plurality of positive sample data, and the non-label credit risk data comprises a plurality of non-label sample data; the current projection matrix is obtained by performing linear discriminant analysis on the credit risk assessment data.
After step 101, the method further includes: normalizing the credit risk assessment data to obtain normalized credit risk assessment data.
Step 102: Determine a classifier from the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk.
Step 102, specifically comprising:
determining a classifier according to the credit risk assessment data and the current projection matrix by adopting the following formula:
Figure BDA0002963799140000061
where the minimized quantity is the empirical misclassification risk, f is the classifier, f(·) is the classifier output, π is the prior probability of the positive class, $\{x_i^p\}_{i=1}^{n_p}$ is the positive sample data, $\{x_i^u\}_{i=1}^{n_u}$ is the unlabeled sample data, ℓ(·) is the loss function, λ is a trade-off parameter, $n_p$ is the number of positive samples, $n_u$ is the number of unlabeled samples, i is the sample index, and R is the projection matrix.
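As a rough illustration of what an objective built from these symbols can look like, the sketch below writes out an unbiased positive-unlabeled risk in the style of the unbiased single-class classification method that the embodiment later uses for comparison. This is an assumption and not the formula of the application: the symbol $\hat{J}$, the composite loss $\tilde{\ell}$, and the regularizer $\Omega(f)$ weighted by λ are illustrative choices.

```latex
\min_{f}\ \hat{J}(f)
  = \frac{\pi}{n_p}\sum_{i=1}^{n_p}\tilde{\ell}\bigl(f(R^{T}x_i^{p})\bigr)
  + \frac{1}{n_u}\sum_{i=1}^{n_u}\ell\bigl(-f(R^{T}x_i^{u})\bigr)
  + \lambda\,\Omega(f),
\qquad
\tilde{\ell}(z) = \ell(z) - \ell(-z)
```

Under this assumed form, the second term treats every unlabeled sample as a provisional negative, and the first term, weighted by the prior π, corrects for the positives hidden in the unlabeled set.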
Step 103: Classify the unlabeled credit risk data with the classifier, and assign pseudo labels to the unlabeled sample data to obtain pseudo label data.
Step 104: Perform linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix.
Step 104, specifically comprising:
performing linear discriminant analysis on the pseudo label data and the positive sample data by adopting the following formula to obtain an updated projection matrix:
$$\max_{R}\ \frac{R^{T}S_bR}{R^{T}S_wR}$$
wherein,
$$S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}$$
$$S_w = \sum_{x \in X_p}(x - \mu_p)(x - \mu_p)^{T} + \sum_{x \in X_n}(x - \mu_n)(x - \mu_n)^{T}$$
where R is the projection matrix, $S_b$ is the between-class scatter matrix, $S_w$ is the within-class scatter matrix, $\mu_p$ is the mean vector of the positive sample data, $\mu_n$ is the mean vector of the negative sample data, x is a sample, $X_p$ is the positive sample set, and $X_n$ is the negative sample set; the positive sample set is the data with credit risk, and the negative sample set is the data without credit risk.
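The two scatter matrices used in step 104 can be computed directly from the positive and negative sample sets. The following is a minimal NumPy sketch; the function name `scatter_matrices` and the tiny demo arrays are hypothetical, and samples are assumed to be stored as rows.

```python
import numpy as np

def scatter_matrices(X_pos, X_neg):
    """Return (S_b, S_w) for two classes whose samples are stored as rows."""
    X_pos = np.asarray(X_pos, dtype=float)
    X_neg = np.asarray(X_neg, dtype=float)
    mu_p = X_pos.mean(axis=0)
    mu_n = X_neg.mean(axis=0)
    # Between-class scatter: outer product of the difference of class means.
    diff = (mu_p - mu_n)[:, None]
    S_b = diff @ diff.T
    # Within-class scatter: summed outer products of centered samples.
    D_p = X_pos - mu_p
    D_n = X_neg - mu_n
    S_w = D_p.T @ D_p + D_n.T @ D_n
    return S_b, S_w

# Hypothetical toy data with two features.
X_pos_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
X_neg_demo = np.array([[0.0, 0.0], [2.0, 0.0]])
S_b, S_w = scatter_matrices(X_pos_demo, X_neg_demo)
```

Because there are only two classes, $S_b$ is an outer product of a single vector and therefore has rank one.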
Step 105: judging whether an iteration end condition is met; if yes, go to step 107; if not, go to step 106.
Step 106: the updated projection matrix is used as the current projection matrix, and then the process returns to step 102.
Step 107: and outputting the classifier and the updated projection matrix.
Step 108: and performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result.
Step 108, specifically comprising:
and according to the updated projection matrix and the credit risk assessment data, performing credit risk classification by adopting a classifier to obtain a credit risk classification result.
FIG. 2 is a block diagram of a credit risk assessment system according to an embodiment of the present invention. As shown in fig. 2, a credit risk assessment system includes:
an obtaining module 201, configured to obtain credit risk assessment data and a current projection matrix; the credit risk assessment data comprises single-class credit risk data and unlabeled credit risk data; the single-class credit risk data comprises a plurality of positive sample data, and the non-label credit risk data comprises a plurality of non-label sample data; the current projection matrix is obtained by performing linear discriminant analysis on the credit risk assessment data;
and the processing module is used for carrying out normalization processing on the credit risk data to obtain normalized credit risk evaluation data.
A classifier determination module 202, configured to determine a classifier from the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk;
the classifier determining module 202 specifically includes:
and the classifier determining unit is used for determining a classifier by adopting the following formula according to the credit risk assessment data and the current projection matrix:
Figure BDA0002963799140000081
where the minimized quantity is the empirical misclassification risk, f is the classifier, f(·) is the classifier output, π is the prior probability of the positive class, $\{x_i^p\}_{i=1}^{n_p}$ is the positive sample data, $\{x_i^u\}_{i=1}^{n_u}$ is the unlabeled sample data, ℓ(·) is the loss function, λ is a trade-off parameter, $n_p$ is the number of positive samples, $n_u$ is the number of unlabeled samples, i is the sample index, and R is the projection matrix.
The pseudo label data generating module 203 is configured to classify the unlabeled credit risk data with the classifier and assign pseudo labels to the unlabeled sample data to obtain pseudo label data;
the linear discriminant analysis module 204 is configured to perform linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix;
the linear discriminant analysis module 204 specifically includes:
the linear discriminant analysis unit is used for performing linear discriminant analysis on the pseudo label data and the positive sample data by adopting the following formula to obtain an updated projection matrix:
$$\max_{R}\ \frac{R^{T}S_bR}{R^{T}S_wR}$$
wherein,
$$S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}$$
$$S_w = \sum_{x \in X_p}(x - \mu_p)(x - \mu_p)^{T} + \sum_{x \in X_n}(x - \mu_n)(x - \mu_n)^{T}$$
where R is the projection matrix, $S_b$ is the between-class scatter matrix, $S_w$ is the within-class scatter matrix, $\mu_p$ is the mean vector of the positive sample data, $\mu_n$ is the mean vector of the negative sample data, x is a sample, $X_p$ is the positive sample set, and $X_n$ is the negative sample set; the positive sample set is the data with credit risk, and the negative sample set is the data without credit risk.
A judging module 205, configured to judge whether an iteration end condition is met; if yes, executing an output module; if not, executing an updating module;
an update module 206, configured to use the updated projection matrix as a current projection matrix, and then execute the classifier determination module;
an output module 207 for outputting the classifier and the updated projection matrix;
and the credit risk evaluation module 208 is configured to perform credit risk evaluation on the credit risk evaluation data according to the classifier and the updated projection matrix to obtain a credit risk evaluation result.
The credit risk assessment module 208 specifically includes:
and the credit risk evaluation unit is used for classifying the credit risk by adopting a classifier according to the updated projection matrix and the updated credit risk evaluation data to obtain a credit risk classification result.
To further illustrate the discriminant credit risk assessment method based on single-class classification provided by the present invention, the following is specifically described:
The invention searches for an optimal projection matrix by iteratively solving a bi-level optimization problem, so that in the new feature space the between-class distance of the original data is increased and the within-class distance is reduced. This increases the discriminability of the data, which allows a robust classifier to be constructed and credit risk to be assessed intelligently using only single-class samples and unlabeled samples.
The specific implementation steps are as follows:
Step 1: Data preprocessing and normalization. Divide the credit risk sample data set to obtain a positive sample set $\{x_i^p\}_{i=1}^{n_p}$ and an unlabeled sample set $\{x_i^u\}_{i=1}^{n_u}$, where $n_p$ and $n_u$ are the numbers of samples in the positive and unlabeled sample sets, respectively. In credit risk assessment, bad credits form the positive sample set, and the collected good credits together with credits whose risk has not been detected form the unlabeled sample set, so a sample in the unlabeled set may be either a good credit or a bad credit. The sample features are then normalized so that every feature value lies in the interval [0, 1].
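The normalization in step 1 can be sketched as per-feature min-max scaling. The text only states that feature values end up in [0, 1]; min-max scaling, the function name `min_max_normalize`, and the handling of constant columns are assumptions made for this sketch.

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Scale each feature (column) of X into the interval [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns have zero range; map them to 0 instead of dividing by zero.
    safe_range = np.where(col_range > eps, col_range, 1.0)
    return (X - col_min) / safe_range

# Hypothetical feature matrix: rows are credit samples, columns are attributes.
X_demo = np.array([[1.0, 10.0, 5.0],
                   [2.0, 30.0, 5.0],
                   [4.0, 20.0, 5.0]])
X_norm = min_max_normalize(X_demo)
```

In practice the column minima and ranges would be computed once on the combined positive and unlabeled training data and reused for test data, so that all sets share one scale.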
Step 2: Train the classifier. The positive samples $\{x_i^p\}_{i=1}^{n_p}$ and the unlabeled samples $\{x_i^u\}_{i=1}^{n_u}$ from step 1 are projected into a new feature space by the projection matrix R, giving $\{R^{T}x_i^p\}_{i=1}^{n_p}$ and $\{R^{T}x_i^u\}_{i=1}^{n_u}$. An empirical misclassification risk is then constructed from the positive samples and the unlabeled samples:
Figure BDA0002963799140000105
For the function $f(R^{T}x)$, a linear-in-parameter model is used:
$$f(R^{T}x) = \alpha^{T}\varphi(R^{T}x) + b \qquad (2)$$
where $\varphi(\cdot)$ is a set of basis functions, $\alpha$ is the coefficient vector of the classifier f, and b is the bias term of the classifier f. Gaussian, linear, or polynomial functions may be used as the basis functions. Using this model, equation (1) can be further expressed as:
Figure BDA0002963799140000108
to obtain the optimal classifier f, it is necessary to minimize the empirical risk of the above equation, i.e.
Figure BDA0002963799140000109
Here, the squared loss
Figure BDA00029637991400001010
is used as the loss function of the above optimization problem, where z is a variable. The bias b in model (2) is absorbed into α, and the basis function vector $\varphi$ is augmented to $[\varphi^{T}, 1]^{T}$. The objective function with an $\ell_2$ regularization term then becomes:
Figure BDA00029637991400001013
where $\Phi_p$ is the matrix of basis function values for the positive samples, $\Phi_u$ is the matrix of basis function values for the unlabeled samples,
Figure BDA00029637991400001014
Figure BDA00029637991400001015
and $\mathbf{1}$ is a column vector of all ones. To find the minimum of this objective function, its first derivative is set to zero, which gives an analytical solution for α:
Figure BDA0002963799140000111
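To make the closed-form step concrete, the sketch below solves for α under explicitly assumed choices: the squared loss $\ell(z) = \frac{1}{4}(z-1)^2$ (so that $\ell(z) - \ell(-z) = -z$), an $\ell_2$ penalty scaled by $\lambda/2$, and linear basis functions with a constant 1 appended to absorb the bias. These choices, and the names `basis`, `pu_objective` and `fit_alpha`, are illustrative and are not taken from the application's formula images.

```python
import numpy as np

def basis(Z):
    """Linear basis with a trailing constant 1, so the bias b is absorbed into alpha."""
    Z = np.asarray(Z, dtype=float)
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def pu_objective(alpha, Xp, Xu, R, prior, lam):
    """Assumed objective: unbiased PU risk with squared loss plus an l2 penalty."""
    Phi_p, Phi_u = basis(Xp @ R), basis(Xu @ R)
    pos_term = -prior * np.mean(Phi_p @ alpha)            # l(z) - l(-z) = -z
    unl_term = np.mean((Phi_u @ alpha + 1.0) ** 2) / 4.0  # l(-z) = (z + 1)^2 / 4
    return pos_term + unl_term + 0.5 * lam * (alpha @ alpha)

def fit_alpha(Xp, Xu, R, prior, lam=1e-2):
    """Closed-form minimizer obtained by setting the gradient of pu_objective to zero."""
    Phi_p, Phi_u = basis(Xp @ R), basis(Xu @ R)
    n_u, k = Phi_u.shape
    A = Phi_u.T @ Phi_u / (2.0 * n_u) + lam * np.eye(k)
    rhs = prior * Phi_p.mean(axis=0) - Phi_u.mean(axis=0) / 2.0
    return np.linalg.solve(A, rhs)

# Hypothetical data: rows are samples, R projects 4 features down to 2.
rng = np.random.default_rng(0)
Xp_demo = rng.normal(1.0, 1.0, size=(30, 4))
Xu_demo = rng.normal(0.0, 1.0, size=(100, 4))
R_demo = np.eye(4)[:, :2]
alpha_demo = fit_alpha(Xp_demo, Xu_demo, R_demo, prior=0.3)
```

Because the assumed objective is a strictly convex quadratic in α for λ > 0, the linear system has a unique solution and no iterative optimizer is needed.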
Step 3: Assign pseudo labels to the unlabeled samples. Substitute the α obtained in step 2 into the classifier $f(R^{T}x)$ to assign a pseudo label to each sample in the unlabeled data set. Combining these pseudo labels with the original positive sample set gives the positive and negative sample sets of the whole data set, $X_p$ and $X_n$. Let $X_u^p$ and $X_u^n$ denote the sets of unlabeled samples that the classifier from step 2 classifies as positive and negative, respectively, and let $n_u^p$ and $n_u^n$ denote the corresponding numbers of samples. Then $X_p$ consists of the original positive samples together with $X_u^p$ and contains $n_p + n_u^p$ samples, and $X_n = X_u^n$ contains $n_u^n$ samples.
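Step 3 can be sketched as follows. The decision threshold at zero, the linear basis with the bias stored as the last entry of α, and the function name `assign_pseudo_labels` are assumptions for illustration.

```python
import numpy as np

def assign_pseudo_labels(Xp, Xu, R, alpha):
    """Split unlabeled rows by the sign of f and merge pseudo-positives with Xp."""
    Xp = np.asarray(Xp, dtype=float)
    Xu = np.asarray(Xu, dtype=float)
    # f(R^T x) = alpha[:-1] . (R^T x) + alpha[-1]; rows of Xu @ R are the projected samples.
    scores = (Xu @ R) @ alpha[:-1] + alpha[-1]
    is_pos = scores > 0          # assumed rule: positive score means pseudo-positive
    X_pos = np.vstack([Xp, Xu[is_pos]])
    X_neg = Xu[~is_pos]
    return X_pos, X_neg, is_pos

# Hypothetical toy inputs: R keeps only the first feature, alpha has slope 1 and bias 0.
Xp_demo = np.array([[2.0, 0.0], [3.0, 1.0]])
Xu_demo = np.array([[1.0, 1.0], [-1.0, 0.0], [-2.0, 5.0]])
R_demo = np.array([[1.0], [0.0]])
alpha_demo = np.array([1.0, 0.0])
X_pos, X_neg, is_pos = assign_pseudo_labels(Xp_demo, Xu_demo, R_demo, alpha_demo)
```

In this toy case one unlabeled sample is pseudo-labeled positive, so the merged positive set has 2 + 1 = 3 rows and the negative set has 2 rows.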
Step 4: Solve for the projection matrix, i.e., solve
$$\max_{R}\ \frac{R^{T}S_bR}{R^{T}S_wR}$$
Since $R^{T}S_bR$ and $R^{T}S_wR$ are matrices rather than scalars, this ratio cannot be optimized directly as a scalar function. An alternative optimization objective can be used instead, such as
$$H(R) = \frac{\prod_{\mathrm{diag}} R^{T}S_bR}{\prod_{\mathrm{diag}} R^{T}S_wR}$$
where $\prod_{\mathrm{diag}} A$ is the product of the main diagonal elements of A. The optimization of H(R) can be converted into
$$H(R) = \frac{\prod_{i=1}^{m} r_i^{T}S_br_i}{\prod_{i=1}^{m} r_i^{T}S_wr_i} = \prod_{i=1}^{m}\frac{r_i^{T}S_br_i}{r_i^{T}S_wr_i}$$
where m is the feature dimension after projection and $r_i$ is the i-th column of R. The rightmost expression is a product of generalized Rayleigh quotients. The maximum of each quotient is the largest eigenvalue of the matrix $S_w^{-1}S_b$, the product of the m largest values is the product of the m largest eigenvalues of $S_w^{-1}S_b$, and the corresponding matrix R is formed by the eigenvectors belonging to those m largest eigenvalues. Using the positive and negative sample sets $X_p$ and $X_n$ obtained in step 3, $S_w^{-1}S_b$ can be computed, and the projection matrix R is then obtained.
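The eigenvector computation in step 4 can be sketched with NumPy alone. Rather than forming $S_w^{-1}S_b$ and calling a general eigen-solver, this sketch whitens with a Cholesky factor of $S_w$ so that a symmetric solver can be used; the small ridge added to $S_w$, the unit-norm scaling of the columns, and the name `lda_projection` are assumptions of the sketch.

```python
import numpy as np

def lda_projection(S_b, S_w, m, ridge=1e-8):
    """Return (R, eigvals): the top-m generalized eigenvectors of S_b r = lambda S_w r."""
    d = S_w.shape[0]
    # Assumed safeguard: a tiny ridge keeps S_w positive definite for the factorization.
    L = np.linalg.cholesky(S_w + ridge * np.eye(d))       # S_w = L L^T
    C = np.linalg.solve(L, np.linalg.solve(L, S_b).T)     # C = L^{-1} S_b L^{-T}
    vals, vecs = np.linalg.eigh((C + C.T) / 2.0)          # symmetric problem, ascending order
    top = np.argsort(vals)[::-1][:m]
    R = np.linalg.solve(L.T, vecs[:, top])                # map back: r = L^{-T} v
    R = R / np.linalg.norm(R, axis=0, keepdims=True)      # scale of each column is arbitrary
    return R, vals[top]

# Hypothetical scatter matrices for two features.
S_b_demo = np.array([[1.0, 3.0], [3.0, 9.0]])
S_w_demo = np.array([[4.0, 2.0], [2.0, 2.0]])
R_demo, eigvals_demo = lda_projection(S_b_demo, S_w_demo, m=1)
```

The eigenvalues of the whitened matrix equal those of $S_w^{-1}S_b$ because the two matrices are similar. With only two classes $S_b$ has rank one, so only the first eigenvalue is nonzero and the leading column of R is parallel to $S_w^{-1}(\mu_p - \mu_n)$.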
Step 5: Repeat steps 2 to 4 until convergence to obtain the optimal classifier $f^{*}$ and the optimal projection matrix $R^{*}$.
Finally, the credit risk test data is classified according to the obtained model parameters: the optimal projection matrix $R^{*}$ transforms the credit risk test data set into the new feature space, and the optimal classifier $f^{*}$ then classifies it, which gives the accuracy of the final credit risk assessment result.
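The alternation of steps 2 to 5 can be put together as one loop. The sketch below runs on synthetic data and rests on several assumptions that the text does not fix: the squared-loss closed form for step 2, an initial projection from linear discriminant analysis that treats all unlabeled samples as negatives, convergence defined as unchanged pseudo labels, and a final refit of α so that the returned classifier matches the returned projection. All function names are hypothetical.

```python
import numpy as np

def basis(Z):
    # Linear basis with a constant 1 appended (bias absorbed into alpha).
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def fit_alpha(Xp, Xu, R, prior, lam):
    # Step 2 under the assumed squared loss l(z) = (z - 1)^2 / 4.
    Phi_p, Phi_u = basis(Xp @ R), basis(Xu @ R)
    A = Phi_u.T @ Phi_u / (2.0 * len(Phi_u)) + lam * np.eye(Phi_u.shape[1])
    rhs = prior * Phi_p.mean(axis=0) - Phi_u.mean(axis=0) / 2.0
    return np.linalg.solve(A, rhs)

def decision(X, R, alpha):
    return basis(X @ R) @ alpha

def lda(X_pos, X_neg, m, ridge=1e-6):
    # Step 4: top-m generalized eigenvectors of (S_b, S_w), columns scaled to unit norm.
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    diff = (mu_p - mu_n)[:, None]
    S_b = diff @ diff.T
    D_p, D_n = X_pos - mu_p, X_neg - mu_n
    S_w = D_p.T @ D_p + D_n.T @ D_n + ridge * np.eye(X_pos.shape[1])
    L = np.linalg.cholesky(S_w)
    C = np.linalg.solve(L, np.linalg.solve(L, S_b).T)
    vals, vecs = np.linalg.eigh((C + C.T) / 2.0)
    R = np.linalg.solve(L.T, vecs[:, np.argsort(vals)[::-1][:m]])
    return R / np.linalg.norm(R, axis=0, keepdims=True)

def train(Xp, Xu, prior, m=1, lam=1e-2, max_iter=20):
    R = lda(Xp, Xu, m)  # assumed initialization: unlabeled samples treated as negatives
    prev = None
    for _ in range(max_iter):
        alpha = fit_alpha(Xp, Xu, R, prior, lam)          # step 2
        is_pos = decision(Xu, R, alpha) > 0               # step 3
        # Stop when pseudo labels no longer change, or when no pseudo-negatives remain.
        if is_pos.all() or (prev is not None and np.array_equal(is_pos, prev)):
            break
        R = lda(np.vstack([Xp, Xu[is_pos]]), Xu[~is_pos], m)  # step 4
        prev = is_pos
    # Refit so the returned classifier is consistent with the returned projection.
    return fit_alpha(Xp, Xu, R, prior, lam), R

# Synthetic stand-in for credit data: two Gaussian clusters in 4 dimensions.
rng = np.random.default_rng(0)
pos = rng.normal(1.5, 1.0, size=(200, 4))    # "bad credit" (positive class)
neg = rng.normal(-1.5, 1.0, size=(200, 4))   # "good credit"
Xp_demo = pos[:120]
Xu_demo = np.vstack([pos[120:], neg])
y_u = np.r_[np.ones(80), -np.ones(200)]
alpha_star, R_star = train(Xp_demo, Xu_demo, prior=80 / 280)
pred = np.where(decision(Xu_demo, R_star, alpha_star) > 0, 1, -1)
acc = float((pred == y_u).mean())
```

The final refit matters because the sign of an eigenvector is arbitrary: a classifier trained on one projection and applied to a sign-flipped update of it would invert its predictions.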
The invention uses the German Credit real-world data set as an example of credit risk assessment. This data set classifies credits as "good" or "bad" according to a set of attributes, including the status of the existing checking account, credit record, credit usage, years of employment, property, personal status, installment rate as a percentage of disposable income, and so on. To verify the robustness of the discriminant credit risk assessment method based on single-class classification, the positive-class unlabeled rate is set to 20%, 30% and 40% when constructing the positive and unlabeled data sets; that is, 20%, 30% and 40% of the bad credit samples, respectively, are combined with all good credit samples to form the unlabeled sample set. Fig. 3 compares the method of the invention with an unbiased single-class classification method on the German Credit data set at these three unlabeled rates; the ordinate of Fig. 3 represents accuracy. As can be seen from Fig. 3, the method of the invention further improves on the credit risk assessment effect of the unbiased single-class classification method on this data set at unlabeled rates of 20%, 30% and 40%.
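The construction of the positive and unlabeled sets at a given unlabeled rate can be sketched as below. The label encoding (1 for bad credit, 0 for good credit), the random choice of which bad credits are hidden, the way the class prior is computed from the split, and the name `make_pu_split` are assumptions; the demo uses synthetic features rather than the German Credit data.

```python
import numpy as np

def make_pu_split(X, y, unlabeled_rate, seed=0):
    """Hide a fraction of positives among all negatives to form the unlabeled set.

    Returns (Xp, Xu, y_u_true, prior), where prior is the share of hidden
    positives in the unlabeled set (known here only because y is available).
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_hidden = int(round(unlabeled_rate * len(pos_idx)))
    hidden = rng.choice(pos_idx, size=n_hidden, replace=False)
    labeled = np.setdiff1d(pos_idx, hidden)
    u_idx = np.concatenate([hidden, neg_idx])
    prior = n_hidden / len(u_idx)
    return X[labeled], X[u_idx], y[u_idx], prior

# Synthetic stand-in: 10 bad credits (1) and 20 good credits (0), 3 features.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 3))
y_demo = np.array([1] * 10 + [0] * 20)
splits = {rate: make_pu_split(X_demo, y_demo, rate) for rate in (0.2, 0.3, 0.4)}
```

The true labels of the unlabeled set are returned only so that accuracy can be measured afterwards; they are not used for training.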
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention. A person skilled in the art may, following the idea of the present invention, change the specific embodiments and the scope of application. In summary, the contents of this description should not be construed as limiting the present invention.

Claims (10)

1. A credit risk assessment method, comprising:
acquiring credit risk assessment data and a current projection matrix; the credit risk assessment data comprises single-class credit risk data and unlabeled credit risk data; the single class credit risk data comprises a plurality of positive sample data, and the unlabeled credit risk data comprises a plurality of unlabeled sample data; the current projection matrix is obtained by performing linear discriminant analysis on the credit risk assessment data;
determining a classifier according to the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk;
classifying the unlabeled credit risk data by adopting the classifier, and assigning pseudo labels to the unlabeled sample data to obtain pseudo label data;
performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix;
judging whether an iteration end condition is met; if so, outputting the classifier and the updated projection matrix; if not, taking the updated projection matrix as the current projection matrix, and then returning to the step of determining a classifier according to the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk;
and performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result.
2. The credit risk assessment method of claim 1, further comprising, after obtaining the credit risk assessment data:
normalizing the credit risk assessment data to obtain normalized credit risk assessment data.
3. The method according to claim 1, wherein the determining a classifier according to the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk specifically comprises:
determining the classifier according to the credit risk assessment data and the current projection matrix by adopting the following formula:
Figure FDA0002963799130000011
in the formula,
Figure FDA0002963799130000021
is the empirical misclassification risk, $f$ is the classifier, $f(\cdot)$ is the classifier output, $\pi$ is the prior probability of the positive class,
Figure FDA0002963799130000022
is the positive sample data,
Figure FDA0002963799130000023
is the unlabeled sample data, $l(\cdot)$ is the loss function, $\lambda$ is a trade-off parameter, $n_p$ is the number of positive samples, $n_u$ is the number of unlabeled samples, $i$ is the sample index, and $R$ is the projection matrix.
4. The method according to claim 1, wherein the performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix specifically comprises:
performing linear discriminant analysis on the pseudo label data and the positive sample data by adopting the following formula to obtain an updated projection matrix:
Figure FDA0002963799130000024
wherein,
$S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}$
Figure FDA0002963799130000025
wherein $R$ is the projection matrix, $S_b$ is the between-class scatter, $S_w$ is the within-class scatter, $\mu_p$ is the mean vector of the positive sample data, $\mu_n$ is the mean vector of the negative sample data, $x$ is a sample, $X_p$ is the positive sample set, and $X_n$ is the negative sample set; the positive sample set is data with credit risk, and the negative sample set is data without credit risk.
5. The method according to claim 1, wherein the performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result comprises:
and according to the updated projection matrix and the credit risk assessment data, performing credit risk classification by using the classifier to obtain a credit risk classification result.
6. A credit risk assessment system, comprising:
the acquisition module is used for acquiring credit risk assessment data and a current projection matrix; the credit risk assessment data comprises single-class credit risk data and unlabeled credit risk data; the single class credit risk data comprises a plurality of positive sample data, and the unlabeled credit risk data comprises a plurality of unlabeled sample data; the current projection matrix is obtained by performing linear discriminant analysis on the credit risk assessment data;
a classifier determining module, configured to determine a classifier according to the credit risk assessment data and the current projection matrix with the goal of minimizing the empirical misclassification risk;
a pseudo label data generating module, configured to classify the unlabeled credit risk data by using the classifier, and assign pseudo labels to the unlabeled sample data to obtain pseudo label data;
the linear discriminant analysis module is used for performing linear discriminant analysis on the pseudo label data and the positive sample data to obtain an updated projection matrix;
the judging module is used for judging whether the iteration ending condition is met or not; if yes, executing an output module; if not, executing an updating module;
the updating module is used for taking the updated projection matrix as a current projection matrix and then executing the classifier determining module;
an output module for outputting the classifier and the updated projection matrix;
and the credit risk assessment module is used for performing credit risk assessment on the credit risk assessment data according to the classifier and the updated projection matrix to obtain a credit risk assessment result.
7. The credit risk assessment system of claim 6, further comprising:
the processing module is used for normalizing the credit risk assessment data to obtain normalized credit risk assessment data.
8. The credit risk assessment system of claim 6, wherein the classifier determining module specifically comprises:
a classifier determining unit, configured to determine a classifier according to the credit risk assessment data and the current projection matrix by using the following formula:
Figure FDA0002963799130000031
in the formula,
Figure FDA0002963799130000032
is the empirical misclassification risk, $f$ is the classifier, $f(\cdot)$ is the classifier output, $\pi$ is the prior probability of the positive class,
Figure FDA0002963799130000033
is the positive sample data,
Figure FDA0002963799130000034
is the unlabeled sample data, $l(\cdot)$ is the loss function, $\lambda$ is a trade-off parameter, $n_p$ is the number of positive samples, $n_u$ is the number of unlabeled samples, $i$ is the sample index, and $R$ is the projection matrix.
9. The credit risk assessment system of claim 6, wherein the linear discriminant analysis module specifically comprises:
a linear discriminant analysis unit, configured to perform linear discriminant analysis on the pseudo label data and the positive sample data by using the following formula, so as to obtain an updated projection matrix:
Figure FDA0002963799130000041
wherein,
$S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}$
Figure FDA0002963799130000042
wherein $R$ is the projection matrix, $S_b$ is the between-class scatter, $S_w$ is the within-class scatter, $\mu_p$ is the mean vector of the positive sample data, $\mu_n$ is the mean vector of the negative sample data, $x$ is a sample, $X_p$ is the positive sample set, and $X_n$ is the negative sample set; the positive sample set is data with credit risk, and the negative sample set is data without credit risk.
10. The credit risk assessment system of claim 6, wherein the credit risk assessment module specifically comprises:
and the credit risk assessment unit is used for carrying out credit risk classification by adopting the classifier according to the updated projection matrix and the credit risk assessment data to obtain a credit risk classification result.
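As a hedged illustration of the linear discriminant analysis step recited in claims 4 and 9, the block below works out the two-class case. It assumes the objective is the usual Fisher criterion for a single projection direction $r$ and that the within-class scatter is the sum of the two per-class scatter matrices; the claimed formulas appear only as figures, so both forms are assumptions rather than quotations.

```latex
% Scatter matrices (S_w is the assumed standard form)
S_b = (\mu_p - \mu_n)(\mu_p - \mu_n)^{T}, \qquad
S_w = \sum_{x \in X_p} (x - \mu_p)(x - \mu_p)^{T}
    + \sum_{x \in X_n} (x - \mu_n)(x - \mu_n)^{T}

% Assumed Fisher criterion for a single projection direction r
J(r) = \frac{r^{T} S_b\, r}{r^{T} S_w\, r}

% Setting the gradient of J to zero gives a generalized eigenproblem
S_b\, r = J(r)\, S_w\, r

% S_b r = (\mu_p - \mu_n)\,\bigl[(\mu_p - \mu_n)^{T} r\bigr] always points along
% \mu_p - \mu_n, so for invertible S_w the maximizer is
r^{*} \propto S_w^{-1} (\mu_p - \mu_n)
```

Under these assumptions the two-class projection needs no eigendecomposition: a single linear solve with $S_w$ gives the direction.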
CN202110245073.7A 2021-03-05 2021-03-05 Credit risk assessment method and system Pending CN113724060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245073.7A CN113724060A (en) 2021-03-05 2021-03-05 Credit risk assessment method and system

Publications (1)

Publication Number Publication Date
CN113724060A true CN113724060A (en) 2021-11-30

Family

ID=78672597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245073.7A Pending CN113724060A (en) 2021-03-05 2021-03-05 Credit risk assessment method and system

Country Status (1)

Country Link
CN (1) CN113724060A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117952619A (en) * 2024-03-26 2024-04-30 南京赛融信息技术有限公司 Risk behavior analysis method, system and computer readable medium based on digital RMB wallet account correlation
CN117952619B (en) * 2024-03-26 2024-06-07 南京赛融信息技术有限公司 Risk behavior analysis method, system and computer readable medium based on digital RMB wallet account correlation

Similar Documents

Publication Publication Date Title
Kaski et al. Bankruptcy analysis with self-organizing maps in learning metrics
Mwebaze et al. Divergence-based classification in learning vector quantization
CN103093235B (en) A kind of Handwritten Numeral Recognition Method based on improving distance core principle component analysis
Zhang et al. Label propagation based supervised locality projection analysis for plant leaf classification
Kuismin et al. Estimation of covariance and precision matrix, network structure, and a view toward systems biology
CN108564107A (en) The sample class classifying method of semi-supervised dictionary learning based on atom Laplce's figure regularization
CN111382930B (en) Time sequence data-oriented risk prediction method and system
CN103226595B (en) The clustering method of the high dimensional data of common factor analyzer is mixed based on Bayes
CN113887661B (en) Image set classification method and system based on representation learning reconstruction residual analysis
Bahrami et al. Joint auto-weighted graph fusion and scalable semi-supervised learning
CN112270596A (en) Risk control system and method based on user portrait construction
CN110781970A (en) Method, device and equipment for generating classifier and storage medium
Shang et al. Feature selection via non-convex constraint and latent representation learning with laplacian embedding
He et al. Novel discriminant locality preserving projection integrated with Monte Carlo sampling for fault diagnosis
US10546246B2 (en) Enhanced kernel representation for processing multimodal data
CN113724060A (en) Credit risk assessment method and system
CN112836754A (en) Image description model generalization capability evaluation method
CN111832391A (en) Image dimension reduction method and image identification method based on truncated nuclear norm low-rank discriminant embedding method
Baruque et al. THE S 2-ENSEMBLE FUSION ALGORITHM
Jena et al. Elitist TLBO for identification and verification of plant diseases
CN104778479B (en) A kind of image classification method and system based on sparse coding extraction
CN113988161A (en) User electricity consumption behavior pattern recognition method
CN111428510A (en) Public praise-based P2P platform risk analysis method
Yang et al. Efficient pattern unmixing of multiplex proteins based on variable weighting of texture descriptors
CN116304358B (en) User data acquisition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination