CN114626553A - Training method and device of financial data monitoring model and computer equipment - Google Patents

Publication number
CN114626553A
Authority
CN
China
Prior art keywords
financial data
financial
data set
variable
screened
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210334899.5A
Other languages
Chinese (zh)
Inventor
汪志艺
张华鹏
于庆疆
陈钰锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210334899.5A
Publication of CN114626553A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12: Accounting
    • G06Q40/125: Finance or payroll

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present application relates to the field of artificial intelligence, and in particular to a method, an apparatus, a computer device, a storage medium, and a computer program product for training a financial data monitoring model. The method comprises: determining a first financial data set and a second financial data set in a historical financial data set, where the first financial data set comprises abnormal financial data in which abnormal financial behavior is recorded and the second financial data set comprises financial data in which no abnormal financial behavior is recorded; determining a target recording time point corresponding to each piece of abnormal financial data; screening, according to the target recording time points, a preset data volume of financial data out of the second financial data set as a third financial data set; and training a financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model. The method addresses the problem of unbalanced data sets during model training and improves the model's accuracy in monitoring abnormal financial data.

Description

Training method and device of financial data monitoring model and computer equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for training a financial data monitoring model, a computer device, a storage medium, and a computer program product.
Background
With the development of internet finance, financial transactions have become increasingly frequent, but abnormal transactions, such as fraudulent and illegal transactions, are also increasing. These seriously affect the security and interests of financial institutions such as banks, so monitoring the financial data generated by transactions has become an important task for the financial business departments of such institutions.
Conventional financial data monitoring methods typically rely on traditional machine learning models. During model training, however, unbalanced data sets degrade the accuracy of the resulting prediction model, which then cannot accurately identify abnormal financial data, that is, financial data generated during transactions in which abnormal financial behavior is recorded.
The prior art therefore suffers from low accuracy in monitoring abnormal financial data.
Disclosure of Invention
In view of the foregoing, there is a need to provide a method, an apparatus, a computer device, a computer readable storage medium, and a computer program product for training a financial data monitoring model, which can improve the monitoring accuracy of abnormal financial data.
In a first aspect, the present application provides a method for training a financial data monitoring model. The method comprises the following steps:
determining a first financial data set and a second financial data set in the historical financial data set; the first financial data set comprises abnormal financial data recorded with abnormal financial behaviors; the second financial data set comprises financial data for which no abnormal financial behavior is recorded;
determining a target recording time point corresponding to each abnormal financial data;
according to each target recording time point, screening financial data with preset data volume in the second financial data set to serve as a third financial data set; recording time points corresponding to the financial data in the third financial data set are matched with the target recording time points;
training a financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model; the target financial data monitoring model is used for determining whether abnormal financial behaviors are recorded in the financial data to be monitored.
In one embodiment, the screening, in the second financial data set, financial data of a preset data amount as a third financial data set according to each target recording time point includes:
screening the financial data in the second financial data set according to each target recording time point, and screening out a piece of financial data matched with each target recording time point to obtain a financial data subset;
and if the volume of financial data in the financial data subset is smaller than the preset data volume, repeating the step of screening the financial data in the second financial data set according to each target recording time point until the volume of financial data in the financial data subset equals the preset data volume, to obtain the third financial data set.
In one embodiment, the target recording time point comprises a current target recording time point; the step of screening the financial data in the second financial data set according to each target recording time point to screen out a piece of financial data matched with each target recording time point to obtain a financial data subset comprises:
if the financial data matched with the current target recording time point does not exist in the second financial data set, screening the financial data in the second financial data set according to the next target recording time point of the current target recording time point;
and if a plurality of candidate financial data matched with the current target recording time point exist in the second financial data set, adding any one of the candidate financial data into the financial data subset.
In one embodiment, the method further comprises:
dividing the historical financial data set into the first financial data set and the second financial data set according to the data type labels corresponding to the historical financial data in the historical financial data set; the data type tag is used for determining whether abnormal financial behaviors are recorded in the historical financial data;
determining variables to be screened corresponding to the historical financial data;
performing abnormal correlation screening on the variables to be screened according to the distribution condition of the variables to be screened in the first financial data set and the distribution condition of the variables to be screened in the second financial data set to obtain screened variables;
determining a target variable according to the screened variable; the target variable is used for training the financial data monitoring model to be trained.
In one embodiment, the performing abnormal correlation screening on the variables to be screened according to the distribution of the variables to be screened in the first financial data set and the distribution of the variables to be screened in the second financial data set to obtain screened variables includes:
performing characteristic dimension reduction processing on the variable to be screened to obtain the processed variable to be screened;
acquiring a first frequency distribution condition of each processed variable to be screened in the first financial data set and a second frequency distribution condition of each processed variable to be screened in the second financial data set;
determining the difference between the first frequency distribution condition corresponding to each processed variable to be screened and the second frequency distribution condition corresponding to each processed variable to be screened;
and taking the processed variable to be screened, in which the difference between the corresponding first frequency distribution condition and the corresponding second frequency distribution condition meets a preset difference condition, as the screened variable.
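The frequency-distribution comparison above can be illustrated with a minimal Python sketch. The text does not specify how the difference is measured or what the preset difference condition is, so the sketch assumes a shared-bin histogram, half the sum of absolute bin differences (total variation distance) as the difference, and a hypothetical threshold `min_gap`. The dimension reduction step is omitted, and all function and variable names are illustrative.

```python
from typing import Dict, List, Sequence

Record = Dict[str, float]


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Relative frequency of `values` over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant variable
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def distribution_gap(abnormal: Sequence[float], normal: Sequence[float],
                     bins: int = 10) -> float:
    """Assumed difference measure: total variation distance between histograms."""
    lo = min(min(abnormal), min(normal))
    hi = max(max(abnormal), max(normal))
    p = histogram(abnormal, lo, hi, bins)
    q = histogram(normal, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def screen_variables(first_set: List[Record], second_set: List[Record],
                     names: Sequence[str], min_gap: float = 0.3) -> List[str]:
    """Keep variables whose two distributions differ by at least `min_gap`."""
    kept = []
    for name in names:
        gap = distribution_gap([r[name] for r in first_set],
                               [r[name] for r in second_set])
        if gap >= min_gap:  # hypothetical "preset difference condition"
            kept.append(name)
    return kept


# Toy data: v1 separates the two sets, v2 is distributed identically in both.
first_set = [{"v1": 9.0, "v2": 1.0}, {"v1": 8.0, "v2": 2.0}]
second_set = [{"v1": 1.0, "v2": 1.0}, {"v1": 2.0, "v2": 2.0}]
kept = screen_variables(first_set, second_set, ["v1", "v2"])
print(kept)
```

A variable such as `v2`, whose distribution is the same in both sets, carries little information about abnormal behavior under this assumed measure and is rejected.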
In one embodiment, the determining the target variable according to the screened variable includes:
determining an initial time variable corresponding to the historical financial data;
performing unit conversion processing on the initial time variable to obtain a conversion time variable; the time unit of the conversion time variable is greater than the time unit of the initial time variable;
standardizing the conversion time variable to obtain a preprocessing time variable;
and determining the target variable according to the screened variable and the preprocessing time variable.
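The time preprocessing steps above can be sketched as follows. The text only requires that the converted unit be larger than the initial unit and that the result be standardized; the sketch assumes, purely for illustration, a seconds-to-hours conversion and z-score standardization. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def to_hours(seconds: List[int]) -> List[int]:
    """Unit conversion: seconds to whole hours (assumed units)."""
    return [s // 3600 for s in seconds]


def standardize(values: List[float]) -> List[float]:
    """Z-score standardization (assumed form of 'standardizing')."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


initial_time = [0, 3600, 7200, 10800]       # initial time variable, in seconds
conversion_time = to_hours(initial_time)    # conversion time variable, in hours
preprocessing_time = standardize(conversion_time)
print(conversion_time, preprocessing_time)
```

Converting to a coarser unit also makes it more likely that records share the same recording time point, which matters for the time-point matching used when building the third financial data set.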
In one embodiment, after the step of training the financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain the target financial data monitoring model, the method further includes:
acquiring a financial data value matched with the target variable in the financial data to be monitored;
inputting the financial data value into the target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored;
and if the abnormal probability is greater than a preset abnormal probability threshold, determining that abnormal financial behavior is recorded in the financial data to be monitored.
In a second aspect, the present application provides a method of financial data monitoring. The method comprises the following steps:
acquiring financial data to be monitored;
determining a corresponding financial data value of the financial data to be monitored in a target variable;
inputting the financial data value into a target financial data monitoring model to obtain an abnormal probability corresponding to the financial data to be monitored; the target financial data monitoring model is obtained according to a training method of the financial data monitoring model;
and if the abnormal probability is greater than a preset abnormal probability threshold, determining that abnormal financial behavior is recorded in the financial data to be monitored.
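The monitoring steps above reduce to extracting the values of the target variables, scoring them, and comparing the score with a threshold. The following Python sketch assumes the trained model is available as a callable that returns a probability; `stub_model`, the variable names, and the threshold of 0.5 are placeholders, not values from the application.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, float]


def monitor(record: Record,
            target_variables: Sequence[str],
            predict_probability: Callable[[List[float]], float],
            threshold: float = 0.5) -> bool:
    """Return True if abnormal financial behavior is judged to be recorded."""
    # Financial data values of the record in the target variables.
    values = [record[name] for name in target_variables]
    abnormal_probability = predict_probability(values)
    return abnormal_probability > threshold


def stub_model(values: List[float]) -> float:
    """Stand-in for the target model: larger first value, higher probability."""
    return min(1.0, values[0] / 1000.0)


flagged = monitor({"amount": 900.0, "hour": 2.0}, ["amount", "hour"], stub_model)
cleared = monitor({"amount": 100.0, "hour": 14.0}, ["amount", "hour"], stub_model)
print(flagged, cleared)
```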
In a third aspect, the application further provides a training device for the financial data monitoring model. The device comprises:
a first determining module, configured to determine a first financial data set and a second financial data set in a historical financial data set; the first financial data set comprises abnormal financial data in which abnormal financial behavior is recorded; the second financial data set comprises financial data for which no abnormal financial behavior is recorded;
a second determining module, configured to determine a target recording time point corresponding to each abnormal financial data;
a screening module, configured to screen financial data with a preset data volume in the second financial data set according to each target recording time point to serve as a third financial data set; the recording time points corresponding to the financial data in the third financial data set match the target recording time points;
a training module, configured to train a financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model; the target financial data monitoring model is used for determining whether abnormal financial behaviors are recorded in the financial data to be monitored.
In a fourth aspect, the present application further provides a financial data monitoring device. The device comprises:
an acquisition module, configured to acquire financial data to be monitored;
a data value determining module, configured to determine the financial data value of the financial data to be monitored in the target variable;
an input module, configured to input the financial data value into a target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored; the target financial data monitoring model is obtained according to a training method of the financial data monitoring model;
a judging module, configured to determine that abnormal financial behavior is recorded in the financial data to be monitored if the abnormal probability is greater than a preset abnormal probability threshold.
In a fifth aspect, the present application further provides a computer device. The computer device comprises a memory and a processor; the memory stores a computer program, and the processor, when executing the computer program, implements the method for training a financial data monitoring model according to the first aspect or any one of the embodiments of the first aspect, or the method of financial data monitoring according to the second aspect.
In a sixth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the method for training a financial data monitoring model according to the first aspect or any one of the embodiments of the first aspect, or the method of financial data monitoring according to the second aspect.
In a seventh aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the method for training a financial data monitoring model according to the first aspect or any one of the embodiments of the first aspect, or the method of financial data monitoring according to the second aspect.
In the training method, apparatus, computer device, storage medium, and computer program product of the financial data monitoring model described above, a first financial data set, consisting of abnormal financial data in which abnormal financial behavior is recorded, and a second financial data set, consisting of financial data in which no abnormal financial behavior is recorded, are determined in a historical financial data set. A target recording time point is then determined for each piece of abnormal financial data. According to the target recording time points, a preset data volume of financial data is screened out of the second financial data set as a third financial data set, such that the recording time points of the financial data in the third financial data set match the target recording time points. Finally, the financial data monitoring model to be trained is trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model for determining whether abnormal financial behavior is recorded in financial data to be monitored. Because abnormal financial data is scarce in a real historical financial data set, screening the much larger body of financial data without recorded abnormal financial behavior by the target recording time points yields a third financial data set that matches those time points and contains the preset data volume. The very large second financial data set therefore need not be used in full as a sample set for training the model, which prevents the volume of abnormal financial data and the volume of financial data without recorded abnormal financial behavior in the sample set from differing too greatly, and so addresses the low monitoring accuracy caused by unbalanced data sets during training. In addition, the recording time point of a piece of financial data can reflect characteristics of abnormal financial behavior. Screening by the target recording time points ensures that, within the sample set used for model training, the recording time points of the abnormal financial data match those of the financial data without recorded abnormal financial behavior. This reduces the influence of mismatched sample recording time points on the monitoring result, reduces the error of the monitoring result, and further improves the model's accuracy in monitoring abnormal financial data.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method for training a financial data monitoring model, according to one embodiment;
FIG. 2 is a schematic flow chart illustrating a method for training a financial data monitoring model according to another embodiment;
FIGS. 3(a) and 3(b) are schematic diagrams of the correlation analysis of the variables to be screened in two cases, in one embodiment;
FIGS. 4(a) to 4(k) are frequency distribution histograms, in two cases, of the processed variables to be screened that are rejected in one embodiment;
FIG. 5 is a diagram that illustrates the results of attribute importance scores for variables in one embodiment;
FIG. 6 is a schematic illustration of a confusion matrix for a model in one embodiment;
FIG. 7 is a schematic diagram of a ROC curve of the model in one embodiment;
FIG. 8 is a block diagram of an exemplary financial data monitoring model training apparatus;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a method for training a financial data monitoring model is provided, which is described by taking the method as an example for application to a computer device, where the computer device may be a terminal, an independent server, or a server cluster composed of a plurality of servers, and the terminal may be, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The method comprises the following steps:
Step S110: determining a first financial data set and a second financial data set in the historical financial data set.
Wherein the first financial data set includes anomalous financial data in which anomalous financial behavior is recorded.
Wherein the second set of financial data includes financial data for which no abnormal financial behavior has been recorded.
Wherein the historical financial data set includes historical financial data, which may be, but is not limited to, bank card transaction data.
Abnormal financial behavior may be fraudulent behavior, such as fraudulent credit card use, bank card theft, or binding of a counterfeit card.
In a specific implementation, the computer device may obtain the financial data of a certain area over a certain time period as the historical financial data set. It may then determine, from the data type tag of each piece of historical financial data, whether abnormal financial behavior is recorded in that data, and divide the historical financial data set into a first financial data set, consisting of the abnormal financial data in which abnormal financial behavior is recorded, and a second financial data set, consisting of the financial data in which no abnormal financial behavior is recorded.
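The division by data type tag can be sketched in a few lines of Python. The record layout and the tag name `is_abnormal` are assumptions made for illustration; the text does not prescribe a data format.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_by_label(history: List[Record],
                   label_key: str = "is_abnormal") -> Tuple[List[Record], List[Record]]:
    """Divide historical data into (first set: abnormal, second set: normal)."""
    first_set: List[Record] = []
    second_set: List[Record] = []
    for record in history:
        if record[label_key]:
            first_set.append(record)
        else:
            second_set.append(record)
    return first_set, second_set


# Hypothetical historical financial data set.
history = [
    {"id": 1, "time": 3600, "is_abnormal": True},
    {"id": 2, "time": 3600, "is_abnormal": False},
    {"id": 3, "time": 7200, "is_abnormal": False},
]
first_set, second_set = split_by_label(history)
print(len(first_set), len(second_set))
```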
Step S120: determining a target recording time point corresponding to each abnormal financial data.
Wherein, the target recording time point may be a transaction occurrence time corresponding to the abnormal financial data.
In a specific implementation, the computer device may determine a target recording time point corresponding to each abnormal financial data in the first financial data set.
Step S130: screening out financial data with a preset data volume in the second financial data set according to each target recording time point to serve as a third financial data set.
And the recording time point corresponding to each financial data in the third financial data set is matched with each target recording time point.
In a specific implementation, the computer device may traverse the second financial data set at least once according to the target recording time points. In each traversal it selects financial data whose recording time point matches a target recording time point, and it continues until the selected financial data reaches the preset data volume. The third financial data set thus consists of the financial data selected over the at least one traversal.
Step S140: training the financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model.
The target financial data monitoring model is used for determining whether abnormal financial behaviors are recorded in the financial data to be monitored.
In a specific implementation, the computer device may divide the data according to a preset data division ratio, for example 4:1. The first financial data set, in which abnormal financial behavior is recorded, and the third financial data set, in which no abnormal financial behavior is recorded, are divided into a training sample set and a test sample set. Each training sample carries a data type sample label that indicates whether abnormal financial behavior is recorded in that sample. The computer device may then train the financial data monitoring model to be trained on the training sample set; when the trained model meets a preset training condition, the target financial data monitoring model, which determines whether abnormal financial behavior is recorded in financial data to be monitored, is obtained. The precision of the target financial data monitoring model may be calculated on the test sample set. Financial data in which abnormal financial behavior is recorded can thus be identified by the target financial data monitoring model, so that customers affected by abnormal financial behavior can be identified before the bank releases funds, and the bank's losses can be reduced by, for example, reducing the amount released or withholding it altogether.
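The 4:1 division into training and test samples can be sketched as below. The text does not say how samples are assigned to the two sets, so the sketch assumes a seeded random shuffle; labels 1 and 0 stand in for the data type sample labels, and the model training itself is left out because this passage does not name a model type.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]
Sample = Tuple[Record, int]


def make_samples(first_set: List[Record], third_set: List[Record]) -> List[Sample]:
    """Attach sample labels: 1 for abnormal data, 0 for screened normal data."""
    return [(r, 1) for r in first_set] + [(r, 0) for r in third_set]


def split_samples(samples: List[Sample], train_parts: int = 4, test_parts: int = 1,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle (assumed) and divide by the preset ratio train_parts:test_parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical balanced input: 5 abnormal records and 5 screened normal records.
first_set = [{"id": i} for i in range(5)]
third_set = [{"id": i} for i in range(5, 10)]
train, test = split_samples(make_samples(first_set, third_set))
print(len(train), len(test))
```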
In the training method of the financial data monitoring model described above, a first financial data set, consisting of abnormal financial data in which abnormal financial behavior is recorded, and a second financial data set, consisting of financial data in which no abnormal financial behavior is recorded, are determined in a historical financial data set. A target recording time point is then determined for each piece of abnormal financial data. According to the target recording time points, a preset data volume of financial data is screened out of the second financial data set as a third financial data set, such that the recording time points of the financial data in the third financial data set match the target recording time points. Finally, the financial data monitoring model to be trained is trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model for determining whether abnormal financial behavior is recorded in financial data to be monitored. Because abnormal financial data is scarce in a real historical financial data set, screening the much larger body of financial data without recorded abnormal financial behavior by the target recording time points yields a third financial data set that matches those time points and contains the preset data volume. The very large second financial data set therefore need not be used in full as a sample set for training the model, which prevents the volume of abnormal financial data and the volume of financial data without recorded abnormal financial behavior in the sample set from differing too greatly, and so addresses the low monitoring accuracy caused by unbalanced data sets during training. In addition, the recording time point of a piece of financial data can reflect characteristics of abnormal financial behavior. Screening by the target recording time points ensures that, within the sample set used for model training, the recording time points of the abnormal financial data match those of the financial data without recorded abnormal financial behavior. This reduces the influence of mismatched sample recording time points on the monitoring result, reduces the error of the monitoring result, and further improves the model's accuracy in monitoring abnormal financial data.
In one embodiment, screening out financial data with a preset data volume in the second financial data set according to each target recording time point to serve as a third financial data set comprises: screening the financial data in the second financial data set according to each target recording time point, and screening out one piece of financial data matched with each target recording time point to obtain a financial data subset; and if the volume of financial data in the financial data subset is less than the preset data volume, repeating the step of screening the financial data in the second financial data set according to each target recording time point until the volume of financial data in the financial data subset equals the preset data volume, to obtain the third financial data set.
In a specific implementation, when screening out the preset data volume of financial data in the second financial data set as the third financial data set, the computer device may traverse the financial data in the second financial data set at least once according to the target recording time point of each piece of abnormal financial data. In each traversal it screens out only one piece of financial data for each target recording time point, that is, a piece whose recording time point is the same as that target recording time point, and adds the financial data screened out of the second financial data set to the financial data subset. If the volume of financial data in the financial data subset is less than the preset data volume, the computer device continues to traverse and screen the second financial data set according to each target recording time point until the volume of financial data in the subset equals the preset data volume, and takes that financial data subset as the third financial data set.
In the technical solution of this embodiment, the financial data in the second financial data set is screened according to each target recording time point, and one piece of financial data matching each target recording time point is screened out to obtain a financial data subset. If the volume of financial data in the subset is smaller than the preset data volume, the screening step is repeated until the volume equals the preset data volume, yielding the third financial data set. The large second financial data set is thus screened at least once using the target recording time points of the comparatively few abnormal financial data, and each pass selects only one piece of financial data per target recording time point. As a result, during model training, it can be ensured that each target recording time point has corresponding financial data without recorded abnormal financial behavior, and that the volume of such data is roughly even across target recording time points. This prevents the recording time points of the training samples from clustering heavily on a few values and addresses the low monitoring accuracy caused by unbalanced data sets during training.
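The repeated screening described in this embodiment can be sketched as a loop of passes over the second financial data set. The sketch below is illustrative only: it assumes that matching means equality of a `time` field, that a record already selected is not selected again, and that the loop stops early if no unselected record matches any target recording time point. None of these details is stated in the text, and all names are hypothetical.

```python
from typing import Dict, Hashable, List

Record = Dict[str, Hashable]


def one_pass(target_times: List[Hashable], remaining: List[Record],
             time_key: str = "time") -> List[Record]:
    """Select at most one unused record per target recording time point."""
    picked = []
    for t in target_times:
        for i, rec in enumerate(remaining):
            if rec[time_key] == t:
                picked.append(remaining.pop(i))  # assumed: no reuse of records
                break
    return picked


def build_third_set(target_times: List[Hashable], second_set: List[Record],
                    preset_volume: int, time_key: str = "time") -> List[Record]:
    """Repeat passes until the subset reaches the preset data volume."""
    remaining = list(second_set)
    subset: List[Record] = []
    while len(subset) < preset_volume:
        picked = one_pass(target_times, remaining, time_key)
        if not picked:
            break  # assumed stopping rule when nothing left matches
        subset.extend(picked[: preset_volume - len(subset)])
    return subset


# Two target time points (10 and 20) and five normal records.
second_set = [{"id": 1, "time": 10}, {"id": 2, "time": 10}, {"id": 3, "time": 10},
              {"id": 4, "time": 20}, {"id": 5, "time": 20}]
third_set = build_third_set([10, 20], second_set, preset_volume=4)
print([r["id"] for r in third_set])
```

With this toy input each target time point contributes two records, which illustrates the roughly even spread across time points that the embodiment aims for.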
In one embodiment, the step of screening the financial data in the second financial data set according to each target recording time point to screen out a piece of financial data matching each target recording time point to obtain the financial data subset comprises: if the financial data matched with the current target recording time point does not exist in the second financial data set, screening the financial data in the second financial data set according to the next target recording time point of the current target recording time point; and if a plurality of candidate financial data matched with the current target recording time point exist in the second financial data set, adding any candidate financial data into the financial data subset.
Wherein the target recording time point includes a current target recording time point.
In a specific implementation, when the computer device screens the financial data in the second financial data set according to each target recording time point to screen out one piece of financial data matching each target recording time point and obtain the financial data subset, the target recording time points include a current target recording time point. If the computer device detects that no financial data matching the current target recording time point exists in the second financial data set, it skips the current target recording time point, continues to screen the financial data in the second financial data set according to the next target recording time point, and detects whether financial data matching that next target recording time point exists. If the computer device detects that a plurality of pieces of candidate financial data matching the current target recording time point exist in the second financial data set, it selects any one of them and adds it to the financial data subset, which ensures that, in each of the at least one traversal of the second financial data set, at most one piece of candidate financial data is selected for each target recording time point.
According to the technical scheme of the embodiment, the target recording time points include a current target recording time point; if no financial data matching the current target recording time point exists in the second financial data set, the financial data in the second financial data set is screened according to the next target recording time point after the current target recording time point; if a plurality of pieces of candidate financial data matching the current target recording time point exist in the second financial data set, any one piece of candidate financial data is added to the financial data subset. In this way, the financial data in the second financial data set is screened at least once according to the target recording time points, and at most one piece of financial data matching each target recording time point is selected in each pass. This ensures that the data volumes of the financial data without recorded abnormal financial behavior corresponding to the respective target recording time points are relatively even, prevents the data set imbalance that arises when the recording time points of the training samples are heavily concentrated on a few values, and improves the monitoring accuracy of the model.
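The screening described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the record layout (a dictionary with a `time_point` key), the function name `screen_third_set`, the choice of the first unused candidate as "any candidate", the rule that a record is not selected twice, and the early exit when no further match exists are all assumptions made for the example and are not specified in the text.

```python
from collections import defaultdict, deque

def screen_third_set(target_time_points, second_set, preset_volume):
    """Each pass takes at most one unused record per target recording time point."""
    # Group the records of the second set by recording time point.
    by_time = defaultdict(deque)
    for record in second_set:
        by_time[record["time_point"]].append(record)

    subset = []
    while len(subset) < preset_volume:
        picked_in_pass = 0
        for t in target_time_points:
            if len(subset) == preset_volume:
                break
            candidates = by_time.get(t)
            if not candidates:
                continue  # no match: skip to the next target recording time point
            # Several candidates may match; any one is acceptable, so take the first.
            subset.append(candidates.popleft())
            picked_in_pass += 1
        if picked_in_pass == 0:
            break  # assumption: stop when nothing can be matched any more
    return subset
```

Under these assumptions, the subset grows by at most one record per target recording time point in each pass, so the selected records stay spread across the time points rather than clustering on one of them.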
In one embodiment, the method further comprises: dividing the historical financial data set into a first financial data set and a second financial data set according to the data type labels corresponding to the historical financial data in the historical financial data set; determining variables to be screened corresponding to historical financial data; performing abnormal correlation screening on the variables to be screened according to the distribution condition of the variables to be screened in the first financial data set and the distribution condition of the variables to be screened in the second financial data set to obtain screened variables; determining a target variable according to the screened variable; the target variables are used for training the financial data monitoring model to be trained.
Wherein the data type tag is used to determine whether the historical financial data records abnormal financial behavior.
The variable to be screened may be a characteristic variable corresponding to historical financial data, and may include a client name, a business type, a transaction amount, a transaction number, an age, an occupation, and other variables.
In a specific implementation, the computer device may determine whether abnormal financial behaviors are recorded in the historical financial data according to data type tags corresponding to the historical financial data in the historical financial data set, so that the historical financial data may be divided into the abnormal financial data in which the abnormal financial behaviors are recorded and the financial data in which the abnormal financial behaviors are not recorded, and a first financial data set composed of the abnormal financial data and a second financial data set composed of the financial data in which the abnormal financial behaviors are not recorded may be obtained.
The computer equipment can also determine variables to be screened corresponding to the historical financial data, determine the distribution condition of the variables to be screened in the first financial data set and the distribution condition of the variables to be screened in the second financial data set, and perform abnormal correlation screening on the variables to be screened according to the distribution condition of the variables to be screened in the first financial data set and the distribution condition of the variables to be screened in the second financial data set, so as to screen out the variables with higher correlation with abnormal financial behaviors as screened variables; and then, the computer equipment can determine a target variable for training the financial data monitoring model to be trained according to the screened variable and the time variable and the amount variable corresponding to the historical financial data.
According to the technical scheme of the embodiment, the historical financial data set is divided into a first financial data set and a second financial data set according to the data type label corresponding to each historical financial data in the historical financial data set; determining variables to be screened corresponding to historical financial data; according to the distribution condition of each variable to be screened in the first financial data set and the distribution condition of each variable to be screened in the second financial data set, carrying out abnormal correlation screening on the variable to be screened to obtain a screened variable; determining a target variable for training the financial data monitoring model to be trained according to the screened variable; therefore, the variables to be screened corresponding to the historical financial data can be screened to obtain screened variables, the target variables used for training the financial data monitoring model to be trained are determined according to the screened variables, the model is not required to be trained by all the variables corresponding to the historical financial data, and the training efficiency of the model is improved.
In one embodiment, the obtaining the filtered variables by performing abnormal correlation filtering on the variables to be filtered according to the distribution of the variables to be filtered in the first financial data set and the distribution of the variables to be filtered in the second financial data set includes: performing characteristic dimension reduction processing on the variable to be screened to obtain the processed variable to be screened; acquiring a first frequency distribution condition of each processed variable to be screened in a first financial data set and a second frequency distribution condition of each processed variable to be screened in a second financial data set; determining the difference between the first frequency distribution condition corresponding to each processed variable to be screened and the corresponding second frequency distribution condition; and taking the processed variable to be screened, in which the difference between the corresponding first frequency distribution condition and the corresponding second frequency distribution condition meets the preset difference condition, as the screened variable.
The preset difference condition may be greater than a preset difference threshold.
In a specific implementation, when performing abnormal correlation screening on the variables to be screened according to the distribution of each variable to be screened in the first financial data set and in the second financial data set to obtain the screened variables, the computer device may first perform feature dimension reduction processing on the variables to be screened to obtain the processed variables to be screened; specifically, the feature dimension reduction may be implemented by principal component analysis. Then, the computer device may draw a first frequency distribution diagram of each processed variable to be screened in the first financial data set to obtain its first frequency distribution condition, and draw a second frequency distribution diagram of each processed variable to be screened in the second financial data set to obtain its second frequency distribution condition. The computer device may then determine the difference between the first frequency distribution condition and the corresponding second frequency distribution condition of each processed variable to be screened, and take the processed variables to be screened whose difference satisfies the preset difference condition, that is, whose difference is greater than the preset difference threshold, as the screened variables; in other words, the screened variables have a large influence on the characteristic of abnormal financial behavior. At the same time, the processed variables to be screened whose difference is smaller than the preset difference threshold are rejected, that is, the processed variables to be screened whose first frequency distribution condition does not differ greatly from the corresponding second frequency distribution condition are rejected.
According to the technical scheme of the embodiment, feature dimension reduction processing is performed on the variables to be screened to obtain the processed variables to be screened; a first frequency distribution condition of each processed variable to be screened in the first financial data set and a second frequency distribution condition of each processed variable to be screened in the second financial data set are acquired; the difference between the first frequency distribution condition corresponding to each processed variable to be screened and the corresponding second frequency distribution condition is determined; and the processed variables to be screened whose difference between the corresponding first frequency distribution condition and the corresponding second frequency distribution condition satisfies the preset difference condition are taken as the screened variables. In this way, the processed variables to be screened are screened according to the difference between their distribution in the abnormal financial data and their distribution in the financial data without recorded abnormal financial behavior, and screened variables having a large influence on the characteristic of abnormal financial behavior can be obtained, so that target variables having a large influence on the monitoring result can be selected more accurately for the financial data monitoring model, and the monitoring accuracy of the model is improved.
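As an illustration of the frequency distribution comparison, the following minimal sketch measures the difference between two histograms built on shared bins. The text does not state how the difference is quantified; the total variation distance used here, the number of bins, and the names `histogram`, `distribution_difference` and `screen_variables` are assumptions chosen only for the example.

```python
def histogram(values, lo, hi, bins):
    """Relative frequency of `values` in `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi falls in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]

def distribution_difference(abnormal_values, normal_values, bins=10):
    """Assumed metric: total variation distance between the two histograms (0 to 1)."""
    lo = min(min(abnormal_values), min(normal_values))
    hi = max(max(abnormal_values), max(normal_values))
    if hi == lo:
        return 0.0  # both samples are the same constant
    p = histogram(abnormal_values, lo, hi, bins)
    q = histogram(normal_values, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def screen_variables(first_set, second_set, threshold):
    """Keep variables whose difference exceeds the preset difference threshold.

    first_set / second_set map a variable name to its list of values.
    """
    return [name for name in first_set
            if distribution_difference(first_set[name], second_set[name]) > threshold]
```

A variable whose two histograms nearly coincide yields a difference close to 0 and is rejected, while a variable whose histograms barely overlap yields a difference close to 1 and is retained.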
In one embodiment, determining the target variable according to the screened variables comprises: determining an initial time variable corresponding to the historical financial data; performing unit conversion processing on the initial time variable to obtain a conversion time variable, the time unit of the conversion time variable being larger than the time unit of the initial time variable; standardizing the conversion time variable to obtain a preprocessing time variable; and determining the target variable according to the screened variables and the preprocessing time variable.
The time unit of the initial time variable may be "seconds", and the time unit of the conversion time variable may be "hours".
In a specific implementation, when determining the target variable according to the screened variables, the computer device may determine the initial time variable corresponding to the historical financial data and perform unit conversion processing on it to obtain a conversion time variable whose time unit is larger than that of the initial time variable, so that the values of the time variable are compressed into a smaller range. The computer device may then standardize the conversion time variable to obtain the preprocessing time variable, and determine the target variable used for training the financial data monitoring model to be trained according to the screened variables and the preprocessing time variable.
In practical application, the computer device can also determine an initial amount variable corresponding to the historical financial data, perform standardization processing on the initial amount variable to obtain a pre-processing amount variable, and take the screened variable, the pre-processing time variable and the pre-processing amount variable as target variables.
In practical application, the computer device can also perform data division on the first financial data set and the third financial data set according to a preset data division ratio to obtain a training sample set, each training sample comprises a corresponding target variable and a corresponding data type sample label, the data type sample label is used for determining whether an abnormal financial behavior is recorded in the training sample, and the financial data monitoring model to be trained is trained through the target variable and the data type sample label corresponding to each training sample.
According to the technical scheme of the embodiment, the initial time variable corresponding to the historical financial data is determined; unit conversion processing is performed on the initial time variable to obtain a conversion time variable whose time unit is larger than that of the initial time variable; the conversion time variable is standardized to obtain a preprocessing time variable; and the target variable is determined according to the screened variables and the preprocessing time variable. In this way, the unit conversion compresses the values of the time variable into a smaller range, and the subsequent standardization ensures that the variables input into the model have been fully preprocessed, which can improve the precision of the model.
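The unit conversion and standardization can be sketched as follows. The text does not state which standardization is used; the z-score form (subtract the mean, divide by the population standard deviation) and the function names are assumptions made for this example.

```python
from statistics import mean, pstdev

def seconds_to_hours(seconds):
    """Unit conversion: seconds to hours, which compresses the value range."""
    return [s / 3600 for s in seconds]

def standardize(values):
    """Assumed z-score standardization: (value - mean) / population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant input carries no spread
    return [(v - mu) / sigma for v in values]

# Illustrative values only: three recording times in seconds.
hours = seconds_to_hours([0, 3600, 7200])
preprocessed_time = standardize(hours)
```

The same `standardize` function could be applied to the initial amount variable to obtain a preprocessing amount variable.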
In another embodiment, a financial data monitoring method is provided, which is described by taking its application to a computer device as an example, and which comprises the following steps: acquiring financial data to be monitored; determining the financial data value of the financial data to be monitored that corresponds to the target variable; inputting the financial data value into a target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored; and if the abnormal probability is greater than a preset abnormal probability threshold, determining that abnormal financial behavior is recorded in the financial data to be monitored.
The target financial data monitoring model is obtained according to the training method of the financial data monitoring model described above.
The target variable is the same as the target variable in the training method of the financial data monitoring model.
In a specific implementation, the computer device may acquire the financial data to be monitored and then, according to the target variable, screen the data values corresponding to the variables in the financial data to be monitored to obtain the data values corresponding to the screened variables, the data value corresponding to the initial time variable and the data value corresponding to the initial amount variable. The computer device then performs feature dimension reduction processing on the data values corresponding to the screened variables, performs unit conversion processing and standardization processing on the data value corresponding to the initial time variable, and performs standardization processing on the data value corresponding to the initial amount variable, to obtain the financial data value corresponding to the target variable. Meanwhile, the computer device may acquire the target financial data monitoring model obtained according to the training method of the financial data monitoring model, and input the financial data value into the target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored. If the abnormal probability is greater than the preset abnormal probability threshold, the predicted value output by the target financial data monitoring model is 1, which indicates that abnormal financial behavior is recorded in the financial data to be monitored, that is, the financial data to be monitored is abnormal financial data. If the abnormal probability is smaller than the preset abnormal probability threshold, the predicted value output by the target financial data monitoring model is 0, which indicates that no abnormal financial behavior is recorded in the financial data to be monitored, that is, the financial data to be monitored is normal financial data.
In the financial data monitoring method, the financial data to be monitored is acquired; the financial data value of the financial data to be monitored that corresponds to the target variable is determined; the financial data value is input into the target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored, the target financial data monitoring model being obtained according to the training method of the financial data monitoring model; and if the abnormal probability is greater than the preset abnormal probability threshold, it is determined that abnormal financial behavior is recorded in the financial data to be monitored. In this way, abnormality monitoring is performed on the financial data to be monitored by the trained target financial data monitoring model. Because the data volumes of the abnormal financial data and of the financial data without recorded abnormal financial behavior in the sample set are relatively balanced during training, the precision of the target financial data monitoring model is ensured, and it can be accurately determined whether the financial data to be monitored is abnormal financial data in which abnormal financial behavior is recorded.
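The decision rule of the monitoring method can be sketched as follows. The function name `monitor`, the default threshold of 0.5, and the representation of the model as any callable that returns an abnormal probability are assumptions for illustration; the text leaves the case of a probability exactly equal to the threshold open, and this sketch treats it as normal.

```python
def monitor(financial_data_value, model, threshold=0.5):
    """Return 1 if abnormal financial behavior is judged to be recorded, else 0.

    `model` is a hypothetical callable mapping preprocessed values to a
    probability in [0, 1]; `threshold` is the preset abnormal probability threshold.
    """
    abnormal_probability = model(financial_data_value)
    return 1 if abnormal_probability > threshold else 0

# Hypothetical stand-in for a trained model, for illustration only.
def stub_model(values):
    return 0.9 if values["amount"] > 3 else 0.1

prediction = monitor({"amount": 5}, stub_model)
```

In practice the callable would wrap the trained target financial data monitoring model, and the input would be the financial data value obtained after the same preprocessing as in training.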
In another embodiment, as shown in fig. 2, a method for training a financial data monitoring model is provided, which is described by taking the method as an example applied to a computer device, and comprises the following steps:
Step S202, dividing the historical financial data set into a first financial data set and a second financial data set according to the data type labels corresponding to the historical financial data in the historical financial data set.
Step S204, determining the variables to be screened corresponding to the historical financial data.
Step S206, performing feature dimension reduction processing on the variables to be screened to obtain the processed variables to be screened.
Step S208, acquiring a first frequency distribution condition of each processed variable to be screened in the first financial data set and a second frequency distribution condition of each processed variable to be screened in the second financial data set.
Step S210, determining the difference between the first frequency distribution condition corresponding to each processed variable to be screened and the corresponding second frequency distribution condition.
Step S212, taking the processed variables to be screened whose difference between the corresponding first frequency distribution condition and the corresponding second frequency distribution condition satisfies the preset difference condition as the screened variables.
Step S214, determining, according to the screened variables, the target variable used for training the financial data monitoring model to be trained.
It should be noted that, for the specific limitations of the above steps, reference may be made to the specific limitations of the training method of the financial data monitoring model described above.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are displayed sequentially as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, a method for training a financial data monitoring model is provided, comprising the steps of:
Step S301, determining a data source.
The historical financial data set in this application example may be the credit card financial data of a certain region within one month, and comprises a first financial data set consisting of abnormal financial data in which abnormal financial behavior is recorded and a second financial data set consisting of financial data in which no abnormal financial behavior is recorded. The historical financial data has 30 initial characteristic variables and one dependent variable (corresponding to the data type label in the above embodiments). The 30 initial characteristic variables comprise 28 variables to be screened, such as customer name, business type, transaction amount, transaction number, age and occupation, together with an initial time variable Time and an initial amount variable Amount. The 28 variables to be screened are subjected to feature dimension reduction processing by principal component analysis to obtain 28 processed variables to be screened, V1 to V28; to protect the privacy and security of user data, the correspondence between each processed variable to be screened and the original variable names is not disclosed here. In addition, to compress the values of the time variable into a smaller range, the value of the initial time variable Time, in seconds, is divided by 3600 to obtain the conversion time variable Hour, in hours.
Step S302, determining the correlations of the processed variables to be screened.
To facilitate understanding by those skilled in the art, fig. 3(a) and fig. 3(b) respectively provide correlation analysis maps of the processed variables to be screened V1 to V28, the initial time variable Time, the conversion time variable Hour and the initial amount variable Amount in the abnormal financial data in which abnormal financial behavior is recorded and in the financial data in which no abnormal financial behavior is recorded. It can be seen that the correlations among some variables are more pronounced in the abnormal financial data. In particular, V1, V2, V3, V4, V5, V6, V7, V9, V10, V11, V12, V14, V16, V17, V18 and V19 are clearly correlated with one another in the samples of abnormal financial data.
Step S303, establishing a frequency distribution graph of the processed variables to be screened.
In order to determine the influence of the 28 processed variables to be screened on the abnormal financial behavior, and to screen the screened variables having a large influence on the characteristic of the abnormal financial behavior, it is required to determine a first frequency distribution of the processed variables to be screened in the abnormal financial data in which the abnormal financial behavior is recorded and a second frequency distribution of the processed variables in the financial data in which the abnormal financial behavior is not recorded, to eliminate the processed variables to be screened having no significant difference between the corresponding first frequency distribution and the corresponding second frequency distribution, and to use the remaining 17 processed variables to be screened, i.e., V1, V2, V3, V4, V5, V6, V7, V9, V10, V11, V12, V14, V16, V17, V18, V19, and V26, as the screened variables. For the convenience of understanding by those skilled in the art, fig. 4(a), fig. 4(b), fig. 4(c), fig. 4(d), fig. 4(e), fig. 4(f), fig. 4(g), fig. 4(h), fig. 4(i), fig. 4(j), and fig. 4(k) provide frequency distribution histograms of the rejected processed variables to be screened, V8, V13, V15, V20, V21, V22, V23, V24, V25, V27, and V28 in two cases, and it can be seen that the first frequency distribution case corresponding to the rejected processed variables to be screened has no obvious difference from the corresponding second frequency distribution case.
Step S304, data preprocessing.
In the historical financial data set, the first financial data set contains only a few hundred pieces of data, whereas the second financial data set is much larger and can reach hundreds of thousands of pieces of data. Therefore, to ensure that the sample set used for model training is balanced, the financial data in the second financial data set needs to be screened. In this step, the conversion time variable Hour and the initial amount variable Amount are standardized to obtain the preprocessing time variable hour and the preprocessing amount variable amount, and financial data of the preset data volume matching the hour values is screened out from the second financial data set according to the hour value corresponding to each piece of abnormal financial data in the first financial data set, to obtain the third financial data set. The screening principle is as follows: the second financial data set is traversed from the beginning and data with the same hour value is taken out, only one piece at a time; if no matching financial data can be found, that hour value is skipped and matching continues with the next hour value; after the hour values corresponding to the different pieces of abnormal financial data have each been matched once, the second financial data set is traversed from the beginning again, until financial data of the preset data volume has been taken out.
Step S305, dividing the data set.
After the third financial data set is obtained through screening, the first financial data set and the third financial data set are divided into a training sample set and a test sample set according to a preset data division ratio, such as 4:1. The financial data monitoring model to be trained is trained with the target variables and the data type sample labels corresponding to the training samples to obtain the target financial data monitoring model, and the precision of the target financial data monitoring model is calculated on the test sample set.
The target variables comprise the 17 screened variables, the preprocessing amount variable amount and the preprocessing time variable hour, 19 variables in total.
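The data division can be sketched as follows. The shuffling before the split, the fixed seed, and the function name `split_samples` are assumptions for illustration; the text only specifies a preset division ratio such as 4:1.

```python
import random

def split_samples(first_set, third_set, ratio=(4, 1), seed=0):
    """Label, merge and divide the samples into training and test sets."""
    # Label 1: abnormal financial behavior recorded; label 0: not recorded.
    samples = [(r, 1) for r in first_set] + [(r, 0) for r in third_set]
    rng = random.Random(seed)
    rng.shuffle(samples)  # assumption: shuffle so both classes appear in both sets
    cut = len(samples) * ratio[0] // (ratio[0] + ratio[1])
    return samples[:cut], samples[cut:]

# Illustrative placeholder records only.
train, test = split_samples(list(range(5)), list(range(100, 105)))
```

With ten samples and a 4:1 ratio, eight samples go to the training sample set and two to the test sample set.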
Step S306, setting the parameters of the financial data monitoring model to be trained.
Because the XGBoost model helps to mitigate overfitting and improve model precision, the XGBoost model may be adopted to establish the financial data monitoring model.
The XGBoost model inherits the characteristics of decision trees, and four parameters are generally set: Objective, Max_depth, Nrounds and Eta. Max_depth represents the depth of the trees; since the tree depth is generally set to no more than one third of the number of variables and the target variables of the present application number 19, the tree depth of the XGBoost model in the present application is 6. Objective represents the objective function; since the model monitoring result of the present application has only two classes, the objective function adopted in model training is binary:logistic (logistic regression for binary classification). Nrounds is the number of trees; calculated from a Max_depth of 6 with an average of 4 branches per level, the number of trees is set to 25. Eta is the learning rate; an excessively high learning rate may cause the model to overfit, and an excessively low learning rate may cause the model to train poorly, so the learning rate of the XGBoost model in the present application is set to 0.5.
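The parameter settings above can be written down as follows. The values are those stated in the text; the key names follow the XGBoost Python package, which is an assumption about the tooling, since the spelling of the parameter names in the text (for example Nrounds) suggests another interface may have been used. In the Python package the number of trees is passed to `xgboost.train` as `num_boost_round` rather than inside the parameter dictionary.

```python
# Values are those stated in the text; key names follow the XGBoost Python package.
params = {
    "objective": "binary:logistic",  # two-class output, returned as a probability
    "max_depth": 6,                  # no more than one third of the 19 target variables
    "eta": 0.5,                      # learning rate
}
num_boost_round = 25                 # "Nrounds": number of boosted trees

# Hypothetical usage, assuming xgboost is installed and dtrain is an xgboost.DMatrix:
# booster = xgboost.train(params, dtrain, num_boost_round=num_boost_round)
```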
Step S307, modeling.
The XGBoost model is established by taking the 19 variables V1, V2, V3, V4, V5, V6, V7, V9, V10, V11, V12, V14, V16, V17, V18, V19, V26, hour and amount as independent variables and Y (whether abnormal financial behavior is recorded) as the dependent variable. The XGBoost model can rank the importance of the features used by the prediction model; because XGBoost uses a gradient boosting algorithm, the feature importance is essentially calculated on the basis of gradient boosting. In general, the importance score measures the value of a feature in the construction of the boosted decision trees in the model. The more an attribute (i.e., a variable) is used to construct decision trees in the model, the higher its importance.
The attribute importance is obtained by computing a score for each attribute in the data set and ranking the attributes by that score. Within a single decision tree, the importance of an attribute is computed from the amount by which each split point on that attribute improves the performance metric, weighted by the node. That is, the more an attribute improves the performance metric at a split point (the closer to the root node), the larger the weight; and the more boosted trees select the attribute, the more important it is. The performance metric may be the Gini impurity of the selected split node or another metric function.
Finally, the results of an attribute in all the lifting trees are weighted and summed and then averaged to obtain an importance score, and the Gain is used in the present application to measure the contribution of each variable to the model, and the result is shown in fig. 5. As can be seen from the attribute importance scores corresponding to the variables in FIG. 5, the importance score of the V14 variable exceeds 75%, which is the most important index for determining whether abnormal financial behavior is recorded in the financial data to be monitored. As can be seen from fig. 5, the three variables V14, V10 and V17 all exceed 2% of the dependent variable for abnormal financial behavior, and the influence on the dependent variable is obvious.
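As a rough illustration of a gain-based importance score, the sketch below sums the split gain credited to each variable over all trees and normalises the totals so that they add up to 1. The data layout (one list of `(feature, gain)` pairs per tree) and the gain values are made up for illustration; they are not the values behind fig. 5, and the normalised total is a simplification of the weighting described above.

```python
from collections import defaultdict


def gain_importance(trees):
    """Return each feature's share of the total split gain.

    `trees` is a list of trees; each tree is a list of (feature, gain)
    pairs, one pair per split node. This layout is an assumption made
    for illustration only.
    """
    total_gain = defaultdict(float)
    for tree in trees:
        for feature, gain in tree:
            total_gain[feature] += gain
    grand_total = sum(total_gain.values())
    # Normalise so that the scores sum to 1, then sort in descending order.
    scores = {f: g / grand_total for f, g in total_gain.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Made-up split gains for two small trees.
toy_trees = [
    [("V14", 8.0), ("V10", 1.0)],
    [("V14", 7.0), ("V17", 0.5), ("V4", 0.5)],
]
scores = gain_importance(toy_trees)
print(scores)
```

In this toy input V14 accounts for 15 of the 17 units of gain, so it is ranked first.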
Step S308, model application and accuracy calculation.
To calculate how accurately the trained XGBoost model (the target financial data monitoring model) determines whether abnormal financial behavior is recorded in the financial data to be monitored, the test sample set is fed into the trained XGBoost model and the accuracy of the model is calculated from a confusion matrix; the actual result is shown in fig. 6. A 0 indicates that no abnormal financial behavior is recorded in the financial data, and a 1 indicates that abnormal financial behavior is recorded in the financial data.
As can be seen from the results in fig. 6, the accuracy of the trained XGBoost model is 97.24%. The probability that the financial data has no abnormal financial behavior recorded but is mispredicted as having abnormal financial behavior recorded is 0.65%, and the probability that the financial data has abnormal financial behavior recorded but is mispredicted as having no abnormal financial behavior recorded is 10.23%.
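The three figures above can all be derived from the four cells of a confusion matrix. The sketch below reads the two error figures as the false positive rate and the false negative rate, which is an interpretation of the text; the labels are toy data, not the test set behind fig. 6, and both classes are assumed to be present.

```python
def confusion_rates(y_true, y_pred):
    """Compute accuracy and the two error rates from 0/1 labels.

    1 means abnormal financial behavior is recorded, 0 means it is not.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Normal data wrongly flagged as abnormal.
        "false_positive_rate": fp / (fp + tn),
        # Abnormal data wrongly passed as normal.
        "false_negative_rate": fn / (fn + tp),
    }


# Toy labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]
rates = confusion_rates(y_true, y_pred)
print(rates)
```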
To further confirm the accuracy of the model, the AUC (Area Under Curve, defined as the area under the ROC (receiver operating characteristic) curve) is used. To facilitate understanding by those skilled in the art, fig. 7 provides the ROC curve of the trained XGBoost model, with specificity on the abscissa and sensitivity on the ordinate; the AUC value of the model in this application is 0.946. Generally, an AUC value below 0.7 indicates that the model accuracy is low and the model has little practical value. An AUC value greater than or equal to 0.7 and less than or equal to 0.9 indicates that the model accuracy is fairly high and the performance is good, so the model can predict well whether abnormal financial behavior is recorded in the financial data to be monitored. An AUC value above 0.9 indicates that the model accuracy is high and the performance is excellent. Since the AUC value of 0.946 shown in fig. 7 is greater than 0.9, the trained XGBoost model has strong predictive capability and high prediction accuracy.
Based on the same inventive concept, the embodiment of the application also provides a training device of the financial data monitoring model, which is used for realizing the training method of the financial data monitoring model described above. The solution provided by the device is similar to the solution described in the method, so for specific limitations in the one or more embodiments of the training device below, reference may be made to the limitations on the training method of the financial data monitoring model above, which are not repeated here.
In one embodiment, as shown in fig. 8, there is provided a training device for a financial data monitoring model, comprising: a first determining module 810, a second determining module 820, a screening module 830, and a training module 840, wherein:
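For readers who want to reproduce this kind of check, the sketch below computes the AUC through its pairwise-ranking interpretation (the fraction of abnormal/normal pairs in which the abnormal sample receives the higher score, with ties counted as one half) and maps the value onto the three bands described above. The function names, the band labels and the toy scores are illustrative assumptions.

```python
def auc_score(y_true, scores):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def interpret_auc(auc):
    """Map an AUC value onto the three bands described in the text."""
    if auc < 0.7:
        return "low"
    if auc <= 0.9:
        return "good"
    return "excellent"


# Toy scores for illustration only.
toy_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc, interpret_auc(toy_auc))
print(interpret_auc(0.946))
```

The pairwise loop is quadratic in the sample count, which is acceptable for a small check; a sort-based computation would be the usual choice for a large test set.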
a first determining module 810 for determining a first financial data set and a second financial data set among the historical financial data sets; the first financial data set comprises abnormal financial data recorded with abnormal financial behaviors; the second financial data set includes financial data for which no abnormal financial behavior is recorded.
A second determining module 820, configured to determine a target recording time point corresponding to each of the abnormal financial data.
A screening module 830, configured to screen out financial data of a preset data amount in the second financial data set according to each target recording time point, so as to serve as a third financial data set; and the recording time point corresponding to each financial data in the third financial data set is matched with each target recording time point.
The training module 840 is used for training the financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model; and the target financial data monitoring model is used for determining whether abnormal financial behaviors are recorded in the financial data to be monitored.
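The screening performed by module 830 can be sketched as follows: for each target recording time point, one normal record with a matching time point is selected, and the pass over the time points is repeated until the preset data amount is reached. Treating "matched" as exact equality of time points, sampling without replacement, and stopping early when no matching record remains are all assumptions added for this sketch; the function and variable names are hypothetical.

```python
import random


def screen_normal_records(abnormal_times, normal_records, preset_amount, seed=0):
    """Pick `preset_amount` normal records whose time points match abnormal ones.

    `normal_records` is a list of (record_id, time_point) pairs.
    """
    rng = random.Random(seed)
    # Group the available normal records by time point.
    pool = {}
    for record_id, time_point in normal_records:
        pool.setdefault(time_point, []).append(record_id)

    selected = []
    while len(selected) < preset_amount:
        picked_this_pass = 0
        for time_point in abnormal_times:
            if len(selected) == preset_amount:
                break
            candidates = pool.get(time_point)
            if not candidates:
                continue  # no match: move on to the next target time point
            # Several candidates: take any one of them.
            selected.append(candidates.pop(rng.randrange(len(candidates))))
            picked_this_pass += 1
        if picked_this_pass == 0:
            break  # nothing left to match; stop rather than loop forever
    return selected


# Toy data: time points are hours of the day.
abnormal_times = [3, 3, 14]
normal_records = [("n1", 3), ("n2", 3), ("n3", 3), ("n4", 14), ("n5", 20)]
third_set = screen_normal_records(abnormal_times, normal_records, preset_amount=4)
print(sorted(third_set))
```

In the toy data the record at hour 20 is never selected, because no abnormal record was made at that hour.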
In another embodiment, there is provided a financial data monitoring apparatus comprising: an acquisition module for acquiring financial data to be monitored; a data value determining module for determining the financial data value of the financial data to be monitored corresponding to the target variable; an input module for inputting the financial data value into a target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored, the target financial data monitoring model being obtained by the training method of the financial data monitoring model; and a judging module for determining that abnormal financial behavior is recorded in the financial data to be monitored if the abnormal probability is greater than a preset abnormal probability threshold.
The modules in the training device of the financial data monitoring model can be wholly or partially realized by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, and can also be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
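The decision made by the judging module reduces to a threshold comparison on the model output. In the sketch below the trained model is replaced by a stand-in function (`toy_model`, a logistic function of the sum of the inputs), and the threshold of 0.9 is an arbitrary value chosen for illustration; neither comes from the application.

```python
import math
from typing import Callable, Sequence


def monitor(values: Sequence[float],
            model: Callable[[Sequence[float]], float],
            threshold: float) -> bool:
    """Return True if abnormal financial behavior is judged to be recorded."""
    abnormal_probability = model(values)
    return abnormal_probability > threshold


def toy_model(values: Sequence[float]) -> float:
    # Stand-in for the trained monitoring model (hypothetical).
    return 1.0 / (1.0 + math.exp(-sum(values)))


flagged = monitor([2.0, 1.0], toy_model, threshold=0.9)
passed = monitor([-1.0, 0.5], toy_model, threshold=0.9)
print(flagged, passed)
```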
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store historical financial data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of training a financial data monitoring model.
It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is a block diagram of only a portion of the configuration associated with the present application, and is not intended to limit the computing device to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, and the like.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but any combination of these technical features should be considered within the scope of the present specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (12)

1. A method of training a financial data monitoring model, the method comprising:
determining a first financial data set and a second financial data set in the historical financial data set; the first financial data set comprises anomalous financial data in which anomalous financial behavior is recorded; the second financial data set comprises financial data for which no abnormal financial behavior is recorded;
determining a target recording time point corresponding to each abnormal financial data;
according to each target recording time point, screening financial data with preset data volume in the second financial data set to serve as a third financial data set; recording time points corresponding to the financial data in the third financial data set are matched with the target recording time points;
training a financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model; and the target financial data monitoring model is used for determining whether abnormal financial behaviors are recorded in the financial data to be monitored.
2. The method according to claim 1, wherein the screening out financial data of a preset data amount in the second financial data set as a third financial data set according to each target recording time point comprises:
screening the financial data in the second financial data set according to each target recording time point, and screening out a piece of financial data matched with each target recording time point to obtain a financial data subset;
and if the financial data volume in the financial data subset is smaller than the preset data volume, performing the step of screening the financial data of the second financial data set according to each target recording time point until the financial data volume in the financial data subset is equal to the preset data volume to obtain a third financial data set.
3. The method of claim 2, wherein the target recording time point comprises a current target recording time point; the step of screening the financial data in the second financial data set according to each target recording time point to screen out a piece of financial data matched with each target recording time point to obtain a financial data subset includes:
if the financial data matched with the current target recording time point does not exist in the second financial data set, screening the financial data in the second financial data set according to the next target recording time point of the current target recording time point;
and if a plurality of candidate financial data matched with the current target recording time point exist in the second financial data set, adding any one of the candidate financial data into the financial data subset.
4. The method of claim 1, further comprising:
dividing the historical financial data set into the first financial data set and the second financial data set according to the data type labels corresponding to the historical financial data in the historical financial data set; the data type tag is used for determining whether abnormal financial behaviors are recorded in the historical financial data;
determining variables to be screened corresponding to the historical financial data;
performing abnormal correlation screening on the variables to be screened according to the distribution condition of the variables to be screened in the first financial data set and the distribution condition of the variables to be screened in the second financial data set to obtain screened variables;
determining a target variable according to the screened variable; the target variable is used for training the financial data monitoring model to be trained.
5. The method according to claim 4, wherein the performing abnormal relevance screening on the variables to be screened according to the distribution of the variables to be screened in the first financial data set and the distribution of the variables to be screened in the second financial data set to obtain screened variables comprises:
performing characteristic dimension reduction processing on the variable to be screened to obtain the processed variable to be screened;
acquiring a first frequency distribution condition of each processed variable to be screened in the first financial data set and a second frequency distribution condition of each processed variable to be screened in the second financial data set;
determining the difference between the first frequency distribution condition corresponding to each processed variable to be screened and the second frequency distribution condition corresponding to each processed variable to be screened;
and taking the processed variable to be screened, in which the difference between the corresponding first frequency distribution condition and the corresponding second frequency distribution condition meets a preset difference condition, as the screened variable.
6. The method of claim 4, wherein said determining target variables from said filtered variables comprises:
determining an initial time variable corresponding to the historical financial data;
performing unit conversion processing on the initial time variable to obtain a conversion time variable; the time unit of the conversion time variable is greater than the time unit of the initial time variable;
standardizing the conversion time variable to obtain a preprocessing time variable;
and determining the target variable according to the screened variable and the preprocessing time variable.
7. A method of financial data monitoring, the method comprising:
acquiring financial data to be monitored;
determining a corresponding financial data value of the financial data to be monitored in a target variable;
inputting the financial data value into a target financial data monitoring model to obtain an abnormal probability corresponding to the financial data to be monitored; the target financial data monitoring model is obtained according to the training method of the financial data monitoring model of any one of claims 1 to 6;
and if the abnormal probability is greater than a preset abnormal probability threshold value, determining that abnormal financial behavior is recorded in the financial data to be monitored.
8. A training device for a financial data monitoring model, the device comprising:
a first determining module for determining a first financial data set and a second financial data set in a historical financial data set; the first financial data set comprises anomalous financial data in which anomalous financial behavior is recorded; the second financial data set comprises financial data for which no abnormal financial behavior is recorded;
the second determining module is used for determining a target recording time point corresponding to each abnormal financial data;
the screening module is used for screening financial data with preset data volume in the second financial data set according to each target recording time point to serve as a third financial data set; recording time points corresponding to the financial data in the third financial data set are matched with the target recording time points;
the training module is used for training a financial data monitoring model to be trained according to the first financial data set and the third financial data set to obtain a target financial data monitoring model; and the target financial data monitoring model is used for determining whether abnormal financial behaviors are recorded in the financial data to be monitored.
9. A financial data monitoring device, wherein the device comprises:
the acquisition module is used for acquiring financial data to be monitored;
the data value determining module is used for determining the corresponding financial data value of the financial data to be monitored in the target variable;
the input module is used for inputting the financial data value into a target financial data monitoring model to obtain the abnormal probability corresponding to the financial data to be monitored; the target financial data monitoring model is obtained according to the training method of the financial data monitoring model of any one of claims 1 to 6;
and the judging module is used for determining that abnormal financial behavior is recorded in the financial data to be monitored if the abnormal probability is greater than a preset abnormal probability threshold value.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
12. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by a processor.
CN202210334899.5A 2022-03-31 2022-03-31 Training method and device of financial data monitoring model and computer equipment Pending CN114626553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210334899.5A CN114626553A (en) 2022-03-31 2022-03-31 Training method and device of financial data monitoring model and computer equipment

Publications (1)

Publication Number Publication Date
CN114626553A true CN114626553A (en) 2022-06-14

Family

ID=81905012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210334899.5A Pending CN114626553A (en) 2022-03-31 2022-03-31 Training method and device of financial data monitoring model and computer equipment

Country Status (1)

Country Link
CN (1) CN114626553A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757334A (en) * 2023-08-16 2023-09-15 江西科技学院 Financial data processing method, system, readable storage medium and computer
CN116757334B (en) * 2023-08-16 2023-11-24 江西科技学院 Financial data processing method, system, readable storage medium and computer
CN117436496A (en) * 2023-11-22 2024-01-23 深圳市网安信科技有限公司 Training method and detection method of anomaly detection model based on big data log

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination