CN113011973B - Method and equipment for financial transaction supervision model based on intelligent contract data lake - Google Patents


Publication number
CN113011973B
Authority
CN
China
Prior art keywords
data
transaction
result
supervision
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110084721.5A
Other languages
Chinese (zh)
Other versions
CN113011973A (en
Inventor
王乾宇
蔡维德
王荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110084721.5A priority Critical patent/CN113011973B/en
Publication of CN113011973A publication Critical patent/CN113011973A/en
Application granted granted Critical
Publication of CN113011973B publication Critical patent/CN113011973B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application discloses a financial transaction supervision model, system and equipment based on a smart contract data lake. The model comprises four parts: an oracle, a smart contract data lake, an intelligent controller and a machine learning engine. The flow is as follows: transaction data is acquired; the oracle judges whether the data meets the on-chain (uplink) condition; data judged compliant by the oracle is passed into the smart contract data lake, which consists of a MySQL database, a cache database and a smart contract database, wherein the MySQL database stores transaction data, the cache database stores high-frequency, high-risk transaction data, and the smart contract database stores supervision flow data; under the command of the intelligent controller, the transaction data sequentially undergoes early-stage, mid-stage and late-stage supervision in the machine learning engine; the machine learning engine classifies the risk of the transaction by combining rule-based KYC (know your customer, i.e. customer identification) and AML (anti-money laundering) methods with machine learning algorithms, thereby realizing link analysis, behavior modeling, risk early warning, anomaly detection and transaction scoring, and obtains a final judgment; the intelligent controller then passes or withdraws the transaction according to that result.

Description

Method and equipment for financial transaction supervision model based on intelligent contract data lake
Technical Field
The application belongs to the field of blockchain technology, and in particular relates to a financial transaction supervision model, system and equipment based on a smart contract data lake.
Background
As an important part of the modern financial supervision system, anti-money laundering is an important guarantee for maintaining economic and social stability, an important lever for effectively preventing financial risks and improving the effectiveness of industry supervision, and an important means of participating in global governance and expanding the two-way opening of the financial industry.
A common financial supervision platform consists of two parts: a customer identification module and an anti-money laundering module. Customer identification authenticates the customer by means of due diligence, enhanced due diligence, SWIFT filtering, and biometric recognition such as fingerprint and face recognition. This step relies on a large number of tools such as text documents and spreadsheets to record and compare customer information, which consumes labor and time, reduces customer stickiness and transaction efficiency, and raises privacy problems in certain countries and regions. The anti-money laundering module comprises traditional rule-engine-based detection methods and detection schemes using machine learning. Traditional rule-engine-based detection cannot identify suspicious behavior in low-frequency transfer transactions or complex money laundering behavior hidden in massive transaction volumes, and machine learning algorithms have low accuracy on complex associated transactions and sporadic low-frequency transactions.
Judging from the current anti-money laundering mechanisms of financial institutions, several problems remain: customer identification is inefficient, anti-money laundering work is poorly digitized, anti-money laundering supervision is costly, and related data is neither synchronized nor shared among financial institutions. Therefore, blockchain technology is introduced into a financial institution's daily customer identification, money laundering detection, transaction audit and similar links to innovate internal supervision, and smart contracts are used to make the supervision rules before, during and after a transaction digital, automated and intelligent, enabling real-time supervision.
Disclosure of Invention
In order to solve the problems in the prior art, the application provides a financial transaction supervision model, system and equipment based on a smart contract data lake, which improve supervision accuracy and efficiency and reduce labor and time costs.
The application provides a financial transaction supervision model based on a smart contract data lake, which comprises the following steps:
step S10, a machine learning data set with financial feature attributes is collected from the UCI database, and the training data set and test data set required by the experiment are obtained after data preprocessing;
step S20, using the experimental data set as the data source, a contract call is sent to the Oraclize oracle, which queries the data and checks its compliance; compliant data is imported into the smart contract data lake, and if the data does not meet the rules the transaction is terminated;
step S30, the experimental data is stored in the MySQL database, the Cache database and the Smart Contract database respectively, according to its attributes, features, category and processing stage. The MySQL database stores all data types of the experimental data before supervision operations are executed; the Cache database stores data types that are called at short intervals, with high frequency and at fine granularity, such as relationship data, account data, tax data, history data, scoring data and blacklist/whitelist data; the Smart Contract database stores the data types describing transaction feature attributes;
step S40, under the command of the intelligent controller, the feature data of the different areas of the smart contract data lake is passed in turn into the machine learning engine to execute early-stage, mid-stage and late-stage supervision, corresponding respectively to KYC (customer identification), AML (anti-money laundering detection) and Credit Grading (credit risk scoring). The intelligent controller first retrieves the account data, scoring data, blacklist/whitelist data and other data cached in the Cache database and performs the KYC operation on it. If the result is a pass, step S50 is entered; otherwise the transaction is terminated;
step S50, under the command of the intelligent controller, the AML operation is performed on the feature attribute data in the Smart Contract database, and the execution result for the transaction data is determined by an arbitration decision among three methods: behavior modeling, link analysis and anomaly detection. If the result is a pass, step S60 is entered; otherwise the transaction is terminated;
step S60, under the command of the intelligent controller, the Credit Grading operation is performed on the data output by step S50 together with the mapped feature attribute data in the Smart Contract database; the score that the transaction data obtains from the scorecard model serves as the credit score of the transaction. The intelligent controller stores the scoring result in the Cache database;
step S70, the intelligent controller returns the final judgment of the machine learning engine to the Smart Contract database, which determines the final transaction result, and the transaction status and prediction accuracy are displayed.
In some preferred embodiments, the data preprocessing method in step S10 is:
the data is inspected with the head() method, missing data is handled by adding default values where appropriate, incomplete rows and columns are deleted, data types are normalized, and the result is stored.
In some preferred embodiments, in step S20 the Oraclize oracle calls its layers in turn from bottom to top to perform the checking operation, and its logical structure is:
in the network topology of the centralized oracle, a single centralized server controls the intermediary node;
in the operation layer, both smart contract execution and data calls run in Trusted Execution Environments (TEEs). Integrity is verified through TLSNotary proofs, with AWS acting in the audit role. A multi-signature mechanism is relied upon so that oracles numbering more than the minimum number of honest nodes sign the corresponding nodes simultaneously;
the contract layer includes order matching contracts, service request contracts, data call interfaces and service standard agreements.
In some preferred embodiments, "the intelligent controller first retrieves the account data, scoring data, blacklist/whitelist data and other data cached in the Cache database and performs the KYC operation on it" in step S40 is carried out as follows:
step S401, Digital Onboarding and SWIFT filtering are performed on the incoming data; if the result is a pass, step S402 is entered; otherwise the flow returns, i.e. the transaction is terminated;
step S402, CDD (customer due diligence) and EDD (enhanced due diligence) operations are performed on the incoming data; if the result is a pass, step S403 is entered; otherwise the flow returns, i.e. the transaction is terminated;
step S403, the Whitelist/Blacklist Filter operation is performed on the incoming data; if the result is a pass, step S50 is entered; otherwise the flow returns, i.e. the transaction is terminated.
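The three KYC sub-steps form a chain of gates in which the first failure terminates the transaction. The following is a minimal Python sketch of that control flow only. The rule bodies, the field names (`swift_flagged`, `high_risk_region`, `enhanced_docs`) and the function names are illustrative assumptions and are not taken from the application.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Transaction = Dict[str, object]


def onboarding_and_swift(tx: Transaction) -> bool:
    # S401 stand-in: identity fields must be present and the transaction
    # must not be flagged by a (hypothetical) SWIFT screening result.
    required = ("name", "age", "beneficiary_account")
    has_identity = all(tx.get(field) not in (None, "") for field in required)
    return has_identity and not tx.get("swift_flagged", False)


def cdd_and_edd(tx: Transaction) -> bool:
    # S402 stand-in: a high-risk region requires enhanced documentation.
    if tx.get("high_risk_region", False):
        return bool(tx.get("enhanced_docs", False))
    return True


def make_list_filter(blacklist: Set[str]) -> Callable[[Transaction], bool]:
    # S403 stand-in: reject customers found on the blacklist.
    def list_filter(tx: Transaction) -> bool:
        return tx.get("name") not in blacklist
    return list_filter


def run_kyc(tx: Transaction, blacklist: Set[str]) -> Tuple[bool, Optional[str]]:
    """Run S401 -> S402 -> S403 and stop at the first failing step."""
    steps: List[Tuple[str, Callable[[Transaction], bool]]] = [
        ("S401", onboarding_and_swift),
        ("S402", cdd_and_edd),
        ("S403", make_list_filter(blacklist)),
    ]
    for name, check in steps:
        if not check(tx):
            return False, name  # transaction terminated at this step
    return True, None  # all gates passed: proceed to step S50
```

Returning the name of the failing step alongside the verdict keeps the reason for a termination available to whatever component records the result.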
In some preferred embodiments, "the execution result for the transaction data is determined by an arbitration decision among three methods: behavior modeling, link analysis and anomaly detection" in step S50 is carried out as follows:
step S501, a three-class behavior modeling operation is performed on the transaction data using the SVM (Support Vector Machine) algorithm, with three possible results: safe transaction, suspicious transaction and pending transaction. If the result is a safe transaction, step S503 is entered; if it is a suspicious transaction, the flow returns, i.e. the transaction is terminated; if it is a pending transaction, step S502 is entered. The SVM algorithm uses the Sigmoid kernel function, calculated as:
$$\kappa(X_1, X_2) = \tanh\left(a\,X_1^{T} X_2 + b\right)$$
where X_1 and X_2 are the data corresponding to the two categories, κ(X_1, X_2) satisfies the positive-definite kernel condition, a sets the gamma parameter of the kernel function, with default value 1/k (k is the number of categories), and b sets coef0 of the kernel function, with default value 0;
step S502, a link analysis operation is performed on the pending transaction using the MaxEnt (Maximum Entropy) algorithm; if the classification result is a safe transaction, step S503 is entered; if it is a suspicious transaction, the flow returns, i.e. the transaction is terminated;
step S503, a Bayesian algorithm is used to perform anomaly detection on the safe transaction; if the result is a pass, step S60 is entered; otherwise the flow returns, i.e. the transaction is terminated.
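The routing among steps S501 to S503 can be sketched as a small cascade. In the Python sketch below the three models are passed in as plain callables. The toy stand-ins, their thresholds and the field names (`amount`, `hops`) are hypothetical placeholders for the trained SVM, MaxEnt and Bayesian models and are not part of the application.

```python
from typing import Callable, Dict

Transaction = Dict[str, float]


def aml_decision(
    tx: Transaction,
    behavior_model: Callable[[Transaction], str],
    link_model: Callable[[Transaction], str],
    anomaly_model: Callable[[Transaction], bool],
) -> bool:
    """Return True if the transaction may proceed to step S60."""
    label = behavior_model(tx)  # S501: "safe", "suspicious" or "pending"
    if label == "suspicious":
        return False  # terminated at S501
    if label == "pending" and link_model(tx) != "safe":
        return False  # terminated at S502
    return anomaly_model(tx)  # S503 has the final say


# Toy stand-ins for the trained models (hypothetical thresholds).
def toy_behavior(tx: Transaction) -> str:
    if tx["amount"] > 10_000:
        return "suspicious"
    if tx["amount"] > 1_000:
        return "pending"
    return "safe"


def toy_link(tx: Transaction) -> str:
    return "safe" if tx["hops"] <= 2 else "suspicious"


def toy_anomaly(tx: Transaction) -> bool:
    return tx["amount"] >= 0
```

Only pending transactions reach the link analysis model, so the more expensive check runs on the ambiguous cases alone, while every surviving transaction still passes through anomaly detection.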
In some preferred embodiments, "the transaction data obtains a corresponding score from the scorecard model" in step S60 is carried out as follows:
step S601, a bucketing (binning) method is used: each processed value is assigned to a corresponding attribute, converting numerical features into categorical features;
step S602, the weight of evidence (WoE) of each attribute and the information value (IV) of each feature are calculated. The weight of evidence is calculated as:
$$\mathrm{WoE} = \left[\ln\left(\frac{\mathrm{Distr}\ G}{\mathrm{Distr}\ B}\right)\right] \times 100$$
where G denotes customers whose transactions pass (target variable = 0) and B denotes customers whose transactions are rejected (target variable = 1);
the information value is calculated as:
$$\mathrm{IV} = \sum_{i}\left(\mathrm{Distr}\ G_i - \mathrm{Distr}\ B_i\right) \times \ln\left(\frac{\mathrm{Distr}\ G_i}{\mathrm{Distr}\ B_i}\right)$$
where the sum runs over the attributes i of the feature, G denotes customers whose transactions pass (target variable = 0) and B denotes customers whose transactions are rejected (target variable = 1);
step S603, modeling is performed with the WoE values substituted for the original variable values; the selected model is LR (Logistic Regression), with the expression:
$$y = \frac{1}{1 + e^{-w^{T} x}}$$
where y is the probability of label A, x is the input from which the label is predicted, w is the trained parameter vector, and w^T holds the weights;
step S604, the parameters are tuned by cross-validation and grid search, and the task is treated as a binary classification problem, with the loss function:
$$F(w) = -\sum_{n}\left[\,y_n \ln p_n + (1 - y_n)\ln(1 - p_n)\right]$$
where F(w) is the loss function, n is the sample index, y_n is the sample label, and p_n is the corresponding probability;
step S605, the scorecard coefficient is calculated for each attribute to obtain the final scorecard. The score formula is:
$$\mathrm{Score} = \left(\beta \times \mathrm{WoE} + \frac{\alpha}{n}\right) \times \mathrm{Factor} + \frac{\mathrm{Offset}}{n}$$
where β is the LR coefficient for a given attribute, α is the LR intercept, WoE is the weight of evidence for a given attribute, n is the number of model features, and Factor and Offset are scaling parameters.
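The bucketing, WoE/IV and scoring steps can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the information value uses the conventional form Σ (Distr G − Distr B) × ln(Distr G / Distr B), and the attribute score uses the common scorecard scaling (β × WoE + α/n) × Factor + Offset/n. The sign convention, the bucket edges and all numeric values are illustrative and not taken from the application.

```python
import math
from bisect import bisect_right
from typing import Dict, Sequence, Tuple


def bucketize(value: float, edges: Sequence[float]) -> int:
    """S601: map a numeric value to the index of its bucket (edges sorted)."""
    return bisect_right(edges, value)


def woe_and_iv(counts: Dict[str, Tuple[int, int]]) -> Tuple[Dict[str, float], float]:
    """S602: counts maps attribute -> (passed G, rejected B) sample counts."""
    total_g = sum(g for g, _ in counts.values())
    total_b = sum(b for _, b in counts.values())
    woe: Dict[str, float] = {}
    iv = 0.0
    for attr, (g, b) in counts.items():
        distr_g, distr_b = g / total_g, b / total_b
        log_ratio = math.log(distr_g / distr_b)  # assumes g > 0 and b > 0
        woe[attr] = log_ratio * 100              # WoE = ln(Distr G / Distr B) x 100
        iv += (distr_g - distr_b) * log_ratio    # conventional IV term
    return woe, iv


def attribute_points(beta: float, alpha: float, woe: float,
                     n: int, factor: float, offset: float) -> float:
    """S605: scorecard points contributed by one attribute."""
    return (beta * woe + alpha / n) * factor + offset / n
```

The total score of a transaction would then be the sum of `attribute_points` over the n model features, each evaluated at the WoE of the bucket the transaction falls into.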
In some preferred embodiments, the smart contract processing method of step S70 is:
step S701, checking whether the information submitted when the user was registered and the transaction amount are genuine and legitimate;
step S702, checking whether the initiator and the beneficiary of the transfer are legitimate users;
step S703, determining, according to the result returned by the machine learning engine, whether the contract operation continues to be executed.
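The three contract checks can be mirrored off-chain as a short function. The Python sketch below is only an illustration of the ordering S701, S702, S703. The field names (`info_verified`, `amount`, `initiator`, `beneficiary`) and the returned status strings are assumptions, and a real smart contract would be written in the contract language of the chosen blockchain.

```python
from typing import Dict, Set


def settle(tx: Dict[str, object], registered: Set[str], engine_passed: bool) -> str:
    """Apply checks S701 -> S702 -> S703 and return a status string."""
    # S701: submitted user information and transaction amount must be valid.
    amount = float(tx.get("amount", 0))
    if not tx.get("info_verified", False) or amount <= 0:
        return "rejected:S701"
    # S702: both the initiator and the beneficiary must be legitimate users.
    if tx.get("initiator") not in registered or tx.get("beneficiary") not in registered:
        return "rejected:S702"
    # S703: the machine learning engine's verdict decides whether to continue.
    return "passed" if engine_passed else "withdrawn"
```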
In some preferred embodiments, the prediction accuracy of the financial transaction supervision model may be presented as a bar chart.
Another aspect of the present application provides a financial transaction supervision system based on a smart contract data lake, the system comprising: a data deep-processing module, a data feature marking module, an oracle module, a smart contract data lake module, a machine learning engine module, an intelligent controller module and an accuracy display module;
the data deep-processing module is configured to perform data cleaning and data preprocessing on the machine learning data set collected from the UCI database, and to store the result as the transaction data set;
the data feature marking module is configured to construct a six-dimensional feature data set from the transaction data set;
the oracle module is configured to query and check the transaction data through the Oraclize oracle and to determine whether the data is compliant before the next operation is executed;
the smart contract data lake module is configured to store transaction data in the MySQL database, the Cache database and the Smart Contract database respectively, according to the attributes, features, category and processing stage of the data;
the machine learning engine module is configured to receive the feature data of the different areas in turn and to execute the early-stage, mid-stage and late-stage supervision operations;
the intelligent controller module is configured to give the intelligent controller unified command over the joint operation of data, algorithms, blocks and databases;
the accuracy display module is configured to display the final transaction result, the transaction status and the prediction accuracy.
In a third aspect of the application, a storage device is provided in which a program is stored, the program being adapted to be loaded and executed by a processor to implement a smart contract data lake-based financial transaction regulatory model as described above.
In a fourth aspect of the present application, there is provided a processing apparatus comprising: a processor and a memory;
the processor is adapted to execute a program, and the memory is adapted to store the program;
the program is adapted to be loaded and executed by the processor to implement a financial transaction supervision model based on the smart contract data lake described above.
The application has the beneficial effects that:
the application discloses a financial transaction supervision model based on a smart contract data lake. It addresses the problems that existing model algorithms have low recognition accuracy for small-amount high-frequency transactions, complex associated transactions and sporadic low-frequency transactions, and that historical transactions are difficult to trace; it can effectively identify money laundering behavior in massive transaction volumes and in complex transaction schemes; it mitigates the limitations of relying on a single algorithm and of poor interpretability; and it improves prediction accuracy and efficiency, reduces labor cost and relieves server load.
Drawings
FIG. 1 is a schematic flow diagram of a financial transaction supervision model based on a smart contract data lake of the present application;
FIG. 2 is a system architecture diagram of a financial transaction supervision model based on a smart contract data lake of the present application;
FIG. 3 is an architectural diagram of a financial transaction supervision model based on a smart contract data lake of the present application;
FIG. 4 is a schematic diagram of a smart contract data lake in accordance with an embodiment of the financial transaction supervision model based on the smart contract data lake of the present application;
FIG. 5 is a logic diagram of a machine learning system of an embodiment of a financial transaction supervision model based on a smart contract data lake of the present application;
FIG. 6 is a graph of the prediction accuracy of an embodiment of the financial transaction supervision model based on a smart contract data lake of the present application.
Description of the preferred embodiments
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
The application discloses a financial transaction supervision model based on a smart contract data lake. It addresses the problems that existing model algorithms have low recognition accuracy for small-amount high-frequency transactions, complex associated transactions and sporadic low-frequency transactions, and that historical transactions are difficult to trace; it can effectively identify money laundering behavior in massive transaction volumes and in complex transaction schemes; it mitigates the limitations of relying on a single algorithm and of poor interpretability; and it improves prediction accuracy and efficiency, reduces labor cost and relieves server load.
In order to describe the financial transaction supervision model based on the smart contract data lake more clearly, the steps of the method embodiment of the application are described in detail below with reference to fig. 1.
Step S10, a machine learning data set with financial feature attributes is collected from the UCI database, and the training data set and test data set required by the experiment are obtained after data preprocessing;
the machine learning data set with financial feature attributes is taken from the UCI database and comprises six types of features: basic information, customer profile, account dimension, transaction amount dimension, transaction count dimension and counterparty dimension.
In the preferred embodiment of the application, the data cleaning and preprocessing work is implemented in Python: the data is inspected with the head() method, missing data is handled by adding default values where appropriate, incomplete rows and columns are deleted, data types are normalized and missing items are filled in, yielding the transaction data.
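The cleaning steps listed above can be sketched with the Python standard library alone. The application does not name a library, although `head()` suggests pandas. In the sketch below, `head` is a hypothetical helper that imitates that inspection step on a list of dictionaries, and the field names, defaults and type map are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def head(rows: List[Row], n: int = 5) -> List[Row]:
    """Return the first n rows for a quick visual check."""
    return rows[:n]


def preprocess(
    rows: Iterable[Row],
    required: Iterable[str],
    defaults: Dict[str, object],
    types: Dict[str, Callable[[object], object]],
) -> List[Row]:
    """Fill defaults, drop incomplete rows, and normalize data types."""
    required = tuple(required)
    cleaned: List[Row] = []
    for raw in rows:
        row = dict(raw)
        # Add a default value where an optional field is missing or empty.
        for field, default in defaults.items():
            if row.get(field) in (None, ""):
                row[field] = default
        # Delete rows that still lack a required field.
        if any(row.get(field) in (None, "") for field in required):
            continue
        # Normalize types; rows that cannot be converted are dropped.
        try:
            for field, cast in types.items():
                if field in row:
                    row[field] = cast(row[field])
        except (TypeError, ValueError):
            continue
        cleaned.append(row)
    return cleaned
```

Calling `head(preprocess(...))` after cleaning gives the same kind of quick inspection before the rows are stored as transaction data.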
Step S20, using the experimental data set as the data source, a contract call is sent to the Oraclize oracle, which queries the data and checks its compliance; compliant data is imported into the smart contract data lake, and if the data does not meet the rules the transaction is terminated;
in the preferred embodiment of the application, the Oraclize oracle calls its layers in turn from bottom to top to perform the checking operation on the transaction data, and its logical structure is:
in the network protocol layer, a single centralized server controls the intermediary node; in the operation layer, both smart contract execution and data calls run in Trusted Execution Environments (TEEs). Integrity is verified through TLSNotary proofs, with AWS acting in the audit role. A multi-signature mechanism is relied upon so that oracles numbering more than the minimum number of honest nodes sign the transaction data simultaneously; the contract layer determines whether the service request contract passes.
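The multi-signature condition amounts to a threshold check: the data is accepted only when enough oracles have validly signed the same payload. The Python sketch below illustrates that check. HMAC-SHA256 with per-oracle keys stands in for real digital signatures, the comparison `>=` against the minimum is an assumption about how the threshold is applied, and none of this reflects the actual Oraclize or TLSNotary interfaces.

```python
import hashlib
import hmac
from typing import Dict


def sign(key: bytes, payload: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 of the payload under the oracle's key."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def accepted(
    payload: bytes,
    signatures: Dict[str, str],
    keys: Dict[str, bytes],
    min_honest: int,
) -> bool:
    """Accept the payload if at least min_honest known oracles signed it validly."""
    valid = sum(
        1
        for oracle_id, signature in signatures.items()
        if oracle_id in keys
        and hmac.compare_digest(signature, sign(keys[oracle_id], payload))
    )
    return valid >= min_honest
```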
Step S30, the experimental data is stored in the MySQL database, the Cache database and the Smart Contract database respectively, according to its attributes, features, category and processing stage. The MySQL database stores all data types of the experimental data before supervision operations are executed; the Cache database stores data types that are called at short intervals, with high frequency and at fine granularity, such as relationship data, account data, tax data, history data, scoring data and blacklist/whitelist data; the Smart Contract database stores the data types describing transaction feature attributes;
the six-dimensional feature data in the transaction data is stored in the MySQL database; information such as customer type, home location, offshore account, high-risk region and account-opening duration is stored in the Cache database; and information related to transaction features, such as multi-currency transactions, surges in current (demand) deposits, surges in large-value spending, loan ratio and statistical features of small-value transfers, is stored in the Smart Contract database so that the intelligent controller can readily retrieve and call it. The system structure diagram is shown in fig. 2, and the detailed structure diagram is shown in fig. 3.
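The split of one transaction record across the three stores can be sketched as a routing function. In the Python sketch below plain dictionaries stand in for the MySQL, Cache and Smart Contract databases, and the field names are hypothetical English renderings of the examples given in the text.

```python
from typing import Dict

# Hypothetical field names for the two specialised stores.
CACHE_FIELDS = {
    "customer_type", "home_location", "offshore_account",
    "high_risk_region", "account_age_days",
}
CONTRACT_FIELDS = {
    "multi_currency", "deposit_surge", "large_spending_surge",
    "loan_ratio", "small_transfer_stats",
}


def route(record: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Split one transaction record across the three stores of the data lake."""
    return {
        # MySQL keeps the full record as it was before supervision.
        "mysql": dict(record),
        # The cache keeps customer attributes that are read at high frequency.
        "cache": {k: v for k, v in record.items() if k in CACHE_FIELDS},
        # The smart contract store keeps transaction feature attributes.
        "smart_contract": {k: v for k, v in record.items() if k in CONTRACT_FIELDS},
    }
```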
Step S40, under the command of the intelligent controller, the feature data of the different areas of the smart contract data lake is passed in turn into the machine learning engine to execute early-stage, mid-stage and late-stage supervision, corresponding respectively to KYC (customer identification), AML (anti-money laundering detection) and Credit Grading (credit risk scoring). The intelligent controller first retrieves the account data, scoring data, blacklist/whitelist data and other data cached in the Cache database and performs the KYC operation on it. If the result is a pass, step S50 is entered; otherwise the transaction is terminated;
in this step, under the command of the intelligent controller, the transaction data sequentially undergoes KYC (early-stage supervision), AML (mid-stage supervision) and Credit Grading (late-stage supervision) in the machine learning engine, and the result of each operation determines the next step. The specific principle of operation is shown in fig. 4.
In a preferred embodiment of the present application, "the intelligent controller first retrieves the account data, scoring data, blacklist/whitelist data and other data cached in the Cache database and performs the KYC operation on it" in step S40 is carried out as follows:
step S401, Digital Onboarding and SWIFT filtering are performed on the incoming data; if the result is a pass, step S402 is entered; otherwise the flow returns, i.e. the transaction is terminated;
in this step, a preliminary digital onboarding operation is first performed on the incoming data using basic information in the model, such as name, age, place of birth, transaction region and beneficiary account; the customer's identity can then be further verified by biometric identification such as face recognition.
Step S402, CDD (customer due diligence) and EDD (enhanced due diligence) operations are performed on the incoming data; if the result is a pass, step S403 is entered; otherwise the flow returns, i.e. the transaction is terminated;
in this step, the CDD and EDD operations are performed in the model on attributes of the incoming data such as home location, age deviation, bank personnel, money laundering risk and multi-currency transactions.
Step S403, the Whitelist/Blacklist Filter operation is performed on the incoming data; if the result is a pass, step S50 is entered; otherwise the flow returns, i.e. the transaction is terminated.
The system scores each customer on the basis of past transactions and derives a whitelist and a blacklist from the customer scores. This step quickly determines, by a lookup, whether the customer of the transaction is compliant.
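Deriving the two lists from historical scores and then answering by lookup can be sketched in Python as follows. The score thresholds, the function names and the "unknown" outcome for customers on neither list are assumptions made for illustration; the application does not state how the lists are cut.

```python
from typing import Dict, Set, Tuple


def build_lists(
    scores: Dict[str, float], white_min: float, black_max: float
) -> Tuple[Set[str], Set[str]]:
    """Derive whitelist and blacklist from historical customer scores."""
    whitelist = {customer for customer, score in scores.items() if score >= white_min}
    blacklist = {customer for customer, score in scores.items() if score <= black_max}
    return whitelist, blacklist


def list_lookup(customer: str, whitelist: Set[str], blacklist: Set[str]) -> str:
    """Constant-time membership check used as the S403 filter."""
    if customer in blacklist:
        return "reject"
    if customer in whitelist:
        return "pass"
    return "unknown"  # not on either list; handling is left to the caller
```

Because both lists are sets, the check in step S403 is a constant-time membership test, which is consistent with keeping this data in the cache.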
Step S50, under the command of the intelligent controller, the AML operation is performed on the feature attribute data in the Smart Contract database, and the execution result for the transaction data is determined by an arbitration decision among three methods: behavior modeling, link analysis and anomaly detection. If the result is a pass, step S60 is entered; otherwise the transaction is terminated;
in this step, an arbitration model is designed that combines three detection methods: behavior modeling, link analysis and anomaly detection. The three methods vote jointly on whether the transaction finally passes; if any one of them casts a dissenting vote, the transaction is terminated. Together with the KYC model of the early supervision stage and the scoring model of the late supervision stage, this arbitration model forms the machine learning engine, whose logic diagram is shown in fig. 5.
In a preferred embodiment of the present application, step S50 "the transaction data determines the execution result after arbitration decision by the triple methods of behavior modeling, link analysis, and anomaly detection", the method is as follows:
step S501, an SVM (Support Vector Machine) algorithm is used to perform a three-class behavior modeling operation on the transaction data, the possible results being: safe transaction, suspicious transaction and pending transaction. If the result is a safe transaction, step S503 is entered; if the result is a suspicious transaction, the process returns, that is, the transaction is terminated; if the result is a pending transaction, step S502 is entered. The SVM algorithm uses a Sigmoid kernel function, which is calculated as:
$$\kappa(X_1, X_2) = \tanh\left(a\,X_1^{T} X_2 + b\right)$$
wherein $X_1$ and $X_2$ are the data corresponding to two categories, $\kappa(X_1, X_2)$ is the kernel function, which is taken to satisfy the positive definite kernel condition, $a$ sets the gamma parameter of the kernel function, with a default value of $1/k$ ($k$ being the number of categories), and $b$ sets coef0 in the kernel function, with a default value of 0;
step S502, a MaxEnt (Maximum Entropy) algorithm is used to perform a link analysis operation on the pending transaction; if the classification result is a safe transaction, step S503 is entered; if the result is a suspicious transaction, the process returns, that is, the transaction is terminated;
step S503, a Bayesian algorithm is used to perform anomaly detection on the safe transaction; if the result is positive (the check is passed), step S60 is entered; if the result is negative, the process returns, that is, the transaction is terminated.
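The routing between steps S501, S502 and S503 can be sketched as a small cascade. The three classifiers are passed in as plain callables, so this shows only the control flow and not a trained SVM, MaxEnt or Bayesian model. The names `Label`, `aml_cascade` and `sigmoid_kernel` are hypothetical, and the kernel is written in the usual tanh form of the Sigmoid kernel, with `a` playing the role of gamma and `b` the role of coef0.

```python
import math
from enum import Enum


class Label(Enum):
    SAFE = "safe"
    SUSPICIOUS = "suspicious"
    PENDING = "pending"


def sigmoid_kernel(x1, x2, a, b=0.0):
    """tanh(a * <x1, x2> + b), the usual form of the Sigmoid kernel."""
    dot = sum(u * v for u, v in zip(x1, x2))
    return math.tanh(a * dot + b)


def aml_cascade(tx, behavior_model, link_model, anomaly_check):
    """Return True if the transaction may proceed to step S60."""
    label = behavior_model(tx)                 # S501: three-class result
    if label is Label.SUSPICIOUS:
        return False                           # terminate the transaction
    if label is Label.PENDING:                 # S502: only pending cases
        if link_model(tx) is not Label.SAFE:
            return False
    return bool(anomaly_check(tx))             # S503: final anomaly check
```

Passing the models as arguments keeps the sketch independent of any particular machine learning library.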
Taking a few transactions in the test data set as an example, the transactions numbered 0007, 0005 and 0524 are judged to be safe transactions and pass directly to the scoring module of the later supervision stage. The transactions numbered 0217 and 0479 are judged to be suspicious because of cross-region and cross-currency activity, the account opening time period, and deviations in the customer's age; these transactions are rejected, and the result is returned directly to the Smart Contract data lake.
Step S60, under the command of the intelligent controller, the Credit Rating operation is performed on the data output by step S50 together with the feature attribute data mapped in the Smart Contract database, and the score that the transaction data obtains from the scoring card model is taken as the credit score of the transaction. The intelligent controller stores the scoring result in the Cache database;
in a preferred embodiment of the present application, the part of step S60 in which "the transaction data obtains a corresponding score in the evaluation of the scoring card model" is carried out as follows:
step S601, a binning method is adopted, each processed value is assigned a corresponding attribute (bin), and the numerical features are thereby converted into categorical features;
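step S602, the weight of evidence (WoE) of each attribute and the information value (IV) of each feature are calculated; the weight of evidence is calculated as:
$$\mathrm{WoE} = \left[\ln\left(\frac{\mathrm{Distr}\,G}{\mathrm{Distr}\,B}\right)\right] \times 100 \qquad (2)$$
wherein G denotes customers whose transactions pass (target variable = 0), and B denotes customers whose transactions are rejected (target variable = 1);
the information value is calculated as:
$$\mathrm{IV} = \sum_{i} \left(\mathrm{Distr}\,G_i - \mathrm{Distr}\,B_i\right) \times \ln\left(\frac{\mathrm{Distr}\,G_i}{\mathrm{Distr}\,B_i}\right)$$
wherein G denotes customers whose transactions pass (target variable = 0), B denotes customers whose transactions are rejected (target variable = 1), and $i$ runs over the attributes of the feature;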
step S603, the original variable values are replaced by their WoE values for modeling; the model chosen is LR (Logistic Regression), whose expression is:
$$y = \frac{1}{1 + e^{-w^{T} x}}$$
wherein $y$ is the probability of label A, $x$ is the input used to predict the label, $w$ is the parameter vector to be trained, and $w^{T}$ is the corresponding (transposed) weight vector;
step S604, the parameters are tuned by cross-validation and grid search, the task is treated as a binary classification problem, and the loss function is:
$$F(w) = -\sum_{n} \left[\, y_n \ln p_n + (1 - y_n) \ln (1 - p_n) \,\right]$$
wherein $F(w)$ is the loss function, $n$ indexes the samples, $y_n$ is the label of sample $n$, and $p_n$ is the corresponding predicted probability;
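step S605, the scoring coefficient of each attribute is calculated to obtain the final scoring card. The score is calculated as:
$$\mathrm{Score} = \sum_{i=1}^{n} \left( -\left(\beta_i \times \mathrm{WoE}_i + \frac{\alpha}{n}\right) \times \mathit{factor} + \frac{\mathit{offset}}{n} \right)$$
wherein $\beta$ is the LR coefficient for a given attribute, $\alpha$ is the LR intercept, WoE is the weight of evidence for a given attribute, $n$ is the number of model features, and factor and offset are the scaling parameters.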
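Steps S602 and S605 can be illustrated with a short sketch that computes WoE and IV from per-attribute counts and then converts one attribute into scorecard points. The function names and the input layout (a list of good and bad counts per attribute) are assumptions; the sketch also assumes that every attribute has at least one passed and one rejected transaction, since the logarithm is otherwise undefined.

```python
import math


def woe_iv(bins):
    """Compute WoE per attribute and the IV of one feature.

    bins: list of (good_count, bad_count) tuples, one per attribute,
    where "good" means target = 0 (passed) and "bad" means target = 1.
    Returns (list of WoE values scaled by 100, information value).
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        distr_g = good / total_good            # share of all good cases
        distr_b = bad / total_bad              # share of all bad cases
        ln_ratio = math.log(distr_g / distr_b)
        woes.append(ln_ratio * 100)
        iv += (distr_g - distr_b) * ln_ratio
    return woes, iv


def attribute_points(beta, woe, alpha, n_features, factor, offset):
    """Points contributed by one attribute of one feature.

    woe must be on the same scale that was used when fitting the
    logistic regression that produced beta and alpha.
    """
    return -(beta * woe + alpha / n_features) * factor + offset / n_features
```

The total score of a transaction is then the sum of `attribute_points` over the attributes that the transaction falls into, one per model feature.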
Taking a certain number of transactions judged to be safe as an example, three primary variables are set: the first variable is the target variable, which is a binary classification variable, and the remaining variables are features. In the feature prediction stage, 8 features are selected for model training according to their IV values, and calculation gives factor = 28.85 and offset = 487.14. The final score is then determined. For example, for a trader who is 45 years old, has a debt ratio of 0.5 and a monthly income of 50,000 RMB, the score is 53 + 55 + 57 = 165, and the trader can be whitelisted. Traders who have had 2 suspicious transactions within 3 years are blacklisted.
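The scaling parameters in this example are consistent with a common scorecard convention in which factor = pdo / ln 2 and offset = base score - factor × ln(base odds). The sketch below assumes a points-to-double-odds (pdo) of 20, a base score of 600 and base odds of 50; these three values are not stated in the text and are used only because they give a factor of about 28.85 and an offset close to the quoted 487.14. The blacklist rule is written as "at least 2", which is also an assumption.

```python
import math


def scaling_parameters(pdo, base_score, base_odds):
    """Return (factor, offset) under the pdo scorecard convention."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def total_score(attribute_points):
    """Sum the points of the attributes a trader falls into."""
    return sum(attribute_points)


def is_blacklisted(suspicious_count_last_3_years, limit=2):
    """Blacklist rule from the example; 'at least limit' is an assumption."""
    return suspicious_count_last_3_years >= limit


# Assumed conventional values, not taken from the text.
factor, offset = scaling_parameters(pdo=20, base_score=600, base_odds=50)

# Points for the three attributes in the worked example.
score = total_score([53, 55, 57])
```

If the factor is first rounded to 28.85 and the offset is computed from that rounded value, the result rounds to 487.14, which may explain the figure quoted in the example.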
Step S70, the intelligent controller returns the final decision of the machine learning engine to the Smart Contract database, so as to determine the final transaction result and to display the transaction status and the prediction accuracy.
In a preferred embodiment of the present application, the smart contract processing method in step S70 is as follows:
step S701, checking whether the information submitted when the user was created and the transaction amount are authentic and legal;
step S702, checking whether the initiator and beneficiary of the transfer are legal users;
step S703, determining whether the contract operation continues to be executed according to the result returned by the machine learning engine.
In this step, if the transaction category is a safe transaction, the above operations are executed without manual intervention; if the transaction category is a suspicious transaction, manual verification is required to determine whether the transaction on the smart contract passes. The transactions numbered 0007, 0005 and 0524 all passed. The transactions numbered 0217 and 0479 all required a manual secondary decision.
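The checks of steps S701 to S703 and the manual-review rule can be sketched as a single decision function. The function name, the argument names and the returned strings are hypothetical. The text does not say what happens when the checks of S701 or S702 fail, so rejecting the transaction in those cases is an assumption.

```python
def contract_decision(user_info_valid, amount_valid,
                      sender_is_legal, beneficiary_is_legal, engine_label):
    """Decide how the smart contract proceeds for one transaction.

    engine_label is the category returned by the machine learning engine:
    "safe" or "suspicious".
    """
    # S701: submitted user information and transaction amount.
    if not (user_info_valid and amount_valid):
        return "rejected"          # assumption: failure ends the contract
    # S702: initiator and beneficiary of the transfer.
    if not (sender_is_legal and beneficiary_is_legal):
        return "rejected"          # assumption: failure ends the contract
    # S703: act on the result returned by the machine learning engine.
    if engine_label == "safe":
        return "execute"           # no manual intervention needed
    if engine_label == "suspicious":
        return "manual_review"     # a person decides whether it passes
    raise ValueError(f"unknown engine label: {engine_label!r}")
```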
In a preferred embodiment of the application, the prediction accuracy information of the financial transaction supervision model can be displayed by using a bar chart.
As shown in Fig. 6, the bar chart shows how the accuracy with which the supervision model correctly judged transactions varied over time in this experiment. It can be seen that the prediction accuracy does not change with time but depends only on the corresponding parameters; using different parameters for different types of transactions can effectively increase the prediction accuracy of the algorithm.
A financial transaction supervision system based on an intelligent contract data lake according to the present application comprises a data deep processing module, a feature marking module, a predictor module, an intelligent contract data lake module, a machine learning engine module, an intelligent controller module and an accuracy display module;
the data deep processing module is configured to perform data cleaning and data preprocessing operation on the machine learning data set arranged in the UCI database, and store the result as a transaction data set;
the feature marking module is configured to construct a six-dimensional feature data set from the transaction data set;
the predictor module is configured to query and check the transaction data through the Oraclize predictor, determine whether the transaction data are compliant, and then execute the next operation;
the intelligent contract data lake module is configured to store transaction data in a MySQL database, a Cache database and a Smart Contract database respectively, according to the attributes, characteristics, categories and processing stages of the transaction data;
the machine learning engine module is configured to sequentially transmit the characteristic data of different areas into the machine learning engine to execute the operations of early supervision, middle supervision and later supervision;
the intelligent controller module is configured to give the intelligent controller unified command over the joint operation of data, algorithms, blocks and databases;
the accuracy display module is configured to display the final transaction result, the transaction status and the prediction accuracy.
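How the intelligent controller might chain the three supervision stages of the machine learning engine can be sketched as below. All names are hypothetical, the three stages are passed in as callables, and a plain dictionary stands in for the Cache database; the sketch shows only the ordering and early termination, not the modules themselves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SupervisionResult:
    passed: bool
    stopped_at: Optional[str]        # "KYC", "AML" or None if all stages ran
    credit_score: Optional[float]


def run_supervision(tx, kyc, aml, credit_rating, cache):
    """Run early (KYC), middle (AML) and later (credit) supervision in order."""
    if not kyc(tx):                  # early supervision
        return SupervisionResult(False, "KYC", None)
    if not aml(tx):                  # middle supervision
        return SupervisionResult(False, "AML", None)
    score = credit_rating(tx)        # later supervision
    cache[tx["id"]] = score          # dict standing in for the Cache database
    return SupervisionResult(True, None, score)
```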
A storage device of a third embodiment of the present application has stored therein a program adapted to be loaded and executed by a processor to implement the financial transaction supervision model based on an intelligent contract data lake as described above.
A processing apparatus of a fourth embodiment of the present application includes: a processor and a memory; the processor is adapted to execute a program, and the memory is adapted to store the program; the program is adapted to be loaded and executed by the processor to implement a financial transaction supervision model based on the smart contract data lake described above.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the programs corresponding to the software modules and method steps may be stored in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not intended to be limiting.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or device/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or device/apparatus.
Thus far, the technical solution of the present application has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present application is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present application, and such modifications and substitutions will fall within the scope of the present application.

Claims (10)

1. A method of a financial transaction supervision model based on an intelligent contract data lake, the method comprising:
step S10, a machine learning data set with financial characteristic attributes is arranged from a UCI database, and a training data set and a test data set required by an experiment are obtained after data preprocessing;
step S20, the experimental data set is used as a data source to send a contract call to the Oraclize predictor, the Oraclize predictor is used for inquiring and checking compliance, then the experimental data set is imported into an intelligent contract data lake, and if the experimental data set does not meet the rule, the transaction is terminated;
step S30, the experimental data are stored in a MySQL database, a Cache database and a Smart Contract database respectively, according to their attributes, characteristics, categories and processing stage; the MySQL database stores all data types before the supervision operations are executed on the experimental data; the Cache database stores data types that are called at short intervals, with high frequency and at fine granularity, comprising: relationship data, account data, tax data, history data, scoring data, and blacklist/whitelist data; the Smart Contract database stores the data types of the transaction feature attributes;
step S40, under the command of the intelligent controller, the feature data of the different areas of the intelligent contract data lake are passed in sequence into the machine learning engine to execute the operations of early supervision, middle supervision and later supervision, which correspond respectively to: KYC customer identification, AML anti-money-laundering detection, and Credit Rating credit risk scoring; first, the intelligent controller retrieves the account data, the scoring data and the blacklist/whitelist data cached in the Cache database and performs the KYC operation on them; if the execution result passes, step S50 is entered, otherwise the transaction is terminated;
step S50, under the command of the intelligent controller, feature attribute data in a Smart Contract database are subjected to AML operation, and transaction data are subjected to triple method arbitration decision of behavior modeling, link analysis and anomaly detection to determine an execution result; if the execution result passes, the step S60 is entered, otherwise, the transaction is terminated;
step S60, under the command of the intelligent controller, the Credit Rating operation is performed on the data output by step S50 together with the feature attribute data mapped in the Smart Contract database, and the score that the transaction data obtains from the scoring card model is taken as the credit score of the transaction; the intelligent controller stores the scoring result in the Cache database;
and step S70, the intelligent controller returns the final decision of the machine learning engine to the Smart Contract database, so as to determine the final transaction result and to display the transaction status and the prediction accuracy.
2. The method of claim 1, wherein the data preprocessing method in step S10 is as follows: the data are inspected using the head() method, missing data are handled by adding corresponding default values, incomplete rows and columns are deleted, the data types are normalized, and the result is stored.
3. The method of claim 1, wherein in step S20, the Oraclize predictor sequentially calls different layers from bottom to top to perform the checking operation, and the logic structure is as follows:
in the network topology structure of the centralized predictor, a single centralized server controls an intermediary node;
in the operation layer, the operation and data calls of the intelligent contract are carried out in trusted execution environments (TEEs); AWS plays an auditing role, and integrity is verified through the TLSNotary Proof; a multi-signature mechanism is relied upon so that a number of Oracles exceeding the minimum number of honest nodes sign the corresponding nodes simultaneously;
the contract layer includes order matching contracts, service request contracts, data call interfaces, and service standard agreements.
4. The method of claim 1, wherein in step S40 "the intelligent controller firstly retrieves the account data, the score data and the blacklist/whitelist data cached in the Cache database and performs the KYC operation thereon", the method comprises:
step S401, performing Digital Onboarding and SWIFT filtering operations on the incoming data; if the result is positive, entering step S402; if the result is negative, returning, that is, terminating the transaction;
step S402, performing CDD (customer due diligence) and EDD (enhanced due diligence) on the incoming data; if the result is positive, entering step S403; if the result is negative, returning, that is, terminating the transaction;
step S403, performing a Whitelist/Blacklist Filter operation on the incoming data; if the result passes, entering step S50; otherwise returning, that is, terminating the transaction.
5. The method of claim 1, wherein in step S50 "the transaction data are subjected to triple method arbitration decision of behavior modeling, link analysis and anomaly detection to determine an execution result", the method comprises:
step S501, performing a three-class behavior modeling operation on the transaction data by adopting an SVM support vector machine algorithm, the possible results being: safe transaction, suspicious transaction and pending transaction; if the result is a safe transaction, entering step S503; if the result is a suspicious transaction, returning, that is, terminating the transaction; if the result is a pending transaction, entering step S502; the SVM algorithm adopts a Sigmoid kernel function, which is calculated as:
$$\kappa(X_1, X_2) = \tanh\left(a\,X_1^{T} X_2 + b\right)$$
wherein $X_1$ and $X_2$ are the data corresponding to two categories, $\kappa(X_1, X_2)$ is the kernel function, which is taken to satisfy the positive definite kernel condition, $a$ is used for setting the gamma parameter in the kernel function, with a default value of $1/k$, $k$ being the number of categories, and $b$ is used for setting coef0 in the kernel function, the default value of coef0 being 0;
step S502, performing a link analysis operation on the pending transaction by adopting a MaxEnt maximum entropy algorithm; if the classification result is a safe transaction, entering step S503; if the result is a suspicious transaction, returning, that is, terminating the transaction;
step S503, performing anomaly detection on the safe transaction by adopting a Bayesian algorithm; if the result is positive, entering step S60; if the result is negative, returning, that is, terminating the transaction.
6. The method of claim 1, wherein in step S60 "the transaction data obtain corresponding scores in the judgment of the scoring card model", the method is as follows:
step S601, a binning method is adopted, each processed value is assigned a corresponding attribute, and the numerical features are converted into categorical features;
step S602, the weight of evidence WoE of each attribute and the information value IV of each feature are calculated, the weight of evidence being calculated as:
$$\mathrm{WoE} = \left[\ln\left(\frac{\mathrm{Distr}\,G}{\mathrm{Distr}\,B}\right)\right] \times 100$$
wherein G denotes customers whose transactions pass (target variable = 0), and B denotes customers whose transactions are rejected (target variable = 1);
the information value is calculated as:
$$\mathrm{IV} = \sum_{i} \left(\mathrm{Distr}\,G_i - \mathrm{Distr}\,B_i\right) \times \ln\left(\frac{\mathrm{Distr}\,G_i}{\mathrm{Distr}\,B_i}\right)$$
wherein G denotes customers whose transactions pass (target variable = 0), B denotes customers whose transactions are rejected (target variable = 1), and $i$ runs over the attributes of the feature;
step S603, the values of the original variables are replaced by WoE for modeling, the model selected being LR logistic regression, whose expression is:
$$y = \frac{1}{1 + e^{-w^{T} x}}$$
wherein $y$ is the probability of label A, $x$ is the input used to predict the label, $w$ is the parameter vector to be trained, and $w^{T}$ is the corresponding (transposed) weight vector;
step S604, the parameters are tuned by cross-validation and grid search, the task is treated as a binary classification problem, and the loss function is:
$$F(w) = -\sum_{m} \left[\, y_m \ln p_m + (1 - y_m) \ln (1 - p_m) \,\right]$$
wherein $F(w)$ is the loss function, $m$ indexes the samples, $y_m$ is the label of sample $m$, and $p_m$ is the corresponding predicted probability;
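step S605, the scoring coefficient of each attribute is calculated to obtain the final scoring card; the score is calculated as:
$$\mathrm{Score} = \sum_{i=1}^{r} \left( -\left(\beta_i \times \mathrm{WoE}_i + \frac{\alpha}{r}\right) \times \mathit{factor} + \frac{\mathit{offset}}{r} \right)$$
wherein $\beta$ is the LR coefficient for a given attribute, $\alpha$ is the LR intercept, WoE is the weight of evidence for a given attribute, $r$ is the number of model features, and factor and offset are the scaling parameters.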
7. The method of claim 1, wherein the intelligent contract processing method of step S70 is as follows:
step S701, checking whether the information submitted when the user is established and the transaction amount are truly legal;
step S702, checking whether the initiator and beneficiary of the transfer are legal users;
step S703, determining whether the contract operation continues to be executed according to the result returned by the machine learning engine.
8. The method of claim 1, wherein the prediction accuracy information of the financial transaction supervision model is displayed in a bar chart.
9. A storage device having a program stored therein, wherein the program is adapted to be loaded and executed by a processor to implement a method of a smart contract data lake-based financial transaction supervision model of any one of claims 1-8.
10. A processing apparatus, comprising
A processor adapted to execute a program, and
a memory adapted to store the program;
wherein the program is adapted to be loaded and executed by a processor to implement:
a method of a financial transaction supervision model based on a smart contract data lake as recited in any one of claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110084721.5A CN113011973B (en) 2021-01-21 2021-01-21 Method and equipment for financial transaction supervision model based on intelligent contract data lake

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110084721.5A CN113011973B (en) 2021-01-21 2021-01-21 Method and equipment for financial transaction supervision model based on intelligent contract data lake

Publications (2)

Publication Number Publication Date
CN113011973A CN113011973A (en) 2021-06-22
CN113011973B true CN113011973B (en) 2023-08-29

Family

ID=76384624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110084721.5A Active CN113011973B (en) 2021-01-21 2021-01-21 Method and equipment for financial transaction supervision model based on intelligent contract data lake

Country Status (1)

Country Link
CN (1) CN113011973B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant