CN110852884A - Data processing system and method for anti-money laundering recognition - Google Patents

Publication number
CN110852884A
Authority
CN
China
Prior art keywords
data item
data
money laundering
module
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911122194.1A
Other languages
Chinese (zh)
Inventor
林佳仪
陈文�
沈思丞
曾途
吴桐
周凡吟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd filed Critical Chengdu Business Big Data Technology Co Ltd
Priority to CN201911122194.1A priority Critical patent/CN110852884A/en
Publication of CN110852884A publication Critical patent/CN110852884A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data processing system and method for anti-money laundering identification. The method comprises the following steps: preliminarily screening data items for anti-money laundering identification according to the related information of the enterprise, the related information including non-fund transaction data; screening out final data items from the preliminarily screened data items based on a screening model; and constructing a money laundering identification model based on the final data items. The system or method can improve the efficiency of anti-money laundering identification, is not limited by user fund transaction data, and has stronger applicability.

Description

Data processing system and method for anti-money laundering recognition
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing system for anti-money laundering identification.
Background Art
Currently, commercial banks rely heavily on fund-side transaction data for anomaly monitoring. However, money laundering methods are continuously renewed and upgraded, and the trend toward networked operation is obvious: online and offline channels are combined, and the launderer does not need to be in direct contact with a bank and is not limited by time, place, or distance. In such a situation, it is difficult for a financial institution to supervise the flow of customer funds, so identifying money laundering behavior by monitoring fund transaction data is no longer suitable. Moreover, monitoring based on fund-side transaction data is offline monitoring and depends to a great extent on the judgment of front-line staff, for example in customer identification, so monitoring efficiency is low and errors are easily made.
Disclosure of Invention
The invention aims to overcome the defects of low efficiency, limited implementation and the like in the prior art, and provides a data processing system and a data processing method for anti-money laundering identification, which can improve the anti-money laundering monitoring efficiency and are not limited by money laundering means.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
in one aspect, an embodiment of the present invention provides a data processing method for anti-money laundering identification, including the following steps:
preliminarily screening data items for anti-money laundering identification according to the related information of the enterprise; the related information includes non-fund transaction data;
screening out final data items from the preliminarily screened data items based on a screening model;
and constructing a money laundering identification model based on the final data item.
In the method, the related information comprises non-fund transaction data; that is, the method does not depend, or does not depend entirely, on fund transaction data, and money laundering behavior is identified based on other data of the enterprise. The method is therefore better suited to the various current forms of money laundering, does not depend on manual operation, and is efficient and accurate.
In one embodiment, the step of screening out a final data item from the preliminarily screened-out data items based on a screening model includes: performing WOE binning for each data item; and calculating the IV value of each data item according to the WOE value of each box, and determining whether the data item is reserved according to the IV value, wherein the reserved data item is the final data item.
In another aspect, embodiments of the present invention also provide a data processing system for anti-money laundering identification, comprising a data item primary selection module, a data item fine selection module, and a model construction module, wherein,
the data item primary selection module is used for primarily screening out data items for anti-money laundering identification according to the related information of the enterprise; the related information includes non-fund transaction data;
the data input end of the data item fine selection module is connected with the data output end of the data item primary selection module and is used for screening out the final data item from the preliminarily screened data items on the basis of the screening model;
and the data input end of the model construction module is connected with the data output end of the data item fine selection module and is used for constructing the money laundering identification model based on the final data item.
In still another aspect, the present invention also provides a computer-readable storage medium including computer-readable instructions, which, when executed, cause a processor to perform the operations of the method described in the present invention.
In another aspect, an embodiment of the present invention also provides an electronic device, including: a memory storing program instructions; and the processor is connected with the memory and executes the program instructions in the memory to realize the steps of the method in the embodiment of the invention.
Compared with the prior art, the system or method provided by the invention is a new approach that estimates the money laundering probability based on the related information of the monitored enterprise. This related information necessarily exists regardless of whether the enterprise has transaction data, so the system or method is not limited by the enterprise's transaction data and can reliably monitor whether the monitored enterprise exhibits money laundering behavior. In addition, the system is general-purpose: it does not depend on the professional skill and/or diligence of staff, produces consistent results from the same data, and performs the analysis automatically once the data is input, which improves monitoring efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a data processing method for anti-money laundering identification described in the embodiment.
Fig. 2 is a block diagram showing the configuration of a data processing system for anti-money laundering identification described in the embodiment.
Fig. 3 is a block diagram showing the components of the electronic apparatus described in the embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Before describing the data processing system for money laundering identification provided in the present embodiment, a brief description will be given of several terms referred to hereinafter.
Related party: an object associated with the enterprise, which may be a natural person or an enterprise. The association mainly refers to relationships between enterprises connected through investing or being invested in, and relationships between natural persons and enterprises connected through investment or position-holding. The positions a natural person may hold in an enterprise include legal representative, director, senior manager, supervisor, and the like.
Degree of association: the degree of association includes first-degree association, second-degree association, third-degree association, and so on. First-degree association is direct association, second-degree association is indirect association built on a first-degree association (one level of indirection), and third-degree association is indirect association built on a second-degree association (two levels of indirection). For example, if Zhao is the legal representative of company A and is also a supervisor of company B, then company A and Zhao are in a first-degree association, that is, Zhao is a first-degree related party of company A, and company B is a second-degree related party of company A.
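The degree of association defined above can be read as the hop count in a relationship graph. The following is a minimal sketch under that assumption; the function name `association_degrees`, the edge-list input, and the `max_degree` limit are illustrative and do not come from the patent.

```python
from collections import deque

def association_degrees(edges, target, max_degree=3):
    """Return {entity: degree} for every entity within max_degree hops of target."""
    # Build an undirected adjacency map from (entity, entity) relationship pairs.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    degrees = {}
    seen = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the requested degree
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                degrees[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return degrees

# Example from the text: Zhao holds positions at both company A and company B.
edges = [("Company A", "Zhao"), ("Zhao", "Company B")]
degrees = association_degrees(edges, "Company A")
print(degrees)
```

Breadth-first search is used so that each entity is assigned the smallest degree at which it can be reached.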
Referring to fig. 1, the data processing method for anti-money laundering identification provided in the embodiment includes the steps of:
S101, preliminarily screening out data items for anti-money laundering identification according to the related information of the enterprise. Here, the related information of the enterprise includes non-fund transaction data, or consists only of non-fund transaction data, and mainly refers to, for example, the business (industrial and commercial registration) information, judgment documents, and court hearing announcements of the enterprise and its related parties.
The business information includes information such as the enterprise's industry, region, enterprise type, listed-company information, registered capital, and shareholders. From it, data items reflecting the background strength of the enterprise can be screened out, such as enterprise type, industry, whether the enterprise is listed, whether it is a state-owned enterprise, the number of non-natural-person shareholders and the number of state-owned enterprises among them, the number of listed legal-person shareholders, and the number of film-and-television and quasi-financial companies invested in by the enterprise's natural-person shareholders or legal representative.
The operating information includes, for example, the enterprise's intellectual property information, bidding and tendering information, and business change information. From it, data items reflecting the operating risk of the enterprise can be screened out, such as the number of patents, the number of bids and tenders, the number of business changes, and the number of shareholder changes.
The judgment documents and court hearing announcements include, for example, the number of the enterprise's judgment documents, the total amount subject to enforcement, the time of the most recent court hearing announcement, the number of court hearing announcements, the number of administrative penalty records, the number of times the enterprise was placed on the list of abnormal operations, and the number of loss-of-credit records. From them, data items reflecting the integrity risk of the enterprise can be screened out, such as the number of judgment documents, the number of court hearing announcements, the number of administrative penalty records, the number of times placed on the list of abnormal operations, and the number of loss-of-credit records. The judgment document and court hearing announcement tables also contain finer-grained case fields, such as case type and enforcement amount, from which the data values of these data items can be calculated.
Further, the business information, judgment documents, and court hearing announcements cover both the target enterprise and its related parties. Thus, in addition to data items reflecting the risk of the target enterprise itself, data items about the related parties can be constructed, such as the number of first-degree related parties, the number of first-degree related parties that are persons subject to enforcement, the number of loss-of-credit records of first-degree related parties, and the number of suspended enterprises among the first-degree related parties.
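As an illustration of how related-party data items of this kind might be derived, the sketch below aggregates per-party public records over the first-degree related parties. The record fields (`enforcement_cases`, `loss_of_credit_records`, `status`) and the sample entities are hypothetical; the patent does not specify a data schema.

```python
def related_party_features(degrees, records, degree=1):
    """Aggregate counts over the related parties at the given degree of association."""
    parties = [name for name, d in degrees.items() if d == degree]
    return {
        "related_party_count": len(parties),
        "enforced_party_count": sum(
            1 for p in parties if records.get(p, {}).get("enforcement_cases", 0) > 0
        ),
        "loss_of_credit_count": sum(
            records.get(p, {}).get("loss_of_credit_records", 0) for p in parties
        ),
        "suspended_enterprise_count": sum(
            1 for p in parties if records.get(p, {}).get("status") == "suspended"
        ),
    }

# Hypothetical inputs: association degrees relative to the target enterprise,
# and public records keyed by related party.
degrees = {"Zhao": 1, "Company C": 1, "Company B": 2}
records = {
    "Zhao": {"enforcement_cases": 2, "loss_of_credit_records": 1},
    "Company C": {"status": "suspended"},
    "Company B": {"enforcement_cases": 5, "loss_of_credit_records": 4},
}
features = related_party_features(degrees, records)
print(features)
```

Company B is a second-degree related party in this example, so its records do not contribute to the first-degree counts.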
It should be noted that the preliminarily screened data items are not designed arbitrarily; they are summarized from an analysis of enterprises with money laundering behavior, and they have statistical significance and natural attributes, which makes them practical to use. For example, the number of state-owned enterprises among the non-natural-person shareholders reflects whether the target enterprise has a state-owned capital background, and an enterprise with such a background is less likely to engage in money laundering. As another example, the decisions of the actual controllers of an enterprise, such as its legal representative and shareholders, often have an important influence on its development, and attending to their investment information allows their investment risk to be evaluated effectively; a data item such as the number of film-and-television and quasi-financial companies invested in by the target company's natural-person shareholders is therefore meaningful. As a further example, attending to the litigation information of an enterprise and its related parties allows their credit level and default risk to be evaluated effectively, so a data item such as the number of first-degree related parties that are dishonest persons subject to enforcement is meaningful.
Of course, which data items are preliminarily screened, and how many, is not particularly limited here. The method provides a new idea: money laundering behavior can be identified using non-fund transaction data, without depending on fund transaction data.
S102, screening out final data items from the preliminarily screened data items based on the screening model. In this step, the final data items are screened mainly by their IV (Information Value) values.
Specifically, WOE binning is first performed for each data item. Some data items take continuous values and some take discrete values. For a data item with discrete values, the usual approach is one bin per distinct value; however, the more bins there are, the larger the calculation amount, so when there are too many distinct values, values that are similar or that cover few samples are merged so that the number of bins stays within a threshold, for example 6 bins. A data item with continuous values is divided into several discrete classes, each class being one bin. For example, the registered capital of an enterprise is a continuous data item. The registered capital is segmented at the cut points (2000000, 10000000, 50000000), and missing values (d_nan) are treated as a separate bin, so the bins are [0, 2000000), [2000000, 10000000), [10000000, 50000000), [50000000, +inf), and d_nan. The number of samples, the number of bad samples, and the number of good samples in each bin are counted to calculate the WOE value of the bin. Taking good to denote a good sample (no money laundering behavior) and bad to denote a bad sample (money laundering behavior), the WOE is given by:
$$\mathrm{WOE}_i = \ln\left(\frac{\#\mathrm{good}(i)\,/\,\#\mathrm{good}(T)}{\#\mathrm{bad}(i)\,/\,\#\mathrm{bad}(T)}\right)$$
where #good(i) denotes the number of good samples in the i-th bin, and #good(T) denotes the total number of good samples; #bad(i) denotes the number of bad samples in the i-th bin, and #bad(T) denotes the total number of bad samples.
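The binning of registered capital described above can be sketched as follows. This is an illustrative sketch only: the cut points and bin labels are taken from the text, while the function names and the `(value, is_bad)` sample layout are assumptions.

```python
import bisect
import math
from collections import Counter

CUT_POINTS = [2_000_000, 10_000_000, 50_000_000]
BIN_LABELS = [
    "[0,2000000)",
    "[2000000,10000000)",
    "[10000000,50000000)",
    "[50000000,+inf)",
]

def bin_label(registered_capital):
    """Map a registered-capital value to its bin label; missing values go to d_nan."""
    if registered_capital is None or math.isnan(registered_capital):
        return "d_nan"
    # bisect_right makes every bin closed on the left and open on the right.
    return BIN_LABELS[bisect.bisect_right(CUT_POINTS, registered_capital)]

def count_by_bin(samples):
    """Count good and bad samples per bin; samples are (registered_capital, is_bad) pairs."""
    counts = Counter()
    for capital, is_bad in samples:
        counts[(bin_label(capital), "bad" if is_bad else "good")] += 1
    return counts

# Hypothetical toy samples, not data from the patent.
samples = [
    (500_000, False),
    (2_000_000, True),
    (None, False),
    (float("nan"), False),
    (80_000_000, True),
]
counts = count_by_bin(samples)
print(counts)
```

Treating missing values as their own bin keeps them in the WOE calculation rather than dropping those samples.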
From the collected samples, the statistics are shown in Table 1 below:
TABLE 1
Bin label           | Number of samples | Number of bad samples | Number of good samples | Bad sample ratio | WOE value
[0,2000000)         | 5539              | 300                   | 5239                   | 5.4%             | 1.174
[2000000,10000000)  | 5334              | 921                   | 4413                   | 17.3%            | -0.120
[10000000,50000000) | 3382              | 1122                  | 2260                   | 33.2%            | -0.986
[50000000,+inf)     | 2136              | 824                   | 1312                   | 38.6%            | -1.221
d_nan               | 4942              | 166                   | 4776                   | 3.4%             | 1.673
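The WOE column of Table 1 can be reproduced from the bad and good counts. The sketch below assumes the common definition WOE = ln((good share of the bin) / (bad share of the bin)), which matches the tabulated values; the function name `woe_per_bin` is illustrative.

```python
import math

# (bin label, bad count, good count), copied from Table 1.
TABLE_1 = [
    ("[0,2000000)", 300, 5239),
    ("[2000000,10000000)", 921, 4413),
    ("[10000000,50000000)", 1122, 2260),
    ("[50000000,+inf)", 824, 1312),
    ("d_nan", 166, 4776),
]

def woe_per_bin(rows):
    """Return {bin label: WOE}; every bin must contain at least one good and one bad sample."""
    total_bad = sum(bad for _, bad, _ in rows)
    total_good = sum(good for _, _, good in rows)
    return {
        label: math.log((good / total_good) / (bad / total_bad))
        for label, bad, good in rows
    }

woe = woe_per_bin(TABLE_1)
for label, value in woe.items():
    print(f"{label:>20}  {value:+.3f}")
```

A bin with zero good or zero bad samples would make the logarithm undefined; merging such a bin with a neighbour is one common workaround, though the patent does not discuss this case.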
Then, the IV value of each data item is calculated from the WOE values of its bins, and whether the data item is retained is determined according to the IV value; the retained data items are the final data items. The IV value measures the ability of the data item to distinguish good samples from bad samples: it reflects how much the proportions of good and bad samples differ across the data item's value groups, and the larger the IV value, the stronger the distinguishing ability. Typically, IV values are interpreted as follows: for IV in [0.02, 0.1), the data item has weak discriminative power; for IV in [0.1, 0.3), medium discriminative power; and for IV greater than or equal to 0.3, strong discriminative power. The IV value is calculated by the formula:
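$$\mathrm{IV} = \sum_{i}\left(\frac{\#\mathrm{good}(i)}{\#\mathrm{good}(T)} - \frac{\#\mathrm{bad}(i)}{\#\mathrm{bad}(T)}\right)\times \mathrm{WOE}_i$$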
Still taking the registered capital data item as an example, the IV value calculated from Table 1 is 1.02, which shows that the registered capital variable has strong ability to distinguish good samples from bad samples, so it is retained.
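The stated value can be checked by hand. The worked arithmetic below assumes the usual definition of IV as the sum over bins of (good share minus bad share) times the bin's WOE, with the counts and WOE values taken from Table 1; intermediate figures are rounded.

```latex
\begin{aligned}
\#\mathrm{good}(T) &= 5239 + 4413 + 2260 + 1312 + 4776 = 18000,\\
\#\mathrm{bad}(T)  &= 300 + 921 + 1122 + 824 + 166 = 3333,\\
\mathrm{IV} &\approx (0.2911 - 0.0900)(1.174) + (0.2452 - 0.2763)(-0.120)\\
            &\quad + (0.1256 - 0.3366)(-0.986) + (0.0729 - 0.2472)(-1.221)\\
            &\quad + (0.2653 - 0.0498)(1.673)\\
            &\approx 0.236 + 0.004 + 0.208 + 0.213 + 0.361 \approx 1.02
\end{aligned}
```

Every term is non-negative because the share difference and the WOE of a bin always have the same sign.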
In addition, taking the data item "number of enterprise type changes" as an example, one bin is set for at most 1 change (including 1) and another bin for more than 1 change. Suppose the statistics are as shown in Table 2 below:
TABLE 2
Bin label | Number of samples | Number of bad samples | Number of good samples | Bad sample ratio | WOE value
[0,1]     | 21327             | 3330                  | 17997                  | 15.6%            | 0.001
(1,+inf)  | 6                 | 3                     | 3                      | 50.0%            | -1.686
Then the calculation yields an IV value of 0.0012 for the number-of-enterprise-type-changes data item, indicating that this data item has very weak ability to distinguish good samples from bad samples, so it is rejected.
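The retention decision for the two example data items can be sketched as below. The counts are copied from Table 1 and Table 2. The cut-off `IV_THRESHOLD = 0.02` is an assumption taken from the lower edge of the "weak" band, since the text does not state the exact threshold used, and the data item names are illustrative.

```python
import math

def information_value(rows):
    """Compute IV from per-bin (bad count, good count) pairs; counts must be non-zero."""
    total_bad = sum(bad for bad, _ in rows)
    total_good = sum(good for _, good in rows)
    iv = 0.0
    for bad, good in rows:
        good_share = good / total_good
        bad_share = bad / total_bad
        # (good share - bad share) multiplied by the WOE of the bin
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv

# (bad count, good count) per bin, copied from Table 1 and Table 2.
DATA_ITEMS = {
    "registered_capital": [(300, 5239), (921, 4413), (1122, 2260), (824, 1312), (166, 4776)],
    "enterprise_type_change_count": [(3330, 17997), (3, 3)],
}

IV_THRESHOLD = 0.02  # assumed cut-off; not specified in the source

iv_values = {name: information_value(rows) for name, rows in DATA_ITEMS.items()}
final_items = [name for name, iv in iv_values.items() if iv >= IV_THRESHOLD]

for name, iv in iv_values.items():
    print(f"{name}: IV = {iv:.4f}")
print("retained:", final_items)
```

With these counts the first item lands well inside the "strong" band and the second falls below the "weak" band, which agrees with the decisions described in the text.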
S103, constructing a money laundering identification model based on the final data item. The data items eventually used for anti-money laundering identification have been screened out through step S102, and then a money laundering identification model is built based on these data items.
The money laundering identification model in this step is obtained through iterative machine learning training. First, samples are constructed and labeled; the samples comprise money laundering samples and non-money laundering samples, and each sample includes the data values of the final data items. Modeling is then carried out based on binary-classification supervised learning with logistic regression to obtain the money laundering identification model.
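A minimal sketch of this modelling step is given below, assuming mini-batch gradient descent on the logistic loss with an L2 penalty. The optimizer, the learning rate, the batch size, and the synthetic one-feature data are all assumptions made for illustration; the patent does not specify them. `C` is treated as the inverse of the regularization strength.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Estimated probability that sample x is a bad (label 1) sample."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(samples, labels, C=1.0, lr=0.1, batch_size=8, max_iter=2000, tol=1e-4, seed=0):
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights, bias = [0.0] * n_features, 0.0
    indices = list(range(len(samples)))
    for _ in range(max_iter):
        # Randomly draw part of the samples for this update.
        batch = rng.sample(indices, min(batch_size, len(indices)))
        grad_w, grad_b = [0.0] * n_features, 0.0
        for i in batch:
            error = predict_proba(weights, bias, samples[i]) - labels[i]
            grad_b += error
            for j, xj in enumerate(samples[i]):
                grad_w[j] += error * xj
        # Average the batch gradient and add the L2 penalty term (strength 1/C).
        grad_w = [g / len(batch) + w / (C * len(samples)) for g, w in zip(grad_w, weights)]
        grad_b /= len(batch)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
        # Stop early once the update is negligibly small.
        if max(abs(g) for g in grad_w + [grad_b]) < tol:
            break
    return weights, bias

# Synthetic toy data: label 1 = money laundering sample, label 0 = normal sample.
rng = random.Random(42)
samples, labels = [], []
for _ in range(200):
    label = 1 if rng.random() < 0.3 else 0
    samples.append([rng.gauss(1.0 if label else -1.0, 1.0)])
    labels.append(label)

weights, bias = train(samples, labels)
print(predict_proba(weights, bias, [2.0]), predict_proba(weights, bias, [-2.0]))
```

In practice a library implementation of logistic regression would normally be used; the hand-written loop is shown only to make the update step visible.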
When constructing the samples, all enterprises on the illegal fund-raising blacklist in the database are extracted, together with the quasi-financial enterprises in the China Judgments Online database that are involved in crimes such as money laundering, covering up or concealing the proceeds of crime and the gains derived from them, and illegally absorbing public deposits; among these, the enterprises whose registration status is "existing" (in operation) are used as blacklist samples. Different types of quasi-financial companies are extracted proportionally from the list of quasi-financial companies in the database and used as the whitelist. The samples used in the WOE value calculation may be used as these samples, or a larger number of samples may be used. Each blacklist or whitelist enterprise sample includes the data values of the data items screened out in step S102.
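One possible reading of "extracted proportionally" is stratified sampling, where the same fraction is drawn from each company type. The sketch below illustrates that reading only; the function name, the `type` field, the sampling fraction, and the company names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_whitelist(companies, fraction, seed=0):
    """Draw the same fraction of companies from every company type."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for company in companies:
        by_type[company["type"]].append(company)
    sample = []
    for group in by_type.values():
        # Keep at least one company per type, and never more than the group holds.
        k = min(len(group), max(1, round(len(group) * fraction)))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical company lists.
companies = (
    [{"name": f"leasing-{i}", "type": "leasing"} for i in range(10)]
    + [{"name": f"factoring-{i}", "type": "factoring"} for i in range(4)]
)
blacklist = [{"name": "blacklisted-0", "type": "leasing"}]

whitelist = stratified_whitelist(companies, fraction=0.5)
# Label blacklist enterprises 1 and whitelist enterprises 0.
labelled = [(c["name"], 1) for c in blacklist] + [(c["name"], 0) for c in whitelist]
print(labelled)
```

Sampling per type keeps the mix of company types in the whitelist close to the mix in the source list.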
During training, the data values of a randomly extracted subset of samples are first input into the initialized model to obtain predicted values; an error is calculated from the predicted values and the true values, and the model parameters are adjusted according to the error. The parameters include the regularization coefficient (C), the regularization form (penalty), the tolerance (tolerance), and the like. The data values of another randomly selected subset of samples are then input into the adjusted model, and training is repeated until the error reaches a set threshold or the number of training iterations reaches a preset value. During training, enterprises in the training samples that have money laundering behavior may be labeled 1, and enterprises that do not may be labeled 0. The data values of the data items of the training samples are input into the binary-classification supervised learning model, which outputs the probability that each enterprise has money laundering behavior, and a binary-classification supervised learning model with strong classification capability is finally obtained by continuously modifying the model parameters. For example, in the experimental example, the trained model had an AUC value of 0.86 and a K-S value of 0.57. The trained money laundering identification model is then used to identify whether a monitored enterprise has money laundering behavior, by predicting the probability that the enterprise has such behavior.
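The two evaluation figures quoted above, AUC and K-S, can be computed from predicted probabilities and true labels. The sketch below uses the standard definitions on a small made-up score list; it does not reproduce the patent's experiment, and the function names are illustrative.

```python
def auc(scores, labels):
    """Probability that a random bad sample scores above a random good sample (ties count half)."""
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((b > g) + 0.5 * (b == g) for b in bad for g in good)
    return wins / (len(bad) * len(good))

def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of bad and good samples."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels), reverse=True)
    cum_bad = cum_good = 0
    best = 0.0
    for k, (score, label) in enumerate(pairs):
        cum_bad += label
        cum_good += 1 - label
        # Compare the distributions only at the end of a run of tied scores.
        if k + 1 == len(pairs) or pairs[k + 1][0] != score:
            best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
    return best

# Made-up predicted probabilities and true labels (1 = money laundering).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(scores, labels), 4), round(ks_statistic(scores, labels), 4))
```

Both measures depend only on how the model ranks the samples, so they can be compared across models without fixing a probability cut-off.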
The data processing system for anti-money laundering identification estimates the money laundering probability based on the related information of the monitored enterprise, rather than judging whether abnormal transaction behavior exists based on traditional fund transaction data. Because this related information necessarily exists regardless of whether the enterprise has transaction data, the estimation is not limited by the enterprise's fund transaction data, and whether the monitored enterprise has money laundering behavior can be monitored reliably. In addition, the system is general-purpose: it does not depend on the professional skill and/or diligence of staff, produces consistent results from the same data, and performs the analysis automatically once the data is input, so monitoring efficiency can be greatly improved.
Referring to fig. 2, based on the same inventive concept, the present embodiment also provides a data processing system for anti-money laundering identification, which includes a data item primary selection module, a data item fine selection module, and a model construction module, wherein,
the data item primary selection module is used for primarily screening out data items for anti-money laundering identification according to the related information of the enterprise; the related information includes non-fund transaction data;
the data input end of the data item fine selection module is connected with the data output end of the data item primary selection module and is used for screening out the final data item from the preliminarily screened data items on the basis of the screening model;
and the data input end of the model construction module is connected with the data output end of the data item fine selection module and is used for constructing the money laundering identification model based on the final data item.
In one embodiment, the data item fine selection module includes a WOE binning module and an IV value calculation module, wherein the WOE binning module is used to perform WOE binning for each data item; the IV value calculation module is used to calculate the IV value of each data item according to the WOE value of each bin and to determine whether the data item is retained according to the IV value, the retained data items being the final data items.
In one embodiment, the model construction module comprises a sample construction module and a model training module, wherein the sample construction module is used for constructing and labeling samples, the samples comprise money laundering samples and non-money laundering samples, and each sample comprises the data values of the final data items; and the model training module carries out modeling based on binary-classification supervised learning with logistic regression to obtain the money laundering identification model.
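The chaining of the three modules, where each module's output feeds the next module's input, could be sketched as below. All class, function, and field names are illustrative, and the three stand-in functions are placeholders rather than the patent's implementation; the IV values are the two examples from the text, and the 0.02 threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class AntiMoneyLaunderingSystem:
    # Each field stands for one module; its output is the next module's input.
    primary_selection: Callable[[Dict[str, dict]], List[str]]
    fine_selection: Callable[[List[str]], List[str]]
    model_construction: Callable[[List[str]], dict]

    def run(self, related_info: Dict[str, dict]) -> dict:
        candidates = self.primary_selection(related_info)
        final_items = self.fine_selection(candidates)
        return self.model_construction(final_items)

# IV values of the two example data items, as stated in the text.
EXAMPLE_IV = {"registered_capital": 1.02, "enterprise_type_change_count": 0.0012}

def select_non_fund_items(related_info):
    # Stand-in primary selection: keep items not derived from fund transactions.
    return [name for name, meta in related_info.items() if not meta["fund_transaction"]]

def select_by_iv(candidates, threshold=0.02):
    # Stand-in fine selection: the threshold is an assumed value.
    return [name for name in candidates if EXAMPLE_IV.get(name, 0.0) >= threshold]

def describe_model(final_items):
    # Stand-in model construction: only records which data items the model would use.
    return {"model": "logistic_regression", "features": final_items}

system = AntiMoneyLaunderingSystem(select_non_fund_items, select_by_iv, describe_model)
result = system.run({
    "registered_capital": {"fund_transaction": False},
    "enterprise_type_change_count": {"fund_transaction": False},
    "monthly_transfer_volume": {"fund_transaction": True},
})
print(result)
```

Passing the modules in as callables mirrors the point made later in the description that the module division is logical and may be rearranged in an actual implementation.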
Since the data processing system for anti-money laundering identification and the data processing method for anti-money laundering identification are based on the same inventive concept, for anything not covered in the description of the system, reference may be made to the related contents in the description of the method.
As shown in fig. 3, the present embodiment also provides an electronic device, which may include a processor 51 and a memory 52, wherein the memory 52 is coupled to the processor 51. It is noted that this diagram is exemplary and that other types of structures may be used in addition to or in place of this structure to implement data extraction, report generation, communication, or other functionality.
As shown in fig. 3, the electronic device may further include: an input unit 53, a display unit 54, and a power supply 55. It is to be noted that the electronic device does not necessarily have to comprise all the components shown in fig. 3. Furthermore, the electronic device may also comprise components not shown in fig. 3, reference being made to the prior art.
The processor 51, also sometimes referred to as a controller or operational control, may comprise a microprocessor or other processor device and/or logic device, the processor 51 receiving input and controlling operation of the various components of the electronic device.
The memory 52 may be one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable devices, and may store the configuration information of the processor 51, the instructions executed by the processor 51, the recorded table data, and other information. The processor 51 may execute a program stored in the memory 52 to realize information storage or processing, or the like. In one embodiment, a buffer memory, i.e., a buffer, is also included in the memory 52 to store the intermediate information.
The input unit 53 is used, for example, to provide the processor 51 with basic information of the respective users. The display unit 54 is used for displaying various results in the processing procedure, such as various index data, estimated probability values, etc., and may be, for example, an LCD display, but the present invention is not limited thereto. The power supply 55 is used to provide power to the electronic device.
Embodiments of the present invention further provide computer-readable instructions which, when executed in an electronic device, cause the electronic device to execute the operation steps included in the method of the present invention.
Embodiments of the present invention further provide a storage medium storing computer-readable instructions, where the computer-readable instructions cause an electronic device to execute the operation steps included in the method of the present invention.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
Those of ordinary skill in the art will appreciate that the various illustrative modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system may be implemented in other ways. The above-described system embodiments are merely illustrative: the division of the modules is only a logical division, and other divisions are possible in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The above description covers only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A data processing method for anti-money laundering identification, comprising the steps of:
preliminarily screening data items for anti-money laundering identification according to the related information of the enterprise; the related information includes non-funding transaction data;
screening out final data items from the preliminarily screened data items based on a screening model;
and constructing a money laundering identification model based on the final data item.
2. The method of claim 1, wherein the step of screening out a final data item from the preliminarily screened-out data items based on a screening model comprises:
performing weight-of-evidence (WOE) binning for each data item;
and calculating the information value (IV) of each data item according to the WOE value of each bin, and determining whether the data item is retained according to the IV value, wherein the retained data item is the final data item.
3. The method of claim 1, wherein the step of constructing a money laundering identification model based on the final data item comprises:
constructing and labeling samples, wherein the samples comprise money laundering samples and non-money laundering samples, and each sample comprises the data value of the final data item;
and modeling based on binary-classification supervised learning and logistic regression to obtain the money laundering identification model.
4. A data processing system for anti-money laundering identification, comprising a data item primary selection module, a data item culling module, and a model construction module, wherein,
the data item primary selection module is used for primarily screening out data items for anti-money laundering identification according to the related information of the enterprise; the related information includes non-funding transaction data;
the data input end of the data item culling module is connected with the data output end of the data item primary selection module, and the data item culling module is used for screening out the final data item from the preliminarily screened data items based on the screening model;
and the data input end of the model construction module is connected with the data output end of the data item culling module, and the model construction module is used for constructing the money laundering identification model based on the final data item.
5. The system of claim 4, wherein the data item culling module includes a WOE binning module and an IV value calculation module, wherein,
the WOE binning module is used for performing WOE binning processing on each data item;
the IV value calculation module is used for calculating the IV value of each data item according to the WOE value of each bin and determining whether the data item is retained according to the IV value, wherein the retained data item is the final data item.
6. The system of claim 4, wherein the model construction module comprises a sample construction module and a model training module, wherein,
the sample construction module is used for constructing and labeling samples, wherein the samples comprise money laundering samples and non-money laundering samples, and each sample comprises the data value of the final data item;
and the model training module carries out modeling based on binary-classification supervised learning and logistic regression to obtain the money laundering identification model.
7. A computer readable storage medium comprising computer readable instructions that, when executed, cause a processor to perform the operations of the method of any of claims 4-6.
8. An electronic device, comprising:
a memory storing program instructions;
a processor coupled to the memory and executing the program instructions in the memory to implement the steps of the method of any of claims 1-3.
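The screening and modeling steps recited in claims 1-3 can be illustrated with a minimal sketch. It is not the applicant's implementation, and everything specific in it is an assumption made for illustration: the data item names (`shareholder_changes`, `office_floor`), the toy values, the manually chosen bin edges, the additive smoothing `eps`, the sign convention WOE = ln(share of money laundering samples in the bin / share of non-money laundering samples in the bin), and the IV cut-off of 0.02, which is only a commonly quoted rule of thumb. The logistic regression is fitted with plain batch gradient descent on the raw values of the retained data items; a production system might instead feed WOE-encoded values to a library solver.

```python
import math

def woe_iv(values, labels, edges, eps=0.5):
    """Return (per-bin WOE list, IV) for one data item.

    labels: 1 = money laundering sample, 0 = non-money laundering sample.
    edges: ascending bin boundaries (assumed to be chosen by the analyst).
    eps: additive smoothing so an empty bin never yields log(0).
    """
    n_bins = len(edges) + 1
    pos, neg = [0] * n_bins, [0] * n_bins
    for v, y in zip(values, labels):
        i = sum(1 for e in edges if v >= e)  # index of the bin containing v
        if y == 1:
            pos[i] += 1
        else:
            neg[i] += 1
    tot_pos = sum(pos) + eps * n_bins
    tot_neg = sum(neg) + eps * n_bins
    woes, iv = [], 0.0
    for p, n in zip(pos, neg):
        dp, dn = (p + eps) / tot_pos, (n + eps) / tot_neg
        w = math.log(dp / dn)      # WOE of this bin
        woes.append(w)
        iv += (dp - dn) * w        # each term is non-negative
    return woes, iv

def select_items(table, labels, edges_by_item, iv_threshold=0.02):
    """Keep the data items whose IV reaches the (assumed) threshold."""
    return [name for name, values in table.items()
            if woe_iv(values, labels, edges_by_item[name])[1] >= iv_threshold]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Estimated probability that sample x is a money laundering sample."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    w, b, m = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / m for wi, g in zip(w, gw)]
        b -= lr * gb / m
    return w, b

# Toy, made-up data: four labeled money laundering samples, four others.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
table = {
    "shareholder_changes": [5, 6, 4, 7, 1, 0, 2, 1],
    "office_floor": [1, 2, 1, 2, 1, 2, 1, 2],
}
edges_by_item = {"shareholder_changes": [3], "office_floor": [2]}

kept = select_items(table, labels, edges_by_item)
rows = [[table[name][i] for name in kept] for i in range(len(labels))]
w, b = fit_logistic(rows, labels)
print(kept)
print(predict_proba(w, b, [6]), predict_proba(w, b, [1]))
```

In this toy data the `office_floor` item is distributed identically across both classes, so its IV is zero and it is dropped, while `shareholder_changes` separates the classes and is retained as the only input to the model.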

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911122194.1A CN110852884A (en) 2019-11-15 2019-11-15 Data processing system and method for anti-money laundering recognition

Publications (1)

Publication Number Publication Date
CN110852884A true CN110852884A (en) 2020-02-28

Family

ID=69601718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911122194.1A Pending CN110852884A (en) 2019-11-15 2019-11-15 Data processing system and method for anti-money laundering recognition

Country Status (1)

Country Link
CN (1) CN110852884A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8078517B1 (en) * 2008-10-21 2011-12-13 United Services Automobile Association Systems and methods for monitoring remittances for reporting requirements
CN104813355A (en) * 2012-08-27 2015-07-29 Y-S·宋 Transactional monitoring system
CN107358360A (en) * 2017-07-14 2017-11-17 成都农村商业银行股份有限公司 The abnormal traffic data screening method of anti money washing system
CN109345368A (en) * 2018-08-22 2019-02-15 中国平安人寿保险股份有限公司 Credit estimation method, device, electronic equipment and storage medium based on big data
CN109583773A (en) * 2018-12-04 2019-04-05 税友软件集团股份有限公司 A kind of method, system and relevant apparatus that taxpaying credit integral is determining
CN109597936A (en) * 2018-11-30 2019-04-09 成都数联铭品科技有限公司 A kind of new user's screening system and method
CN109741173A (en) * 2018-12-27 2019-05-10 深圳前海微众银行股份有限公司 Recognition methods, device, equipment and the computer storage medium of suspicious money laundering clique
CN110046993A (en) * 2018-12-15 2019-07-23 深圳壹账通智能科技有限公司 Illicit gain legalizes behavior monitoring method, system, computer installation and medium
WO2019157946A1 (en) * 2018-02-13 2019-08-22 阿里巴巴集团控股有限公司 Anti-money laundering method, apparatus, and device
CN110276618A (en) * 2019-06-28 2019-09-24 第四范式(北京)技术有限公司 The method and system for generating money laundering ancestor prediction model, predicting money laundering ancestor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
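Sun Jing; Li Zhiwei; Liu Wei: "A logistic-regression-based model for identifying large-value suspicious enterprise foreign exchange fund transactions" (基于逻辑回归的企业大额可疑外汇资金交易识别模型), Shanghai Finance (上海金融), no. 06, pages 58 - 61 *
Sun Jing et al.: "A logistic-regression-based model for identifying large-value suspicious enterprise foreign exchange fund transactions" (基于逻辑回归的企业大额可疑外汇资金交易识别模型) *
Wang Sunan: "Research on the application of intelligent technology to financial market spillover effects and anti-money laundering" (智能技术在金融市场溢出效应和反洗钱中的应用研究), Information Science and Technology series (信息科技辑), pages 76 - 95 *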

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113592499A (en) * 2021-01-29 2021-11-02 微梦创科网络科技(中国)有限公司 Internet money laundering confrontation method and device
CN113592499B (en) * 2021-01-29 2023-08-25 微梦创科网络科技(中国)有限公司 Internet money laundering countermeasure method and device
CN113094407A (en) * 2021-03-11 2021-07-09 广发证券股份有限公司 Anti-money laundering identification method, device and system based on horizontal federal learning
CN113590597A (en) * 2021-07-16 2021-11-02 成都无糖信息技术有限公司 Identification method and equipment for analysis hierarchy division of network abnormal behavior key personnel
CN113590597B (en) * 2021-07-16 2024-03-29 成都无糖信息技术有限公司 Identification method and equipment for analysis hierarchical division of key personnel of network abnormal behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228