CN110458376A - A suspicious risk transaction screening method and corresponding system - Google Patents

A suspicious risk transaction screening method and corresponding system

Info

Publication number
CN110458376A
Authority
CN
China
Prior art keywords
client
transaction
feature
trading
suspicious
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810427250.1A
Other languages
Chinese (zh)
Inventor
王子剑
严武
陈龙
曹磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Noyue Intelligent Technology Co Ltd
Original Assignee
Shanghai Noyue Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Noyue Intelligent Technology Co Ltd
Priority to CN201810427250.1A
Publication of CN110458376A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Technology Law (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

A suspicious risk transaction screening method (200), comprising the following steps: obtaining known client information data (201), wherein the known client information data includes known client transaction information data and known client personal information data; calculating known client transaction features from the known client information data (202); training a client suspicious behavior screening model from the calculated known client transaction features using a neighbor-sampling random forest method (203); and screening the transaction behavior of the clients to be screened with the trained client suspicious behavior screening model (204).

Description

A suspicious risk transaction screening method and corresponding system
Technical field
The present invention relates to the technical field of financial security analysis, and in particular to a suspicious risk transaction screening method and a corresponding system.
Background art
Document CN201510947459.7 discloses a method and device for detecting sets of suspicious transaction nodes, in which the set of suspicious transaction nodes in a banking network is determined from the relationships among the nodes in the network topology, taking full account of the associations between nodes.
Document CN201510857280.2 discloses a data processing system and method for anti-money-laundering processing, in which message processing, message sending and receiving, and state confirmation are performed on pending anti-money-laundering messages in order to access different anti-money-laundering systems.
Document CN201610647003.3 discloses a method and device for determining money laundering accounts, in which multiple data samples are classified according to the weight of each data sample, target data samples are then determined, and an account is judged to be a money laundering account or not according to whether the account corresponding to a target data sample meets a preset money laundering account standard.
Document CN201610522577.8 discloses a method and device for determining suspicious money laundering accounts, in which multiple data samples representing the transaction information of accounts over a set duration within a set time period are classified to obtain an optimal classification, isolated-point data samples are determined from a particular class of data samples, and these isolated-point data samples are determined to be suspicious money laundering accounts.
Document CN201710576257.5 discloses a method for screening abnormal business data in an anti-money-laundering system, which uses abnormality indicators and abnormality-model screening, combines the screening method with business scenarios, and filters out abnormal business data when the trigger conditions of the abnormality model are met.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and a corresponding system that quantify the suspicious transaction behavior of clients and then discriminate it with an artificial-intelligence classification model, so that clients with significantly suspicious transaction behavior can be screened out of the target clients more accurately on the basis of known client transaction features in bank client transaction behavior.
Accordingly, a first aspect of the present invention proposes a suspicious risk transaction screening method comprising the following steps:
obtaining known client information data, wherein the known client information data includes known client transaction information data and known client personal information data;
calculating known client transaction features from the known client information data;
training a client suspicious behavior screening model from the calculated known client transaction features using a neighbor-sampling random forest method;
screening the transaction behavior of the clients to be screened with the trained client suspicious behavior screening model.
The basic idea of the proposed suspicious risk transaction screening method is to classify an imbalanced data set by means of the neighbor-sampling random forest method, so that part of the normal clients are screened out while most suspicious transactions are retained, which effectively reduces the workload of manual screening. Classification here means dividing the clients of bank transactions into clients with suspicious transactions and clients with normal transactions. In a classification task, the samples to be identified are generally called positive samples and the others negative samples. An imbalanced data set is one in which the numbers of positive and negative samples differ greatly, so that the model's predictions or classification results lean heavily toward the majority class and the resulting accuracy is not trustworthy. For example, when the ratio of positive to negative samples in a data set is 9:1, even an accuracy of 90% gives reason to suspect that the model has assigned every sample to the majority class and completely ignored the minority class. Among a bank's client transactions, clients with suspicious transactions obviously make up only a very small part, while the overwhelming majority of client transactions are normal, so a bank's client transaction data is a typical imbalanced data set. Processing it with the neighbor-sampling random forest method, which specifically targets this situation, avoids the bias described above and thereby improves the accuracy of prediction or screening.
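The accuracy pitfall described above can be illustrated with a small sketch. The labels and the always-majority "model" below are hypothetical and chosen only to reproduce the 9:1 situation from the text, with the suspicious class taken as the minority.

```python
# Hypothetical labels: 1 = suspicious client, 0 = normal client, in a 1:9 split.
labels = [0] * 9 + [1] * 1

# A degenerate "model" that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: share of predictions that match the label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the suspicious class: share of suspicious clients actually caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

The accuracy looks high although not a single suspicious client is found, which is why accuracy alone is a poor yardstick on such data.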
Advantageously, the known client transaction information data may include transaction time, transaction amount, transaction IP, transaction account information, transaction message, and the like.
Advantageously, the known client personal information data may include client account number, client name, account opening date, contact details, and the like.
Advantageously, obtaining the known client information data may be followed by applying data cleaning to the obtained data in order to structure it. Data cleaning may include joining and merging data and handling missing values and outliers. Missing value handling means, for example, that if the amount is missing from a transaction record, the missing amount is filled with 0. Outlier handling means, for example, that the "transaction amount" and the "counterparty transaction amount" in the transaction table should normally be equal; if they are inconsistent, "transaction amount" is taken as the standard and "counterparty transaction amount" is changed to the "transaction amount", so that the two fields agree.
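The two cleaning rules just described can be sketched as follows. This is only an illustration: the record layout and the field names `trx_money` and `opp_trx_money` are assumptions, not names taken from the patent.

```python
def clean_transactions(rows):
    """Return cleaned copies of transaction records (a list of dicts)."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy, so the caller's data is not mutated
        # Missing value rule: a missing amount is filled with 0.
        if row.get("trx_money") is None:
            row["trx_money"] = 0
        # Outlier rule: the counterparty amount must equal the transaction
        # amount; the transaction amount is taken as the standard.
        if row.get("opp_trx_money") != row["trx_money"]:
            row["opp_trx_money"] = row["trx_money"]
        cleaned.append(row)
    return cleaned


# Hypothetical raw records.
raw = [
    {"cust_id": "C1", "trx_money": 100.0, "opp_trx_money": 100.0},
    {"cust_id": "C1", "trx_money": None, "opp_trx_money": None},
    {"cust_id": "C2", "trx_money": -50.0, "opp_trx_money": -55.0},
]
cleaned = clean_transactions(raw)
```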
Advantageously, the client transaction features may include the following 12 features: fast-frequency transactions within a period; consistency of incoming and outgoing amounts; identical IP; cumulative transaction amount; cumulative transaction count; corporate-client-to-private transfer count; corporate-client-to-private transfer ratio; number of transaction counterparties; outgoing transaction count and proportion; incoming transaction count and proportion; outgoing transaction amount and proportion; and incoming transaction amount and proportion. The 12 features are described and calculated as follows:
1. The fast-frequency transaction feature within a period is calculated by the following formula:
where p is the period, trxMoney is the amount of each transaction, and a is a coefficient used to widen the gap between fast and slow frequencies.
2. The amount in/out consistency feature is calculated by the following formula:
where p is the period, p+1 is the period shifted by one in/out segment (for example, if the amounts in p run positive, negative, positive, negative, then those in p+1 run negative, positive, negative, positive), trxMoney_{p,in} is the amount transferred in during the period, and money_{p,out} is the amount transferred out during the period.
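3. The identical IP feature is calculated by the following formulas:
$$\text{MaxSameIPcount}_p = \max\bigl(\text{count}_{ip}(\text{trxall}_{p,cust}.\text{groupby}(ip,\ cust\_id))\bigr)$$
$$\text{trxall}_{p,cust} \in \text{trx}_{p,cust} \cup \text{trx}_{p,opp}$$
$$\text{trx}_{p,opp} \in \text{trx}_p$$
$$\text{trx}_{p,opp} \sim \text{trx}_p.cust\_id = \text{trx}_{p,cust}.opp\_cust\_id$$
where the IP address field is SLF_EQP_ADR/chnl_srl, and the feature counts the maximum number of companies/clients that share the same IP among the current client and its upstream and downstream transaction counterparties.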
4. The cumulative transaction amount feature is calculated by the following formula:
$$\text{SumTrxMoney}_p = \sum \operatorname{abs}(\text{trx.trxMoney}_p)$$
where SumTrxMoney_p is the sum of the absolute values of the client's transaction amounts within period p, and p may be 1, 7, 14, 30, 90, or 180 days.
5. The cumulative transaction count feature is calculated by the following formula:
$$\text{SumTrxCount}_p = \sum \text{trx.count}_p$$
where SumTrxCount_p is the client's total number of transactions within period p, and p may be 1, 7, 14, 30, 90, or 180 days.
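As a minimal sketch of features 4 and 5, the following computes both values for one client over each period length named above. The `(day, signed_amount)` tuple layout, the function name, and the choice of a window that ends on `end_day` inclusive are assumptions made for illustration.

```python
from datetime import date, timedelta


def cumulative_features(transactions, end_day, period_days):
    """Return (SumTrxMoney_p, SumTrxCount_p) for one client.

    transactions: list of (day, signed_amount) tuples (hypothetical layout).
    The window covers the `period_days` days ending on `end_day`, inclusive.
    """
    start_day = end_day - timedelta(days=period_days - 1)
    in_window = [amt for day, amt in transactions if start_day <= day <= end_day]
    sum_trx_money = sum(abs(amt) for amt in in_window)  # absolute amounts
    sum_trx_count = len(in_window)                      # number of transactions
    return sum_trx_money, sum_trx_count


# Hypothetical transactions of a single client.
trx = [
    (date(2018, 5, 1), 200.0),
    (date(2018, 5, 3), -150.0),
    (date(2018, 5, 7), 50.0),
    (date(2018, 3, 1), -1000.0),
]
end = date(2018, 5, 7)
features = {p: cumulative_features(trx, end, p) for p in (1, 7, 14, 30, 90, 180)}
```

Each period length yields one pair of values, so a client contributes six amount features and six count features under this reading.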
6. The corporate-client-to-private transfer count feature counts the transfers between corporate clients and private clients in the network, where a corporate client is defined as a client whose name length is less than 6. It is calculated by the following formulas:
$$\text{Com2CusTrxCount}_p = \sum \text{Count}_{p,\,trx\_opp\_cus}$$
$$\text{trx\_opp\_cus.opp\_cust\_name.length} < 6$$
where the n highest corporate-to-private transfer counts of corporate clients in the network are averaged, n = 1, 2, 3.
7. The corporate-client-to-private transfer ratio feature calculates the share of corporate-to-private transfers in the total transactions in the network, where a corporate client is defined as a client whose name length is less than 6. It is calculated by the following formula:
$$\text{trx\_opp\_cus.opp\_cust\_name.length} < 6$$
where the n highest corporate-to-private transfer ratios of corporate clients in the network are averaged, n = 1, 2, 3.
8. The counterparty count feature is the number of distinct names in the network, calculated by the following formula, where distinct counts non-repeated names:
$$\text{TrxOppNum}_p = \sum \operatorname{distinct}(\text{trx.opp\_cust\_name})$$
9. Outgoing transaction count and proportion feature:
$$\text{TrxCountNum}_{p,out} = \sum \text{trx}_{out}.\text{count}$$
$$\text{trx}_{out} \in \text{trx.trxMoney} < 0$$
The proportion is the ratio of the number of outgoing transactions in the network to the total number of transactions.
10. Incoming transaction count and proportion feature. The direction is judged by the sign of the transaction amount: an amount greater than 0 is a transfer in and an amount less than 0 is a transfer out. The feature is calculated by the following formulas:
$$\text{TrxCountNum}_{p,in} = \sum \text{trx}_{in}.\text{count}$$
$$\text{trx}_{in} \in \text{trx.trxMoney} > 0$$
The proportion is the ratio of the number of incoming transactions in the network to the total number of transactions.
11. Outgoing transaction amount and proportion feature:
$$\text{SumTrxMoney}_{p,out} = \sum \operatorname{abs}(\text{trx}_{out}.\text{trxMoney})$$
$$\text{trx}_{out} \in \text{trx.trxMoney} < 0$$
The proportion is the ratio of the outgoing amount in the network to the total transaction amount.
12. Incoming transaction amount and proportion feature:
$$\text{SumTrxMoney}_{p,in} = \sum \text{trx}_{in}.\text{trxMoney}$$
$$\text{trx}_{in} \in \text{trx.trxMoney} > 0$$
The proportion is the ratio of the incoming amount in the network to the total transaction amount.
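Features 8 to 12 can be sketched together, since they all follow from the sign of the transaction amount. The input layout, the dictionary keys for the proportions, and the choice to divide by all transactions of the period are assumptions made for this illustration.

```python
def direction_features(transactions):
    """Counterparty count plus in/out counts, amounts and proportions.

    transactions: list of (counterparty_name, signed_amount) tuples for one
    period (hypothetical layout); amount > 0 is a transfer in, amount < 0 is
    a transfer out.
    """
    ins = [amt for _, amt in transactions if amt > 0]
    outs = [abs(amt) for _, amt in transactions if amt < 0]
    total_count = len(transactions)
    total_money = sum(abs(amt) for _, amt in transactions)

    def share(part, whole):
        # Guard against an empty period.
        return part / whole if whole else 0.0

    return {
        "TrxOppNum": len({name for name, _ in transactions}),  # distinct names
        "TrxCountNum_out": len(outs),
        "TrxCountNum_in": len(ins),
        "SumTrxMoney_out": sum(outs),
        "SumTrxMoney_in": sum(ins),
        "out_count_share": share(len(outs), total_count),
        "in_count_share": share(len(ins), total_count),
        "out_money_share": share(sum(outs), total_money),
        "in_money_share": share(sum(ins), total_money),
    }


# Hypothetical transactions within one period.
period_trx = [("Alice", 100.0), ("Bob", -300.0), ("Alice", -100.0), ("Carol", 500.0)]
result = direction_features(period_trx)
```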
Experiments have confirmed that using all or some of the 12 classes of suspicious transaction features proposed by the present invention, together with their calculation methods, gives good results in classifying and screening suspicious transactions.
Advantageously, the step of training the client suspicious behavior screening model from the obtained known client transaction features using the neighbor-sampling random forest method may include the following:
First, clients with suspicious transaction behavior are distinguished from clients with normal transaction behavior. According to the invention, all known client transaction data, for example from a bank (more precisely, the client transaction features calculated from it), forms the sample space, in which clients with suspicious transaction behavior (that is, their client transaction features) are treated as positive samples and clients with normal transaction behavior (that is, their client transaction features) as negative samples. This is therefore a binary classification problem. The distinction can be made, for example, by determining from historical data, through manual or automatic labeling, whether each client is suspicious and setting a label for each client: 1 if the client is suspicious and 0 if the client's behavior is normal. The above features of the positive and negative samples are denoted X and the corresponding labels Y. After the distinction, the number of positive samples is denoted A and the number of negative samples B.
Next, all samples are clustered without supervision using the K-MEANS clustering algorithm, with N cluster classes. The numbers of positive samples in the classes are A_1, A_2, ..., A_N. Unsupervised clustering means that the input data used for clustering has only features and no labels. The data objects are divided into groups and similar samples gather automatically into clusters, so that the objects within a cluster are highly similar to one another but very dissimilar to the objects in other clusters, which reveals latent classification patterns. The K-MEANS clustering algorithm takes an input parameter k and partitions the n objects of the data set into k classes such that objects in the same class are highly similar and objects in different classes are less similar. Similarity is calculated with respect to a "center object" obtained as the mean of the objects in each class. The algorithm works as follows. First, k of the n objects are chosen arbitrarily as initial cluster centers. Each remaining object is then assigned to the cluster whose center it is most similar to, that is, closest to; distance is generally computed as the Euclidean distance, which is the actual distance between two points in m-dimensional space. The center of each new cluster (the mean of all objects in that cluster) is then recomputed. This process is repeated until the standard measure function converges to a fixed value. The mean squared deviation is generally used as the standard measure function.
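The K-MEANS procedure described above can be sketched in plain Python. This is a minimal illustration rather than the patent's implementation: the function name, the seeded random initialisation, and the stopping rule (stop once the centers no longer move, instead of monitoring a measure function) are assumptions.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster `points` (tuples of floats) into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # arbitrary initial cluster centers
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each object joins the nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical two-dimensional feature vectors forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
```

In practice the feature vectors would have one dimension per client transaction feature and would usually be scaled first, since Euclidean distance is sensitive to the magnitude of each feature.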
Next, the negative samples are down-sampled according to the clustering result. Down-sampling is carried out by randomly sampling B_S negative samples, where B_S is calculated by the following formula:
$$B_S = A \times R$$
As above, A is the current number of positive samples, and R is the ratio between positive and negative samples after conversion. From each class, $B_S \cdot P$ negative samples are randomly sampled, where i is the class number, 1 ≤ i ≤ N, and P is the stability probability, 0 ≤ P ≤ 1. Random sampling means drawing samples at random from the sample space. As described above, according to the invention the transaction data of all clients of the bank constitutes the sample space and the transaction data of each client is one sample, so sampling is the process of selecting client transaction data. Then $B_S(1-P)$ negative samples are randomly sampled from the remaining negative samples.
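The down-sampling step can be sketched as below. One point is an explicit assumption: the text does not legibly state how the $B_S \cdot P$ share is divided among the N classes, so this sketch splits it in proportion to A_i / A, the share of positive samples in each class. The function and parameter names are likewise hypothetical.

```python
import random


def neighbor_downsample(neg_by_cluster, pos_count_by_cluster, R, P, seed=0):
    """Down-sample negative samples to about B_S = A * R.

    neg_by_cluster: {cluster_id: [negative samples]}
    pos_count_by_cluster: {cluster_id: A_i}, positives per cluster
    ASSUMPTION: the B_S * P share is split across clusters in proportion
    to A_i / A; the source does not spell out the per-cluster quota.
    """
    rng = random.Random(seed)
    A = sum(pos_count_by_cluster.values())
    if A == 0:
        return []
    B_S = round(A * R)
    chosen, leftover = [], []
    for cid, negatives in neg_by_cluster.items():
        quota = round(B_S * P * pos_count_by_cluster.get(cid, 0) / A)
        quota = min(quota, len(negatives))
        shuffled = rng.sample(negatives, len(negatives))  # random order
        chosen.extend(shuffled[:quota])
        leftover.extend(shuffled[quota:])
    # Fill the rest of the budget, B_S * (1 - P) when no quota was capped or
    # rounded, from the remaining negative samples.
    remaining = min(max(B_S - len(chosen), 0), len(leftover))
    chosen.extend(rng.sample(leftover, remaining))
    return chosen


# Hypothetical data: 2 clusters with 1 positive each, 50 negatives in total.
negatives = {0: [f"n0_{i}" for i in range(20)], 1: [f"n1_{i}" for i in range(30)]}
positives = {0: 1, 1: 1}
sampled = neighbor_downsample(negatives, positives, R=5, P=0.6)
```

With A = 2 and R = 5 the budget is B_S = 10, of which 6 are drawn near the positive samples (3 per cluster under the stated assumption) and 4 from the rest.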
After the negative samples have been extracted with the above sampling method, the number of positive samples is A and the number of negative samples is B_S. The sampled positive and negative samples are used for training with the random forest method. The trained model is the client suspicious behavior screening model, which can then be used to screen, that is, classify, the transaction behavior of new data, namely of the clients to be screened.
The random forest method builds a forest in a random manner. The forest consists of many decision trees, and the decision trees of a random forest are independent of one another. Once the forest has been obtained, each decision tree in it judges a new input sample separately to decide which class the sample should belong to, and the class chosen most often is the prediction for that sample. A random forest can handle both categorical and numerical attributes. A decision tree is a tree structure in which each non-leaf node represents a test on a feature attribute, each branch represents the outcome of that feature attribute over some range of values, and each leaf node stores a class. Making a decision with a decision tree means starting at the root node, testing the corresponding feature attribute of the item to be classified, and following the branch selected by its value until a leaf node is reached; the class stored in that leaf node is the decision result. A decision tree is in effect a method of partitioning space: each split divides the current space in two.
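The voting step of a random forest can be sketched as follows. Only the prediction side is shown, not the random construction of the trees. The three single-split trees and their thresholds are invented for illustration; the feature names are borrowed from the feature definitions earlier in the text.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Each tree votes once; the class with the most votes is the prediction."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hypothetical single-split decision trees. Thresholds are illustrative
# only and do not come from the patent. Label 1 = suspicious, 0 = normal.
def tree_amount(sample):
    return 1 if sample["SumTrxMoney"] > 1_000_000 else 0


def tree_count(sample):
    return 1 if sample["SumTrxCount"] > 200 else 0


def tree_ip(sample):
    return 1 if sample["MaxSameIPcount"] > 5 else 0


forest = [tree_amount, tree_count, tree_ip]
client_a = {"SumTrxMoney": 2_500_000, "SumTrxCount": 40, "MaxSameIPcount": 9}
client_b = {"SumTrxMoney": 8_000, "SumTrxCount": 12, "MaxSameIPcount": 1}
label_a = forest_predict(forest, client_a)  # two of three trees vote 1
label_b = forest_predict(forest, client_b)  # all trees vote 0
```

Using an odd number of trees avoids ties in a two-class vote.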
The advantages of the proposed suspicious risk transaction screening method are therefore as follows. On the one hand, the method uses the neighbor-sampling random forest method combined with unsupervised clustering, which is particularly suitable for classifying or screening complex imbalanced data sets, and thus gives good classification results when positive samples are few and the positive and negative sample vector spaces partly overlap. On the other hand, the method overcomes the classification bias of imbalanced data sets and ensures that part of the normal clients are screened out while most suspicious transactions are retained, which reduces the amount of manual screening.
Advantageously, the suspicious transaction types targeted by the proposed method include, but are not limited to, one or more of the 12 classes of suspicious transaction types defined by the People's Bank of China: suspected illegal private banking, suspected corruption, suspected drug-related crime, suspected smuggling, suspected fraud, suspected fundraising, suspected pyramid selling, suspected arbitrage, suspected tax-related crime, suspected terrorist financing, suspected gambling, and suspected false capital contribution (capital flight).
In addition, a second aspect of the invention proposes a corresponding suspicious risk transaction screening system, comprising:
an acquisition unit configured to obtain known client information data, wherein the known client information data includes known client transaction information data and known client personal information data;
a computing unit configured to calculate known client transaction features from the known client information data;
an analysis and processing unit configured to train a client suspicious behavior screening model from the obtained known client transaction features using the neighbor-sampling random forest method.
In addition, the analysis and processing unit may be further configured to screen the transaction behavior of the clients to be screened with the trained client suspicious behavior screening model.
The acquisition unit may be communicatively connected to the computing unit and/or the analysis and processing unit in a wired or wireless manner.
According to the invention, the acquisition unit can be understood as any type of data input and acquisition device that can acquire digital and/or analog data and communicate with other data processing units or data storage units by wired or wireless means, for the processing and storage of data.
According to the invention, the computing unit and/or the analysis and processing unit can be understood, alone or in combination, as any type of central processing unit or processing device that can receive data and/or signals and process them with corresponding algorithms or software, and that can also output corresponding control signals to controlled devices or display signals to a corresponding display.
Advantageously, the screening result can be output by an output device. The output device may be visual, auditory, or of any other suitable type; for example, it may be configured as a loudspeaker, a display, or the like.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the invention. They do not limit the invention but serve for illustration. In the drawings:
Fig. 1 shows a flow chart of the suspicious risk transaction screening method proposed by the invention;
Fig. 2 shows a further refined flow chart of the suspicious risk transaction screening method according to one embodiment of the invention;
Fig. 3 shows the client information table generated after the distinction step in the suspicious risk transaction screening method according to one embodiment of the invention;
Fig. 4a-4c show how the projection of the sample vector space changes during model training in the suspicious risk transaction screening method according to one embodiment of the invention;
Fig. 5 shows the screening results of the suspicious risk transaction screening method according to one embodiment of the invention;
Fig. 6 shows a block diagram of a suspicious risk transaction screening system 100 proposed by the invention.
Detailed description of embodiments
Fig. 1 shows a flow chart of the suspicious risk transaction screening method 200 proposed by the invention. The method 200 comprises:
a first step 201 of obtaining known client information data, wherein the known client information data includes known client transaction information data and known client personal information data;
a second step 202 of calculating known client transaction features from the known client information data;
a third step 203 of training a client suspicious behavior screening model from the calculated known client transaction features using the neighbor-sampling random forest method;
a fourth step 204 of screening the transaction behavior of the clients to be screened with the trained client suspicious behavior screening model.
Fig. 2 shows a further refined flow chart of the suspicious risk transaction screening method 200 according to one embodiment of the invention. Advantageously, step 203 of training the client suspicious behavior screening model from the obtained known client transaction features using the neighbor-sampling random forest method includes the following sub-steps:
a first sub-step 2031 of distinguishing clients with suspicious transaction behavior from clients with normal transaction behavior;
a second sub-step 2032 of performing unsupervised clustering on all known client transaction features with the K-MEANS clustering algorithm;
a third sub-step 2033 of down-sampling the clients with normal transaction behavior according to the clustering result;
a fourth sub-step 2034 of training, with the random forest method, on the clients with suspicious transaction behavior and the down-sampled clients with normal transaction behavior to obtain the client suspicious behavior screening model.
Below according to a specific embodiment, to according to suspicious risk trade screening method 200 of the present invention into Row illustrates.
It in this embodiment, mainly include that data importing, feature extraction, model training and suspicious actions client screen four Part or step.Wherein data lead-in portion can be understood as obtaining known customer profile data, wherein known client's letter Ceasing data includes known client trading information data and known client personal information data, for example these data can come from bank The data record of database or security department.
In data lead-in portion, by Transaction Information (time, the amount of money, transaction I P, account information, transaction are left a message), client The data such as personal information (customer ID, customer name, date of opening an account, contact method) import data processing system, at the data It is finally obtained by means such as data cleansings by data structured, including by data processing missing values, exceptional value etc. in reason system The every a line of structural data be a client, the corresponding essential informations such as client and its transaction of each column of every a line.Wherein lack Mistake value processing such as: lack the amount of money in Transaction Information, then will lack the amount of money fills up 0;And exceptional value is such as: " trade gold in tran list Volume " answers situation that is identical, but sometimes will appear inconsistent with " opponent's transaction amount " under normal circumstances, then with " transaction amount " For standard, " opponent's transaction amount " is changed to " transaction amount ".
Then, in the second step 202, according to the definitions and calculations of the 12 feature categories given above, the following features are extracted for each customer from the imported data: the fast-in/fast-out transaction feature within a period, the inflow/outflow amount consistency feature, the same-IP feature, the cumulative transaction amount feature, the cumulative transaction count feature, the corporate-to-private transfer count feature, the corporate-to-private transfer ratio feature, the transaction number feature, the transfer-out number and share feature, the transfer-in number and share feature, the transfer-out amount and share feature, and the transfer-in amount and share feature.
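As a rough illustration of this step, the sketch below aggregates a few of the listed features per customer. The exact feature definitions are given elsewhere in the description, so everything here is an assumption: the field names, the `direction` flag with values `"in"` and `"out"`, and the reading of "share" as a fraction of the customer's own totals.

```python
from collections import defaultdict


def customer_features(transactions):
    """Aggregate a few per-customer features from cleaned transactions.

    Assumed record fields: customer_id, amount, direction ("in" or "out").
    """
    acc = defaultdict(lambda: {"total_amount": 0.0, "count": 0,
                               "out_count": 0, "out_amount": 0.0})
    for t in transactions:
        f = acc[t["customer_id"]]
        f["total_amount"] += t["amount"]
        f["count"] += 1
        if t["direction"] == "out":
            f["out_count"] += 1
            f["out_amount"] += t["amount"]

    features = {}
    for cid, f in acc.items():
        total = f["total_amount"]
        in_count = f["count"] - f["out_count"]
        features[cid] = {
            "cumulative_amount": total,
            "cumulative_count": f["count"],
            "out_count": f["out_count"],
            "out_count_share": f["out_count"] / f["count"],
            "in_count": in_count,
            "in_count_share": in_count / f["count"],
            # Guard against customers whose amounts sum to zero.
            "out_amount_share": f["out_amount"] / total if total else 0.0,
            "in_amount_share": (total - f["out_amount"]) / total if total else 0.0,
        }
    return features


transactions = [
    {"customer_id": "C1", "amount": 100.0, "direction": "out"},
    {"customer_id": "C1", "amount": 300.0, "direction": "in"},
    {"customer_id": "C1", "amount": 100.0, "direction": "out"},
    {"customer_id": "C2", "amount": 50.0, "direction": "in"},
]
features = customer_features(transactions)
```

Each customer ends up as one row of features, matching the one-row-per-customer structure produced by the data import part.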
Next comes the model training part, which mainly uses the 12 known customer transaction features calculated above to train the customer suspicious-behavior screening model, here for example with the neighbor-sampling random forest method; other similar methods are of course also conceivable.
The model training part may include the following:
First, after the features of each customer have been calculated, historical data are used, with manual or automatic labeling, to determine whether each customer is suspicious, and a label is set for each customer: the label is 1 if the customer is suspicious and 0 if the customer's behavior is normal. The resulting features and labels are shown in Fig. 3.
For the above neighbor-sampling random forest method, Figs. 4a-4c show how the projection of the sample vector space changes during model training, where circles are positive samples and squares are negative samples. The sample-space projection obtained initially is shown in Fig. 4a.
Then, clustering is performed using K-MEANS with the number of clusters N = 2. The clustering result is shown in Fig. 4b; each of the two clusters contains 1 positive sample.
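The clustering step can be sketched with a plain Lloyd-style k-means in standard-library Python. This is an illustrative sketch only: the initialization from randomly chosen points, the fixed iteration count and the toy 2-D points are assumptions, and the patent does not say which K-MEANS implementation is used.

```python
import random


def kmeans(points, k=2, iters=20, seed=0):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

Clustering is run on the features of all customers, positive and negative alike, because the cluster membership is what later tells the sampler which negative samples lie near positive ones.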
Then, in the third step, the negative samples are down-sampled. With positive sample count A = 4, negative sample count B = 6, R = 1 and P = 0.5, the calculation gives BS = 4 and BSP = 2. Accordingly, 2 negative samples are randomly sampled from within the two clusters, and a further 2 negative samples are randomly sampled from the remaining negative samples. The result is shown in Fig. 4c.
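One way to read the numbers in this step is sketched below. The formulas BS = R × A and BSP = P × BS are assumptions chosen because they reproduce the stated values (BS = 4, BSP = 2); the authoritative definitions of R and P are those given earlier in the description. The function name, the cluster assignments and the rule "clusters that contain a positive sample count as near" are likewise illustrative.

```python
import random


def neighbor_downsample(labels, clusters, R=1.0, P=0.5, seed=0):
    """Return the indices of the negative samples kept after down-sampling.

    labels[i] is 1 (suspicious) or 0 (normal); clusters[i] is the cluster id.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Assumed formulas: BS negatives kept overall, BSP of them near positives.
    bs = min(len(negatives), round(R * len(positives)))
    bsp = round(P * bs)
    # Negatives sharing a cluster with a positive sample are "near".
    pos_clusters = {clusters[i] for i in positives}
    near = [i for i in negatives if clusters[i] in pos_clusters]
    picked = rng.sample(near, min(bsp, len(near)))
    # Fill the remaining quota from the negatives not picked yet.
    rest = [i for i in negatives if i not in picked]
    picked += rng.sample(rest, min(bs - len(picked), len(rest)))
    return sorted(picked)


# A = 4 positives and B = 6 negatives, as in the embodiment.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
clusters = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1]
kept = neighbor_downsample(labels, clusters)
```

With these inputs the sampler keeps 4 of the 6 negative samples, so the training set is balanced at 4 positive and 4 negative samples.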
Then a random forest classifier is trained on the current 4 positive samples and 4 negative samples, producing the trained model.
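To make the training step concrete without depending on any library, the sketch below trains a deliberately tiny bagged ensemble of one-split trees (decision stumps), each fitted on a bootstrap sample with a randomly chosen feature. It stands in for a full random forest, which grows deeper trees; all names, the tree count and the two toy features are assumptions, and a real system would more likely use an established library implementation.

```python
import random
from collections import Counter


def gini(ys):
    """Gini impurity of a list of class labels."""
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values()) if n else 0.0


def fit_stump(X, y, feature):
    """Best single-threshold split on one feature, by weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    best = (gini(y), None, majority, majority)  # impurity, threshold, left, right
    values = sorted({row[feature] for row in X})
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= thr]
        right = [yi for row, yi in zip(X, y) if row[feature] > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, thr,
                    Counter(left).most_common(1)[0][0],
                    Counter(right).most_common(1)[0][0])
    return (feature,) + best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]   # bootstrap sample
        feature = rng.randrange(len(X[0]))         # random feature choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all stumps; 1 = suspicious, 0 = normal."""
    votes = [left if thr is None or row[feature] <= thr else right
             for feature, thr, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# 4 positive and 4 negative samples with two made-up features.
X = [[9, 90], [8, 95], [10, 85], [9, 99], [1, 10], [2, 12], [1, 8], [3, 15]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
```

Balancing the classes before training matters here because each bootstrap sample is drawn from the training set as given; with the original imbalance most trees would see few or no positive samples.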
Finally, for the features calculated from new data, that is, the 12 features described above, the trained model can compute the classification label. The classification result, that is, the screening result, is shown in Fig. 5.
Here, customers 3, 4, 7, 8 and 10 can be regarded as normal customers and are excluded. The remaining customers 1, 2, 5, 6 and 9 may exhibit suspicious behavior; they are retained, archived and handed over to professional staff for further verification and determination.
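The final screening decision amounts to splitting customers by predicted label. In the sketch below the label values are filled in to match the outcome stated in the text; they are not copied from Fig. 5, and the function name is an assumption.

```python
def split_by_label(predicted):
    """Split customer ids into (normal, suspicious) by predicted label."""
    normal = sorted(c for c, y in predicted.items() if y == 0)
    suspicious = sorted(c for c, y in predicted.items() if y == 1)
    return normal, suspicious


# Predicted label per customer number: 1 = suspicious, 0 = normal.
predicted = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 1, 7: 0, 8: 0, 9: 1, 10: 0}
normal, suspicious = split_by_label(predicted)
```

Only the suspicious list is passed on for manual review, which is the point of the screening: it narrows the set of customers that professional staff need to examine.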
Fig. 6 shows a block diagram of a suspicious risk transaction screening system 100 proposed by the present invention. The system 100 comprises:
an acquisition unit 101, configured to obtain known customer information data, where the known customer information data comprise known customer transaction information data and known customer personal information data;
a computing unit 102, configured to calculate known customer transaction features from the known customer information data;
an analysis and processing unit 103, configured to train a customer suspicious-behavior screening model from the obtained known customer transaction features using the neighbor-sampling random forest method.
In addition, the analysis and processing unit 103 is also configured to screen the transaction behavior of the customers to be screened according to the trained customer suspicious-behavior screening model.
The acquisition unit is communicatively connected to the computing unit and/or the analysis and processing unit in a wired or wireless manner.
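The division of work among the three units can be sketched as a small composition of callables. This is only an illustration of the data flow: the class name, the method names and the stub functions used in the example are assumptions, and the communication link between the units is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = List[float]
Model = Callable[[Features], int]


@dataclass
class ScreeningSystem:
    acquire: Callable[[], List[dict]]                   # acquisition unit 101
    compute: Callable[[List[dict]], List[Features]]     # computing unit 102
    train: Callable[[List[Features], Sequence[int]], Model]  # unit 103 (training)

    def build_model(self, labels: Sequence[int]) -> Model:
        """Acquire known data, compute features, train the screening model."""
        records = self.acquire()
        return self.train(self.compute(records), labels)

    def screen(self, model: Model, target_records: List[dict]) -> List[int]:
        """Unit 103 (screening): label each target customer with the model."""
        return [model(f) for f in self.compute(target_records)]


# Stub units for demonstration only; the threshold model is a placeholder.
system = ScreeningSystem(
    acquire=lambda: [{"count": 9}, {"count": 1}],
    compute=lambda records: [[float(r["count"])] for r in records],
    train=lambda feats, labels: (lambda f: int(f[0] > 5)),
)
model = system.build_model(labels=[1, 0])
result = system.screen(model, [{"count": 8}, {"count": 2}])
```

The same feature computation is applied to known customers during training and to target customers during screening, so the model always sees features built the same way.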
The above description of the proposed embodiments enables those skilled in the art to implement or use the present invention. It should be understood that, unless otherwise specified, the features disclosed in the above embodiments may be used individually or in combination with one another. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the invention disclosed herein is not limited to the specific embodiments disclosed, but is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

Claims (10)

1. A suspicious risk transaction screening method (200), comprising the following steps:
obtaining known customer information data (201), wherein the known customer information data comprise known customer transaction information data and known customer personal information data;
calculating known customer transaction features from the known customer information data (202);
training a customer suspicious-behavior screening model from the calculated known customer transaction features using a neighbor-sampling random forest method (203);
screening the transaction behavior of target customers to be screened according to the trained customer suspicious-behavior screening model (204).
2. The method (200) according to claim 1, characterized in that the step (203) of training the customer suspicious-behavior screening model from the obtained known customer transaction features using the neighbor-sampling random forest method comprises the following sub-steps:
distinguishing customers with suspicious transaction behavior from customers with normal transaction behavior (2031);
performing unsupervised clustering on all known customer transaction features using the K-MEANS clustering algorithm (2032);
down-sampling the customers with normal transaction behavior according to the above clustering result (2033);
training on the customers with suspicious transaction behavior and the down-sampled customers with normal transaction behavior using the random forest method (2034), to obtain the customer suspicious-behavior screening model.
3. The method (200) according to claim 1 or 2, characterized in that the known customer transaction information data comprise: transaction time, transaction amount, transaction IP, transaction account information and transaction message.
4. The method (200) according to claim 1 or 2, characterized in that the known customer personal information data comprise: customer account number, customer name, account opening date and contact details.
5. The method (200) according to claim 1 or 2, characterized in that, after the known customer information data are obtained, data cleaning is performed on the obtained known customer information data so that the data are structured.
6. The method (200) according to claim 5, characterized in that the data cleaning comprises: joining data, merging data, and handling missing values and outliers.
7. The method (200) according to claim 1 or 2, characterized in that the customer transaction features comprise the following: the fast-in/fast-out transaction feature within a period, the inflow/outflow amount consistency feature, the same-IP feature, the cumulative transaction amount feature, the cumulative transaction count feature, the corporate-to-private transfer count feature, the corporate-to-private transfer ratio feature, the transaction number feature, the transfer-out number and share feature, the transfer-in number and share feature, the transfer-out amount and share feature, and the transfer-in amount and share feature.
8. The method (200) according to claim 1 or 2, characterized in that the suspicious risk transaction types comprise one or more of the following transaction types: suspected illegal private banking, suspected corruption, suspected drug-related crime, suspected smuggling, suspected fraud, suspected fund-raising, suspected pyramid selling, suspected arbitrage, suspected tax-related crime, suspected terrorist financing, suspected gambling, and suspected false capital contribution.
9. A suspicious risk transaction screening system (100), comprising:
an acquisition unit (101), configured to obtain known customer information data, wherein the known customer information data comprise known customer transaction information data and known customer personal information data;
a computing unit (102), configured to calculate known customer transaction features from the known customer information data;
an analysis and processing unit (103), configured to train a customer suspicious-behavior screening model from the obtained known customer transaction features using a neighbor-sampling random forest method;
wherein the analysis and processing unit (103) is also configured to screen the transaction behavior of target customers to be screened according to the trained customer suspicious-behavior screening model;
and wherein the acquisition unit is communicatively connected to the computing unit and/or the analysis and processing unit in a wired or wireless manner.
10. The suspicious risk transaction screening system (100) according to claim 9, further comprising: an output unit for outputting the screening result.
CN201810427250.1A 2018-05-07 2018-05-07 A kind of suspicious risk trade screening method and corresponding system Pending CN110458376A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810427250.1A CN110458376A (en) 2018-05-07 2018-05-07 A kind of suspicious risk trade screening method and corresponding system

Publications (1)

Publication Number Publication Date
CN110458376A true CN110458376A (en) 2019-11-15

Family

ID=68471962

Country Status (1)

Country Link
CN (1) CN110458376A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895758A (en) * 2019-12-02 2020-03-20 中国银行股份有限公司 Screening method, device and system for credit card account with cheating transaction
CN111145027A (en) * 2019-12-31 2020-05-12 众安信息技术服务有限公司 Suspected money laundering transaction identification method and device
CN111275416A (en) * 2020-01-15 2020-06-12 中国人民解放军国防科技大学 Digital currency abnormal transaction detection method and device, electronic equipment and medium
CN111310784A (en) * 2020-01-14 2020-06-19 支付宝(杭州)信息技术有限公司 Resource data processing method and device
CN112101952A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Bank suspicious transaction evaluation and data processing method and device
CN112801800A (en) * 2021-04-14 2021-05-14 深圳格隆汇信息科技有限公司 Behavior fund analysis system, behavior fund analysis method, computer equipment and storage medium
CN112907351A (en) * 2021-02-05 2021-06-04 中国工商银行股份有限公司 Financial message abnormity identification method and device
CN113159923A (en) * 2021-04-29 2021-07-23 中国工商银行股份有限公司 Risk screening method and device
CN116402512A (en) * 2023-05-31 2023-07-07 无锡锡商银行股份有限公司 Account security check management method based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105761112A (en) * 2016-02-23 2016-07-13 国元证券股份有限公司 Securities margin trading and asset management target customer mining method
CN106897931A (en) * 2016-06-12 2017-06-27 阿里巴巴集团控股有限公司 A kind of recognition methods of abnormal transaction data and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张亮: "《关于银行卡产业防范跨境洗钱与恐怖融资的研究》", 《中国优秀硕士学位论文全文数据库 经济与管理科学辑》 *
浮盼盼: "《大规模不均衡数据分类方法研究》", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895758A (en) * 2019-12-02 2020-03-20 中国银行股份有限公司 Screening method, device and system for credit card account with cheating transaction
CN111145027A (en) * 2019-12-31 2020-05-12 众安信息技术服务有限公司 Suspected money laundering transaction identification method and device
CN111310784A (en) * 2020-01-14 2020-06-19 支付宝(杭州)信息技术有限公司 Resource data processing method and device
CN111310784B (en) * 2020-01-14 2021-07-20 支付宝(杭州)信息技术有限公司 Resource data processing method and device
CN111275416B (en) * 2020-01-15 2024-02-27 中国人民解放军国防科技大学 Digital currency abnormal transaction detection method, device, electronic equipment and medium
CN111275416A (en) * 2020-01-15 2020-06-12 中国人民解放军国防科技大学 Digital currency abnormal transaction detection method and device, electronic equipment and medium
CN112101952A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Bank suspicious transaction evaluation and data processing method and device
CN112101952B (en) * 2020-09-27 2024-05-10 中国建设银行股份有限公司 Bank suspicious transaction evaluation and data processing method and device
CN112907351A (en) * 2021-02-05 2021-06-04 中国工商银行股份有限公司 Financial message abnormity identification method and device
CN112801800A (en) * 2021-04-14 2021-05-14 深圳格隆汇信息科技有限公司 Behavior fund analysis system, behavior fund analysis method, computer equipment and storage medium
CN113159923A (en) * 2021-04-29 2021-07-23 中国工商银行股份有限公司 Risk screening method and device
CN116402512A (en) * 2023-05-31 2023-07-07 无锡锡商银行股份有限公司 Account security check management method based on artificial intelligence
CN116402512B (en) * 2023-05-31 2023-08-22 无锡锡商银行股份有限公司 Account security check management method based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN110458376A (en) A kind of suspicious risk trade screening method and corresponding system
CN106650273B (en) A kind of behavior prediction method and apparatus
WO2018014610A1 (en) C4.5 decision tree algorithm-based specific user mining system and method therefor
CN112053221A (en) Knowledge graph-based internet financial group fraud detection method
CN109768985A (en) A kind of intrusion detection method based on traffic visualization and machine learning algorithm
CN106530078A (en) Loan risk early warning method and system based on multi-industry data
CN105931068A (en) Cardholder consumption figure generation method and device
CN109657978A (en) A kind of Risk Identification Method and system
CN107146089A (en) The single recognition methods of one kind brush and device, electronic equipment
CN112559771A (en) Intelligent capital transaction monitoring method and system based on knowledge graph
CN101989327A (en) Image analyzing apparatus and image analyzing method
CN105719045A (en) Retention risk determiner
CN112700324A (en) User loan default prediction method based on combination of Catboost and restricted Boltzmann machine
CN108764943B (en) Suspicious user monitoring and analyzing method based on fund transaction network
CN113283902B (en) Multichannel blockchain phishing node detection method based on graphic neural network
CN110443120A (en) A kind of face identification method and equipment
CN114186626A (en) Abnormity detection method and device, electronic equipment and computer readable medium
CN104123368A (en) Big data attribute significance and recognition degree early warning method and system based on clustering
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
CN108197795A (en) The account recognition methods of malice group, device, terminal and storage medium
CN108429776A (en) Method for pushing, device, client, interactive device and the system of network object
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
WO2022143431A1 (en) Method and apparatus for training anti-money laundering model
CN105930430B (en) Real-time fraud detection method and device based on non-accumulative attribute
CN110442621A (en) Classified statistic method, apparatus and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191115