CN111369339A - Over-sampling improved svdd-based bank client transaction behavior abnormity identification method - Google Patents


Info

Publication number
CN111369339A
CN111369339A CN202010137063.7A CN202010137063A CN111369339A CN 111369339 A CN111369339 A CN 111369339A CN 202010137063 A CN202010137063 A CN 202010137063A CN 111369339 A CN111369339 A CN 111369339A
Authority
CN
China
Prior art keywords
abnormal
data
behaviors
behavior
svdd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010137063.7A
Other languages
Chinese (zh)
Inventor
杨健颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Suoxinda Data Technology Co ltd
Original Assignee
Shenzhen Suoxinda Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Suoxinda Data Technology Co ltd
Priority to CN202010137063.7A
Publication of CN111369339A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Technology Law (AREA)
  • Development Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Complex Calculations (AREA)

Abstract

A bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD (support vector data description) relates to the technical field of bank risk-control data processing and comprises the following steps: S1, performing a consistency check on the original data; S2, setting a value p and using the SMOTE oversampling algorithm to expand the data with abnormal behaviors p times; S3, building an SVDD model on the p-times-expanded data with abnormal behaviors and calculating the center a and the radius R of the SVDD model; S4, calculating the distance from the data of behaviors whose abnormality is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal. The method overcomes the shortcomings of existing abnormality identification algorithms for bank customer transaction behaviors, so that abnormal transaction behaviors of bank customers are identified. Finally, the identified abnormal transaction behaviors are reported to a verification module for further security verification, so as to better prevent bank transaction risk.

Description

Over-sampling improved svdd-based bank client transaction behavior abnormity identification method
Technical Field
The invention relates to the technical field of bank risk-control data processing, and in particular to an improved data analysis method for identifying abnormal customer transaction behaviors in bank risk control.
Background
Risk control is one of the most important functions in the banking industry, and a bank's risk-control capability can be effectively improved by identifying abnormal customer transaction behaviors.
The usual method for identifying abnormal customer behaviors is to build a supervised classification model with two classes, one abnormal and the other not abnormal. This approach has a significant drawback. When, for example, a bank customer's credit card is stolen and fraudulently used, the transaction behavior can be determined to be abnormal; but a transaction with no recorded abnormality can only be regarded as not abnormal for the time being, since an abnormality may surface later. Supervised models are unsuitable in this case because the data labeled as not abnormal is not fully reliable. Instead, the semi-supervised model SVDD can be used to identify abnormal transaction behaviors.
The semi-supervised model SVDD is more accurate only when a large amount of labeled data is available. In identifying abnormal transaction behaviors of bank customers, labeled data refers to data confirmed to be at risk, such as transactions in which a credit card was fraudulently used, and such data is rare.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing abnormality identification algorithms for bank customer transaction behaviors, and provides a bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD, which is an effective semi-supervised algorithm. The data with abnormal behaviors is expanded using the SMOTE oversampling algorithm, and whether a transaction behavior is abnormal is judged by analyzing the patterns in the customers' transaction behavior data, so that abnormal transaction behaviors of bank customers are identified. Finally, the identified abnormal transaction behaviors are reported to a verification module for further security verification, so as to better prevent bank transaction risk.
To solve the technical problem, the invention adopts the following technical scheme: a bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD, characterized in that the method comprises the following steps:
S1, given the original data of bank customer transaction behaviors, performing a consistency check on the original data, removing invalid and duplicate data, filling in missing values, converting categorical variables into numerical variables, and dividing the original data into two classes according to the results recorded in it: behaviors that are abnormal and behaviors that are not abnormal for the time being; behaviors that are not abnormal for the time being are treated as having unknown abnormality status;
S2, setting a value p, and using the SMOTE oversampling algorithm to expand the data with abnormal behaviors p times;
S3, building an SVDD model on the p-times-expanded data with abnormal behaviors, and calculating the center a and the radius R of the SVDD model;
S4, calculating the distance from the data of behaviors whose abnormality is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal.
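The cleaning in step S1 can be illustrated with a minimal sketch in plain Python. The field names (`amount`, `hour`, `channel`, `fraud`), the mean-fill rule for missing values, and the integer coding of the categorical variable are all assumptions made for illustration; the patent does not specify how missing values are filled or how categories are encoded.

```python
from statistics import mean

def preprocess(records, numeric_keys, category_key, label_key):
    """Clean raw transaction records and split them by the recorded label."""
    # Drop invalid rows (no recorded label) and exact duplicates, keeping order.
    seen, cleaned = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if rec.get(label_key) is None or key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(rec))
    # Fill missing numeric values with the column mean (an assumed fill rule).
    for k in numeric_keys:
        present = [r[k] for r in cleaned if r.get(k) is not None]
        fill = mean(present) if present else 0.0
        for r in cleaned:
            if r.get(k) is None:
                r[k] = fill
    # Convert the categorical variable into an integer code (assumed encoding).
    codes = {c: i for i, c in enumerate(sorted({r[category_key] for r in cleaned}))}

    def features(r):
        return [float(r[k]) for k in numeric_keys] + [float(codes[r[category_key]])]

    # Label 1: confirmed abnormal. Label 0: not abnormal so far, i.e. unknown.
    abnormal = [features(r) for r in cleaned if r[label_key] == 1]
    unknown = [features(r) for r in cleaned if r[label_key] == 0]
    return abnormal, unknown

# Hypothetical raw records, for illustration only.
raw = [
    {"amount": 120.0, "hour": 14, "channel": "pos", "fraud": 0},
    {"amount": 120.0, "hour": 14, "channel": "pos", "fraud": 0},     # duplicate
    {"amount": 9800.0, "hour": 3, "channel": "online", "fraud": 1},
    {"amount": None, "hour": 2, "channel": "online", "fraud": 1},    # missing value
    {"amount": 50.0, "hour": 10, "channel": "atm", "fraud": None},   # invalid row
]
abnormal, unknown = preprocess(raw, ["amount", "hour"], "channel", "fraud")
```

Only the `abnormal` list feeds steps S2 and S3; the `unknown` list is what step S4 later scores against the fitted hypersphere.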
Further refinements of the technical scheme are as follows:
Step S2 includes:
the data set with abnormal behaviors is Q, containing q samples in total;
for each sample $x_i$ ($i = 1, 2, \ldots, q$) in the data set Q, computing its m nearest neighbors, randomly selecting a sample point $x_{it}$ from those m neighbors, and generating a random number $\lambda_j$ between 0 and 1; the j-th new sample point generated from $x_i$ is the linear interpolation
$$x_{ij}^{\mathrm{new}} = x_i + \lambda_j\,(x_{it} - x_i)$$
This operation is performed p times for each $x_i$, yielding the data set with abnormal behaviors expanded p times by the SMOTE oversampling algorithm.
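The oversampling in step S2 can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `smote_expand`, the Euclidean distance used to find neighbors, the fixed random seed, and the choice to return only the newly generated points (rather than the originals plus the new points) are all assumptions.

```python
import math
import random

def smote_expand(samples, p, m, seed=0):
    """Generate p synthetic points per sample by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for i, x_i in enumerate(samples):
        # m nearest neighbors of x_i among the other abnormal samples
        # (requires at least two samples; Euclidean distance is an assumption).
        others = [x for k, x in enumerate(samples) if k != i]
        neighbors = sorted(others, key=lambda x: math.dist(x, x_i))[:m]
        for _ in range(p):
            x_it = rng.choice(neighbors)   # randomly selected neighbor
            lam = rng.random()             # random number lambda in [0, 1)
            # Linear interpolation between x_i and the selected neighbor.
            synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_it)])
    return synthetic

# Hypothetical abnormal samples: the four corners of the unit square.
Q = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_expand(Q, p=3, m=2)
```

Because every new point is a convex combination of two existing abnormal samples, the synthetic data never leaves the region spanned by the original abnormal samples.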
Step S3 includes:
the data set with abnormal behaviors is represented by (x, y), where x represents the features and y represents the abnormality label. A hypersphere is constructed for the data set (x, y) with abnormal behaviors; the hypersphere is described by
$$\min_{R,\,a,\,\xi}\; R^2 + C \sum_i \xi_i$$
subject to $(x_i - a)^T (x_i - a) \le R^2 + \xi_i$ and $\xi_i \ge 0$, where C is a penalty parameter and $\xi_i$ is a slack variable.
The description of the hypersphere is converted into the following dual form:
$$L = \sum_i \alpha_i K(x_i, x_i) - \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$
where K is a kernel function and $\alpha_i$ is a Lagrange multiplier; the $\alpha_i$ are computed by convex optimization.
The radius of the hypersphere is calculated as
$$R^2 = K(x_k, x_k) - 2 \sum_i \alpha_i K(x_i, x_k) + \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$
where $x_k$ is a support vector on the boundary of the hypersphere, and the center of the sphere is $a = \sum_i \alpha_i x_i$.
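As a rough illustration of step S3, the sketch below maximizes the dual objective with a Frank-Wolfe iteration in plain Python. Frank-Wolfe is swapped in here as a simple stand-in for a general convex solver and is not named by the patent; the sketch also assumes a penalty parameter C of at least 1, so that the upper bound on each multiplier is inactive and only the constraints that the multipliers are non-negative and sum to 1 matter. The function names and the sample data are hypothetical.

```python
def linear_kernel(u, v):
    """Plain inner product; any other kernel function could be passed instead."""
    return sum(a * b for a, b in zip(u, v))

def fit_svdd(X, kernel=linear_kernel, iters=2000):
    """Approximate the SVDD multipliers and squared radius (assumes C >= 1)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [1.0 / n] * n                      # feasible start: sums to 1
    for t in range(iters):
        # Gradient of L = sum_i a_i K_ii - sum_ij a_i a_j K_ij.
        grad = [K[i][i] - 2.0 * sum(alpha[j] * K[i][j] for j in range(n))
                for i in range(n)]
        best = max(range(n), key=grad.__getitem__)
        gamma = 2.0 / (t + 2.0)                # standard Frank-Wolfe step size
        alpha = [(1.0 - gamma) * a for a in alpha]
        alpha[best] += gamma                   # move toward the chosen vertex
    const = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))
    # Squared kernel distance of every training point to the center.
    dist2 = [K[k][k] - 2.0 * sum(alpha[i] * K[i][k] for i in range(n)) + const
             for k in range(n)]
    # With C >= 1 every training point lies inside, so the farthest one sets R^2.
    return alpha, max(dist2)

# Hypothetical expanded abnormal samples.
X = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.5]]
alpha, R2 = fit_svdd(X)
# With the linear kernel the center can be written out explicitly.
center = [sum(a * x[d] for a, x in zip(alpha, X)) for d in range(2)]
```

For these three points the smallest enclosing sphere has its center near (1, 0) and a radius near 1, and the third point, which lies strictly inside, ends up with a multiplier of zero. A production system would more likely hand the dual to a dedicated quadratic programming solver, which also handles the case where C is less than 1.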
The distance from the data of behaviors whose abnormality is unknown to the center a of the sphere is then calculated; behaviors whose distance is less than the radius R of the hypersphere are judged to be abnormal, and the identified abnormal behaviors are reported to a verification module for further security verification.
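The decision rule can be sketched in plain Python by evaluating the kernel distance to the center without ever forming the center explicitly. The RBF kernel, its width, the two training points, and the hand-picked multipliers (0.5 each, which is the symmetric solution for two points) are assumptions chosen for illustration; in practice the multipliers would come from the convex optimization in step S3.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel; the width gamma=0.5 is an arbitrary assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def squared_distance_to_center(z, X, alpha, kernel):
    """K(z,z) - 2*sum_i a_i K(x_i,z) + sum_ij a_i a_j K(x_i,x_j)."""
    cross = sum(a * kernel(x, z) for a, x in zip(alpha, X))
    const = sum(ai * aj * kernel(xi, xj)
                for ai, xi in zip(alpha, X) for aj, xj in zip(alpha, X))
    return kernel(z, z) - 2.0 * cross + const

def flag_abnormal(z, X, alpha, kernel, support_index):
    """Flag z as abnormal when it falls inside the hypersphere."""
    # R^2 is the squared distance of a boundary support vector to the center.
    r2 = squared_distance_to_center(X[support_index], X, alpha, kernel)
    return squared_distance_to_center(z, X, alpha, kernel) < r2

# Hypothetical abnormal training points and multipliers.
X = [[0.0, 0.0], [2.0, 0.0]]
alpha = [0.5, 0.5]
near = flag_abnormal([1.0, 0.0], X, alpha, rbf_kernel, support_index=0)
far = flag_abnormal([5.0, 5.0], X, alpha, rbf_kernel, support_index=0)
```

Note the direction of the test: because the hypersphere is fitted to abnormal data, a point inside it resembles known abnormal behavior and is the one reported to the verification module.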
The invention has the following beneficial effects. The oversampling-improved SVDD used by the invention is an effective semi-supervised method when the data classes are imbalanced. In terms of data availability, it is usually possible to confirm only that a bank customer's transaction behavior is abnormal; it is difficult to guarantee that a transaction behavior is free of abnormality. SVDD is an efficient semi-supervised method that requires only the known abnormal data: a model is built on that data and then used to analyze transaction behaviors whose abnormality is unknown. This matches the actual situation of bank customer transaction behavior data well and yields accurate results. In terms of class balance, SVDD is more accurate when a large amount of labeled data is available; to ensure accuracy, the SMOTE oversampling algorithm is used to expand the abnormal behavior data before modeling, and the expanded, larger sample is then used for modeling, so that a more accurate result can be obtained.
Drawings
Fig. 1 is a flow chart of a bank customer transaction behavior abnormality identification method based on oversampling improved svdd in the invention.
Detailed Description
In order that the invention may be more readily understood, reference will now be made in detail to specific embodiments thereof, and the accompanying drawings will be used to illustrate the invention:
Referring to Fig. 1, the invention discloses a bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD, which comprises the following steps:
S1, given the original data of bank customer transaction behaviors, performing a consistency check on the original data, removing invalid and duplicate data, filling in missing values, converting categorical variables into numerical variables, and dividing the original data into two classes according to the results recorded in it: behaviors that are abnormal and behaviors that are not abnormal for the time being; behaviors that are not abnormal for the time being are treated as having unknown abnormality status;
S2, setting a value p, and using the SMOTE oversampling algorithm to expand the data with abnormal behaviors p times;
S3, building an SVDD model on the p-times-expanded data with abnormal behaviors, and calculating the center a and the radius R of the SVDD model;
S4, calculating the distance from the data of behaviors whose abnormality is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal.
Further refinements of the technical scheme are as follows:
Step S2 includes:
the data set with abnormal behaviors is Q, containing q samples in total;
for each sample $x_i$ ($i = 1, 2, \ldots, q$) in the data set Q, computing its m nearest neighbors, randomly selecting a sample point $x_{it}$ from those m neighbors, and generating a random number $\lambda_j$ between 0 and 1; the j-th new sample point generated from $x_i$ is the linear interpolation
$$x_{ij}^{\mathrm{new}} = x_i + \lambda_j\,(x_{it} - x_i)$$
This operation is performed p times for each $x_i$, yielding the data set with abnormal behaviors expanded p times by the SMOTE oversampling algorithm.
Step S3 includes:
the data set with abnormal behaviors is represented by (x, y), where x represents the features and y represents the abnormality label. A hypersphere is constructed for the data set (x, y) with abnormal behaviors; the hypersphere is described by
$$\min_{R,\,a,\,\xi}\; R^2 + C \sum_i \xi_i$$
subject to $(x_i - a)^T (x_i - a) \le R^2 + \xi_i$ and $\xi_i \ge 0$, where C is a penalty parameter and $\xi_i$ is a slack variable.
The description of the hypersphere is converted into the following dual form:
$$L = \sum_i \alpha_i K(x_i, x_i) - \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$
where K is a kernel function and $\alpha_i$ is a Lagrange multiplier; the $\alpha_i$ are computed by convex optimization.
The radius of the hypersphere is calculated as
$$R^2 = K(x_k, x_k) - 2 \sum_i \alpha_i K(x_i, x_k) + \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$
where $x_k$ is a support vector on the boundary of the hypersphere, and the center of the sphere is $a = \sum_i \alpha_i x_i$.
The distance from the data of behaviors whose abnormality is unknown to the center a of the sphere is then calculated; behaviors whose distance is less than the radius R of the hypersphere are judged to be abnormal, and the identified abnormal behaviors are reported to a verification module for further security verification.
The method uses the SMOTE oversampling algorithm to expand the abnormal data, and then builds the SVDD model on the expanded abnormal data to identify abnormal behaviors.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (3)

1. A bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD, characterized in that the method comprises the following steps:
S1, given the original data of bank customer transaction behaviors, performing a consistency check on the original data, removing invalid and duplicate data, filling in missing values, converting categorical variables into numerical variables, and dividing the original data into two classes according to the results recorded in it: behaviors that are abnormal and behaviors that are not abnormal for the time being; behaviors that are not abnormal for the time being are treated as having unknown abnormality status;
S2, setting a value p, and using the SMOTE oversampling algorithm to expand the data with abnormal behaviors p times;
S3, building an SVDD model on the p-times-expanded data with abnormal behaviors, and calculating the center a and the radius R of the SVDD model;
S4, calculating the distance from the data of behaviors whose abnormality is unknown to the center a of the SVDD model, judging a transaction behavior whose distance is less than the radius R of the SVDD model to be abnormal, and otherwise judging it to be not abnormal.
2. The bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD as claimed in claim 1, wherein step S2 comprises:
the data set with abnormal behaviors is Q, containing q samples in total;
for each sample $x_i$ ($i = 1, 2, \ldots, q$) in the data set Q, computing its m nearest neighbors, randomly selecting a sample point $x_{it}$ from those m neighbors, and generating a random number $\lambda_j$ between 0 and 1; the j-th new sample point generated from $x_i$ is
$$x_{ij}^{\mathrm{new}} = x_i + \lambda_j\,(x_{it} - x_i)$$
For each sample $x_i$, the linear interpolation operation is performed p times, generating one new sample each time, to obtain the data set with abnormal behaviors expanded p times by the SMOTE oversampling algorithm.
3. The bank customer transaction behavior abnormality identification method based on oversampling-improved SVDD as claimed in claim 1, wherein step S3 comprises:
the data set with abnormal behaviors is represented by (x, y), where x represents the features and y represents the abnormality label; a hypersphere is constructed for the data set (x, y) with abnormal behaviors, the hypersphere being described by
$$\min_{R,\,a,\,\xi}\; R^2 + C \sum_i \xi_i$$
subject to $(x_i - a)^T (x_i - a) \le R^2 + \xi_i$ and $\xi_i \ge 0$, where C is a penalty parameter and $\xi_i$ is a slack variable;
the description of the hypersphere is converted into the following dual form:
$$L = \sum_i \alpha_i K(x_i, x_i) - \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$
where K is a kernel function and $\alpha_i$ is a Lagrange multiplier, and the center a and the radius R of the hypersphere are calculated from the Lagrange multipliers;
the radius of the hypersphere is calculated as
$$R^2 = K(x_k, x_k) - 2 \sum_i \alpha_i K(x_i, x_k) + \sum_i \sum_j \alpha_i \alpha_j K(x_i, x_j)$$
where $x_k$ is a support vector on the boundary of the hypersphere, and the center of the sphere is $a = \sum_i \alpha_i x_i$;
the distance from the data of behaviors whose abnormality is unknown to the center a of the sphere is calculated, behaviors whose distance is less than the radius R of the hypersphere are judged to be abnormal, and the identified abnormal behaviors are reported to a verification module for further security verification.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010137063.7A CN111369339A (en) 2020-03-02 2020-03-02 Over-sampling improved svdd-based bank client transaction behavior abnormity identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010137063.7A CN111369339A (en) 2020-03-02 2020-03-02 Over-sampling improved svdd-based bank client transaction behavior abnormity identification method

Publications (1)

Publication Number Publication Date
CN111369339A true CN111369339A (en) 2020-07-03

Family

ID=71206532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010137063.7A Pending CN111369339A (en) 2020-03-02 2020-03-02 Over-sampling improved svdd-based bank client transaction behavior abnormity identification method

Country Status (1)

Country Link
CN (1) CN111369339A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination