CN111833174A - Internet financial application anti-fraud identification method based on LOF algorithm

Publication number
CN111833174A
Authority
CN
China
Prior art keywords
data
lof
abnormal
local
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010493203.4A
Other languages
Chinese (zh)
Inventor
江远强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiweijinke Shanghai Information Technology Co ltd
Original Assignee
Baiweijinke Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiweijinke Shanghai Information Technology Co ltd
Priority to CN202010493203.4A
Publication of CN111833174A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an Internet financial application anti-fraud identification method based on the LOF algorithm, which comprises: collecting and preprocessing data; selecting data features to obtain the data set for the LOF algorithm and randomly dividing the data set into different data subsets; calculating the local reachability distance, local reachability density and local outlier factor (LOF) value of each data point; and using the LOF value to determine whether the data point is an outlier, and hence whether the application behavior is fraudulent. Implementing the technical scheme of the invention effectively shortens the running time of outlier detection and improves the efficiency of outlier detection on high-dimensional, large data sets. Internet application behavior can be monitored in real time, and abnormal, fraudulent applications can be detected promptly and accurately, reducing credit losses. The method is better suited to current big-data risk control requirements.

Description

Internet financial application anti-fraud identification method based on LOF algorithm
Technical Field
The invention relates to the technical field of risk control in the Internet financial industry, in particular to a risk control system.
Background
With the development of Internet finance, the types and methods of fraud carried out by gray-market and black-market operations keep multiplying. According to incomplete statistics, the losses caused by fraud can reach 500 to 1000 billion every year, and fraud risk has become a key focus of Internet financial risk prevention. Statistically, fraudulent behavior is an outlier relative to normal behavior: in a scatter plot of the data, its attribute values lie far from the other data points and deviate significantly from expected or common values. Outlier detection is therefore a common method for financial anti-fraud, and detecting fraud effectively and with high probability has become the main anti-fraud task of large financial institutions.
In the prior art, the main methods for outlier detection include statistics-based methods (such as HBOS, the histogram-based outlier score), distance-based methods (such as K-nearest neighbors, KNN), and clustering-based methods (such as K-means and DBSCAN). However, these prior-art algorithms are complex, computationally expensive, high in time complexity and low in precision, and their detection efficiency on high-dimensional, large data sets is low. How to reduce the computation and running time of outlier detection has become a technical problem in urgent need of a solution.
The LOF (Local Outlier Factor) algorithm is a density-based method for detecting abnormal data. It introduces the reachability distance and reachability density of each data object to judge whether that object is an outlier, and computes a local outlier factor (LOF) for each data object in the data set to reflect its degree of abnormality. Because the LOF algorithm computes density from the k-th neighborhood of a point, it mines for outliers only in the boundary cells where outliers are likely to appear rather than performing a global computation, and it can find outliers accurately even when the sample space is unevenly distributed. This effectively reduces the volume of data to be examined, the amount of computation and the running time, gives higher detection efficiency on high-dimensional, large data sets, and better suits current big-data risk control requirements.
Disclosure of Invention
In order to solve the technical problem, the invention discloses an internet financial application anti-fraud identification method based on an LOF algorithm, and the technical scheme of the invention is implemented as follows:
An Internet financial application anti-fraud identification method based on the LOF algorithm comprises the following steps:
Step one: collecting the operation event-tracking data, personal basic information and customer-authorized third-party data submitted by a customer on the client application;
Step two: data preprocessing, including abnormal value processing and normalization processing;
Step three: selecting data features according to the behavior feature types of credit fraud to obtain the data set for the LOF algorithm, and randomly dividing the data set into different data subsets;
Step four: based on the data subset, calculating the k-distance neighborhood of the object p in the data subset through the LOF algorithm, and then calculating the local reachability distance of the object p;
Step five: calculating the local reachability density of the object p according to the local reachability distance;
Step six: calculating the local outlier factor (LOF) value of the object p according to the local reachability density;
Step seven: repeating steps one to six; in the loop calculation, the obtained LOF value is compared with a set threshold ψ, objects whose LOF value is smaller than ψ are judged to be normal points and are continuously removed, and objects whose LOF value is larger than ψ are judged to be abnormal points and are output.
Further, the abnormal value processing includes removing data of irrelevant dimensions and deleting abnormal values from the data.
Further, the normalization processing adopts min-max (dispersion) normalization.
Further, the k-distance neighborhood, the local reachability distance and the local reachability density are calculated only within the data subset where the object p is located.
Further, the threshold ψ is set and adjusted dynamically according to empirical values or changes in the actual business.
According to this technical scheme, in LOF-based anti-fraud identification of Internet financial applications, the outlier threshold ψ is set according to experience and the actual business. During the recursive computation, high-density non-outliers are continuously removed and the points that are outliers with high probability are output. This effectively shortens the running time of outlier detection and improves the efficiency of outlier detection on high-dimensional, large data sets, so that Internet application behavior can be monitored in real time, abnormal and fraudulent applications can be detected promptly and accurately, and credit losses are reduced.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only one embodiment of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An Internet financial application anti-fraud identification method based on the LOF algorithm comprises the following steps:
Step one: collecting the operation event-tracking data, personal basic information and customer-authorized third-party data submitted by a customer on the client application;
Step two: data preprocessing, including abnormal value processing and normalization processing;
Step three: selecting data features according to the behavior feature types of credit fraud to obtain the data set for the LOF algorithm, and randomly dividing the data set into different data subsets;
Step four: based on the data subset, calculating the k-distance neighborhood of the object p in the data subset through the LOF algorithm, and then calculating the local reachability distance of the object p;
Step five: calculating the local reachability density of the object p according to the local reachability distance;
Step six: calculating the local outlier factor (LOF) value of the object p according to the local reachability density;
Step seven: repeating steps one to six; in the loop calculation, the obtained LOF value is compared with a set threshold ψ, objects whose LOF value is smaller than ψ are judged to be normal points and are continuously removed, and objects whose LOF value is larger than ψ are judged to be abnormal points and are output.
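The control flow of steps three to seven can be sketched as follows. This is only one possible reading of the loop, not the applicant's implementation: the names `random_subsets`, `detect`, `Scorer` and the `seed` parameter are hypothetical, and the scoring function is passed in as a parameter so that any LOF-style score can be plugged in.

```python
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
# Hypothetical scorer signature: (index within subset, subset) -> LOF-like score
Scorer = Callable[[int, List[Point]], float]


def random_subsets(points: List[Point], n_subsets: int, seed: int = 0) -> List[List[Point]]:
    """Step three: shuffle the data set and deal it into n_subsets subsets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def detect(points: List[Point], n_subsets: int, score: Scorer, psi: float = 1.0) -> List[Point]:
    """Steps four to seven: score each object inside its own subset only."""
    outliers: List[Point] = []
    for subset in random_subsets(points, n_subsets):
        for i, p in enumerate(subset):
            if score(i, subset) > psi:
                outliers.append(p)  # judged abnormal and output
            # objects scoring below psi are judged normal and dropped
    return outliers


# Stand-in scorer for the demo only: this is NOT LOF, it just returns the first coordinate.
demo = [(0.5,), (2.0,), (0.9,), (3.0,)]
flagged = detect(demo, n_subsets=2, score=lambda i, s: s[i][0])
```

Passing the scorer as an argument keeps the loop independent of how the LOF value itself is computed.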
In this embodiment, data can be acquired through traffic collection devices deployed on network nodes, and the collected data features should comprehensively reflect the applicant's repayment capacity and willingness to repay. The personal basic information includes traditional data such as personal and family status, work and income level.
In this embodiment, the data set of the LOF algorithm is divided into different data subsets, including a training set and a validation set. In the high-dimensional data set, selected dimensions are each divided into n segments, and the data set is partitioned along the lines connecting the division points marked on each dimension. The resulting irregular sections are bounded by grid boundaries, whose specific boundary values are determined by the dimensionality and size of the data set and the given number of segments n.
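As a rough illustration of the grid partition, the sketch below assigns each point to a grid cell. It assumes equal-width segments and features already normalized to [0, 1]; the patent does not specify how boundary values are chosen, and the names `grid_cell` and `partition` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def grid_cell(point: Sequence[float], n: int) -> Cell:
    """Cell index of a point whose coordinates all lie in [0, 1]."""
    # min(..., n - 1) places the upper boundary value 1.0 in the last segment
    return tuple(min(int(x * n), n - 1) for x in point)


def partition(points: List[Sequence[float]], n: int) -> Dict[Cell, List[int]]:
    """Group point indices by the grid cell they fall into."""
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        cells[grid_cell(p, n)].append(idx)
    return dict(cells)


points = [(0.05, 0.10), (0.10, 0.20), (0.90, 0.95), (1.00, 1.00)]
cells = partition(points, n=4)
```

Each cell can then be treated as one data subset when the neighborhood quantities are computed.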
In this embodiment, the data subset in which the object p is located is denoted $p_i$. Let $d_k(p) = d(o_k, p)$ be the distance between the object p and its k-th nearest neighbor $o_k$; then at least k objects $o_i$ satisfy $d(o_i, p) \le d(o_k, p)$, and at most k-1 objects $o_j$ satisfy $d(o_j, p) < d(o_k, p)$. The k-nearest neighbors of p, written $N_k(p)$ here, are all objects whose distance to p does not exceed $d_k(p)$. Averaging the distances from p to these neighbors gives the m-distance of p:
$$m\text{-}\mathrm{dist}(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} d(p, o)$$
The m-neighbors of the object p are the set of all objects whose distance from p is less than the m-distance of p. The reachability distance $\mathrm{reach\_dist}_m(o, p)$ of the object p with respect to an object o is the larger of the m-distance of p and the distance between p and o:
$$\mathrm{reach\_dist}_m(o, p) = \max\{\, m\text{-}\mathrm{dist}(p),\ d(p, o) \,\}$$
The local reachability density $\mathrm{lrd}_m(p)$ of the object p is the inverse of the average reachability distance from the points within the k-distance neighborhood of p to p:
$$\mathrm{lrd}_m(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \mathrm{reach\_dist}_m(o, p) \right)^{-1}$$
The local outlier factor of the object p is then the average ratio of the local reachability density of each neighbor to that of p:
$$\mathrm{LOF}_m(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_m(o)}{\mathrm{lrd}_m(p)}$$
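The quantities defined in this embodiment can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: it uses Euclidean distance, takes the neighborhood to be every object no farther from p than its k-th nearest neighbor, follows the text in using the m-distance of p inside the reachability distance, and assumes the subset holds more than k points with no fully coincident neighborhoods. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def k_neighbors(i: int, pts: List[Point], k: int) -> List[int]:
    """Indices of all objects no farther from pts[i] than its k-th nearest neighbor."""
    dists = sorted((math.dist(pts[i], q), j) for j, q in enumerate(pts) if j != i)
    d_k = dists[k - 1][0]  # distance to the k-th nearest neighbor
    return [j for d, j in dists if d <= d_k]


def m_distance(i: int, pts: List[Point], k: int) -> float:
    """Average distance from pts[i] to its k-neighbors."""
    nbrs = k_neighbors(i, pts, k)
    return sum(math.dist(pts[i], pts[j]) for j in nbrs) / len(nbrs)


def lrd(i: int, pts: List[Point], k: int) -> float:
    """Local reachability density: inverse of the mean reachability distance."""
    nbrs = k_neighbors(i, pts, k)
    m = m_distance(i, pts, k)
    reach = [max(m, math.dist(pts[i], pts[j])) for j in nbrs]
    return len(nbrs) / sum(reach)


def lof(i: int, pts: List[Point], k: int) -> float:
    """Mean ratio of each neighbor's density to the density of pts[i]."""
    nbrs = k_neighbors(i, pts, k)
    own = lrd(i, pts, k)
    return sum(lrd(j, pts, k) / own for j in nbrs) / len(nbrs)


# Four tightly grouped points and one distant point (index 4).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
scores = [lof(i, pts, k=2) for i in range(len(pts))]
```

In this toy data set the distant point receives by far the largest score, while the four grouped points score about 1. The functions recompute neighborhoods on every call for clarity; a practical version would cache them.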
In a preferred embodiment, the abnormal value processing includes removing data of irrelevant dimensions and deleting abnormal values from the data.
In a preferred embodiment, the normalization processing uses min-max (dispersion) normalization, which maps the data into the interval [0, 1]. The normalization formula is:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ is the normalized value, $x$ is the value before normalization, $x_{\min}$ is the minimum value of the feature, and $x_{\max}$ is the maximum value of the feature.
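A minimal sketch of this normalization for a single feature column follows. The function name `min_max_normalize` and the handling of a constant column are assumptions, since the patent does not address the case where the maximum equals the minimum.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Map one feature column onto [0, 1] via x' = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature is mapped to 0.0 to avoid division by zero
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


ages = [20.0, 30.0, 40.0, 60.0]  # illustrative values only
scaled = min_max_normalize(ages)
```

Normalizing each feature separately keeps features with large numeric ranges from dominating the distance calculations that the LOF steps rely on.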
In a preferred embodiment, the k-distance neighborhood, the local reachability distance and the local reachability density are calculated only within the data subset where the object p is located.
In a preferred embodiment, the threshold ψ is adjusted dynamically according to empirical values or changes in the actual business. In this embodiment, the threshold ψ is 1 by default.
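The effect of the threshold can be illustrated with a short sketch. The application IDs and LOF values below are made up for illustration, and `classify` is a hypothetical helper; how ψ would be tuned in practice is not specified in the patent.

```python
from typing import Dict, List


def classify(lof_scores: Dict[str, float], psi: float = 1.0) -> List[str]:
    """Return the IDs of applications whose LOF value exceeds the threshold psi."""
    return [app_id for app_id, score in lof_scores.items() if score > psi]


# Hypothetical application IDs with illustrative LOF values
scores = {"A-001": 0.97, "A-002": 1.08, "A-003": 2.60}
default_flags = classify(scores)          # default psi = 1
strict_flags = classify(scores, psi=1.5)  # a higher psi flags fewer applications
```

With the default ψ = 1, any object that is even slightly sparser than its neighbors is flagged, so raising ψ is one way to reduce the number of applications sent for review.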
It should be understood that the above-described embodiments are merely exemplary of the present invention, and are not intended to limit the present invention, and that any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. An Internet financial application anti-fraud identification method based on the LOF algorithm, characterized by comprising the following steps:
step one: collecting the operation event-tracking data, personal basic information and customer-authorized third-party data submitted by a customer on the client application;
step two: data preprocessing, including abnormal value processing and normalization processing;
step three: selecting data features according to the behavior feature types of credit fraud to obtain the data set for the LOF algorithm, and randomly dividing the data set into different data subsets;
step four: based on the data subset, calculating the k-distance neighborhood of the object p in the data subset through the LOF algorithm, and then calculating the local reachability distance of the object p;
step five: calculating the local reachability density of the object p according to the local reachability distance;
step six: calculating the local outlier factor (LOF) value of the object p according to the local reachability density;
step seven: repeating steps one to six, wherein in the loop calculation the obtained LOF value is compared with a set threshold ψ, objects whose LOF value is smaller than ψ are judged to be normal points and are continuously removed, and objects whose LOF value is larger than ψ are judged to be abnormal points and are output.
2. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the abnormal value processing includes removing data of irrelevant dimensions and deleting abnormal values from the data.
3. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the normalization processing adopts min-max (dispersion) normalization.
4. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the k-distance neighborhood, the local reachability distance and the local reachability density are calculated only within the data subset where the object p is located.
5. The Internet financial application anti-fraud identification method based on the LOF algorithm of claim 1, wherein the threshold ψ is set and adjusted dynamically according to empirical values or changes in the actual business.
CN202010493203.4A 2020-06-03 2020-06-03 Internet financial application anti-fraud identification method based on LOF algorithm Pending CN111833174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010493203.4A CN111833174A (en) 2020-06-03 2020-06-03 Internet financial application anti-fraud identification method based on LOF algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010493203.4A CN111833174A (en) 2020-06-03 2020-06-03 Internet financial application anti-fraud identification method based on LOF algorithm

Publications (1)

Publication Number Publication Date
CN111833174A 2020-10-27

Family

ID=72897546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010493203.4A Pending CN111833174A (en) 2020-06-03 2020-06-03 Internet financial application anti-fraud identification method based on LOF algorithm

Country Status (1)

Country Link
CN (1) CN111833174A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132886A1 (en) * 2013-08-26 2016-05-12 Verafin, Inc. Fraud detection systems and methods
CN106330624A (en) * 2016-11-07 2017-01-11 国网江苏省电力公司南京供电公司 Method for detecting power information network traffic abnormality
CN109102028A (en) * 2018-08-20 2018-12-28 南京邮电大学 Based on improved fast density peak value cluster and LOF outlier detection algorithm
CN109284371A (en) * 2018-09-03 2019-01-29 平安证券股份有限公司 Anti-fraud method, electronic device and computer readable storage medium
CN109948724A (en) * 2019-03-28 2019-06-28 山东浪潮云信息技术有限公司 E-commerce order-brushing behavior detection method based on improved LOF algorithm

Similar Documents

Publication Publication Date Title
CN109729090B (en) Slow denial of service attack detection method based on WEDMS clustering
CN109962909B (en) Network intrusion anomaly detection method based on machine learning
CN111798312A (en) Financial transaction system abnormity identification method based on isolated forest algorithm
CN110111113B (en) Abnormal transaction node detection method and device
Dheepa et al. Analysis of credit card fraud detection methods
CN111191720B (en) Service scene identification method and device and electronic equipment
CN112906738B (en) Water quality detection and treatment method
CN110661802A (en) Low-speed denial of service attack detection method based on PCA-SVM algorithm
CN111970259B (en) Network intrusion detection method and alarm system based on deep learning
CN114417971A (en) Electric power data abnormal value detection algorithm based on K nearest neighbor density peak clustering
CN112330158A (en) Method for identifying traffic index time sequence based on autoregressive differential moving average-convolution neural network
CN112185108A (en) Urban road network congestion mode identification method, equipment and medium based on space-time characteristics
Li The intrusion data mining method for distributed network based on fuzzy kernel clustering algorithm
CN115115369A (en) Data processing method, device, equipment and storage medium
CN115622806B (en) Network intrusion detection method based on BERT-CGAN
CN111833174A (en) Internet financial application anti-fraud identification method based on LOF algorithm
CN117527295A (en) Self-adaptive network threat detection system based on artificial intelligence
CN112288561A (en) Internet financial fraud behavior detection method based on DBSCAN algorithm
CN116187423A (en) Behavior sequence anomaly detection method and system based on unsupervised algorithm
Prerau et al. Unsupervised anomaly detection using an optimized K-nearest neighbors algorithm
CN115834156A (en) Abnormal behavior detection method based on web access log
CN115277178A (en) Method, device and storage medium for monitoring abnormity based on enterprise network traffic
CN114666075B (en) Distributed network anomaly detection method and system based on depth feature coarse coding
Baig et al. One-dependence estimators for accurate detection of anomalous network traffic
Chhabra et al. Crime Prediction Patterns Using Hybrid K-Means Hierarchical Clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201027