KR101834260B1

KR101834260B1 - Method and Apparatus for Detecting Fraudulent Transaction

Info

Publication number: KR101834260B1
Application number: KR1020170008815A
Authority: KR
Inventors: 최은영; 고웅; 오성택; 김미주; 이태진
Original assignee: 한국인터넷진흥원
Priority date: 2017-01-18
Filing date: 2017-01-18
Publication date: 2018-03-06

Abstract

Provided are a method and a device for detecting an abnormal transaction, which detect an abnormal transaction based on transaction data used in an electronic financial transaction. The method for detecting an abnormal transaction performed by the device for detecting an abnormal transaction comprises the following steps of: obtaining first transaction data including a classification result indicating an abnormal transaction or a normal transaction and a second transaction data not including the classification result, wherein each of the first transaction data and the second transaction data is composed of a plurality of features; determining a key feature among the plurality of features composing the first transaction data by using a filter method-based feature selection algorithm; and performing an unsupervised learning-based clustering with respect to key feature data for the second transaction data, and constructing an abnormal transaction detection model by using a clustering result.

Description

Method and Apparatus for Detecting Fraudulent Transaction

The present invention relates to a method and apparatus for detecting abnormal transactions. More specifically, it relates to a method and apparatus that detect abnormal transactions based on the transaction data used in electronic financial transactions.

As smart devices and the Internet have become widespread and the threats against them have grown, abnormal transaction detection systems (Fraud Detection System; FDS) are being used to strengthen the security of electronic financial transactions. An abnormal transaction detection system detects and blocks abnormal transactions by comprehensively analyzing the terminal information, access information, transaction details, and other data used in electronic financial transactions.

A conventional abnormal transaction detection system analyzes past transaction information to set rules for detecting abnormal transactions, then judges any transaction that matches a rule to be abnormal and blocks it.

However, such rule-based abnormal transaction detection systems do not sufficiently reflect users' abnormal transaction patterns, so their detection accuracy is low, and the rules must be updated continually as the realities of financial transactions change.

Accordingly, research has recently been attempted on using machine learning algorithms to solve the problems of conventional abnormal transaction detection systems and to build more advanced ones. Specifically, research has been attempted on systems that build an abnormal transaction detection model by performing supervised learning on transaction data that includes a classification result indicating a normal or abnormal transaction, and that detect abnormal transactions based on the resulting model.

In practice, however, depending on the financial transaction method, there is not much transaction data that includes a classification result indicating a normal or abnormal transaction, so it is difficult to sufficiently train a model that detects abnormal transactions through supervised learning.

Therefore, a new abnormal transaction detection method and apparatus are needed that can detect abnormal transactions through unsupervised machine learning based on past transaction data.

Korean Patent Application Publication No. 2015-0005126

The technical problem to be solved by the present invention is to provide an abnormal transaction detection method and apparatus that build an abnormal transaction detection model based on unsupervised learning and can detect abnormal transactions using that model.

Another technical problem to be solved by the present invention is to provide an abnormal transaction detection method and apparatus that can perform unsupervised learning more efficiently by using some transaction data that includes a classification result indicating a normal or abnormal transaction.

Yet another technical problem to be solved by the present invention is to provide an abnormal transaction detection method and apparatus that can build an optimal abnormal transaction detection model according to the type of transaction data and the unsupervised learning algorithm.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

To solve the above technical problems, an abnormal transaction detection method according to one embodiment of the present invention may include: a step in which an abnormal transaction detection apparatus obtains first transaction data that includes a classification result indicating an abnormal or normal transaction and second transaction data that does not include the classification result, wherein each of the first transaction data and the second transaction data is composed of a plurality of features; a step in which the abnormal transaction detection apparatus determines key features among the plurality of features composing the first transaction data by using a filter-method-based feature selection algorithm; and a step in which the abnormal transaction detection apparatus performs unsupervised-learning-based clustering on the key-feature data of the second transaction data and builds an abnormal transaction detection model using the clustering result.

In one embodiment, the step of determining the key features may include: sampling the first transaction data so that the ratio between transaction data indicating normal transactions and transaction data indicating abnormal transactions is a first ratio; sampling the first transaction data so that this ratio is a second ratio different from the first ratio; calculating a per-feature score according to a preset evaluation metric for each of the first sampling data sampled at the first ratio and the second sampling data sampled at the second ratio; and selecting a preset number of key features using the average of the per-feature scores for the first sampling data and the per-feature scores for the second sampling data.
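The sampling-and-averaging procedure can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not name the evaluation metric, so `feature_scores` uses a stand-in (absolute difference of class means divided by the overall standard deviation), and the function names, the ratios `(1, 3)`, and the row layout (lists of numeric feature values) are all hypothetical.

```python
import random
from statistics import mean, pstdev


def sample_by_ratio(normal, abnormal, normal_per_abnormal, rng):
    """Keep every abnormal row and draw normal rows until the requested ratio is met."""
    n_normal = min(len(normal), normal_per_abnormal * len(abnormal))
    return rng.sample(normal, n_normal), list(abnormal)


def feature_scores(normal, abnormal):
    """Stand-in metric (an assumption): |difference of class means| / overall std."""
    scores = []
    for j in range(len(normal[0])):
        a = [row[j] for row in normal]
        b = [row[j] for row in abnormal]
        spread = pstdev(a + b) or 1.0  # a constant feature gets spread 1.0, score 0.0
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def select_key_features(normal, abnormal, ratios=(1, 3), k=1, seed=0):
    """Score features on each sampled set, average the scores, keep the top k indices."""
    rng = random.Random(seed)
    per_ratio = [
        feature_scores(*sample_by_ratio(normal, abnormal, r, rng)) for r in ratios
    ]
    averaged = [mean(column) for column in zip(*per_ratio)]
    ranked = sorted(range(len(averaged)), key=lambda j: averaged[j], reverse=True)
    return ranked[:k], averaged
```

Averaging over differently balanced samples is meant to keep a feature from ranking highly only because of one particular class balance; any other per-feature filter metric could be substituted for the stand-in.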

In one embodiment, the step of building the abnormal transaction detection model may include: building a first candidate abnormal transaction detection model by performing a first clustering algorithm based on unsupervised learning; building a second candidate abnormal transaction detection model by performing a second clustering algorithm based on unsupervised learning, wherein the second clustering algorithm is different from the first clustering algorithm; verifying the performance of the first and second candidate abnormal transaction detection models against a preset verification metric; and determining the abnormal transaction detection model from among the first and second candidate abnormal transaction detection models using the performance values calculated through the verification.
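A minimal sketch of this candidate-and-verify flow is given below. It assumes farthest-first traversal and k-means as the two clustering algorithms, and it uses the within-cluster sum of squared distances as a placeholder verification metric, since the text only says the metric is preset. All function names are hypothetical, and points are assumed to be tuples of numbers.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


def farthest_first(points, k):
    """Start at the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def k_means(points, k, iterations=20):
    """Lloyd's algorithm, seeded with the first k points so the result is deterministic."""
    centers = list(points[:k])
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def sse(points, centers):
    """Placeholder verification metric (an assumption): lower means tighter clusters."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


def choose_model(points, k):
    """Build both candidate models and keep the one with the better metric value."""
    candidates = {
        "farthest_first": farthest_first(points, k),
        "k_means": k_means(points, k),
    }
    scores = {name: sse(points, centers) for name, centers in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best], scores
```

Keeping the metric in one function makes it straightforward to swap in whatever verification metric a deployment actually uses, for example one computed on the labeled first transaction data.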

To solve the above technical problems, an abnormal transaction detection apparatus according to another embodiment of the present invention includes one or more processors, a network interface, a memory that loads a computer program executed by the processors, and storage that stores the computer program. The computer program may include: an operation of obtaining first transaction data that includes a classification result indicating an abnormal or normal transaction and second transaction data that does not include the classification result, wherein each of the first transaction data and the second transaction data is composed of a plurality of features; an operation of determining key features among the plurality of features composing the first transaction data by using a filter-method-based feature selection algorithm; and an operation of performing unsupervised-learning-based clustering on the key-feature data of the second transaction data and building an abnormal transaction detection model using the clustering result.

To solve the above technical problems, an abnormal transaction detection computer program according to yet another embodiment of the present invention may be stored in a recording medium in order to execute: a step of obtaining first transaction data that includes a classification result indicating an abnormal or normal transaction and second transaction data that does not include the classification result, wherein each of the first transaction data and the second transaction data is composed of a plurality of features; a step of determining key features among the plurality of features composing the first transaction data by using a filter-method-based feature selection algorithm; and a step of performing unsupervised-learning-based clustering on the key-feature data of the second transaction data and building an abnormal transaction detection model using the clustering result.

According to the present invention described above, the accuracy of abnormal transaction detection can be improved by performing sufficient unsupervised learning on a large amount of transaction data that does not include classification results.

In addition, reducing the number of features used to build the abnormal transaction detection model through feature selection lowers the computing cost required to train the model and can further improve the accuracy of abnormal transaction detection.

In addition, per-feature scores are calculated with a filter-method-based feature selection algorithm for each set of data sampled at a different ratio, and the key features are derived by considering those scores together, so that optimal key features with strong discriminative power can be selected. Because key features with strong discriminative power are selected, the accuracy of abnormal transaction detection can be further improved.

In addition, by using model verification to determine which of a plurality of candidate abnormal transaction detection models will be used for actual abnormal transaction detection, the optimal model can be determined dynamically according to the type of data, the unsupervised learning algorithm, and so on.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a configuration diagram of an abnormal transaction detection system according to one embodiment of the present invention.
FIG. 2A is a functional block diagram of an abnormal transaction detection apparatus according to another embodiment of the present invention.
FIG. 2B is a diagram for explaining the types of data in the training dataset shown in FIG. 2A.
FIG. 3 is a hardware configuration diagram of an abnormal transaction detection apparatus according to yet another embodiment of the present invention.
FIG. 4 is a detailed flowchart of the abnormal transaction detection model building step that may be referred to in some embodiments of the present invention.
FIG. 5 is a detailed flowchart of the key feature selection step shown in FIG. 4.
FIGS. 6 and 7 are diagrams for further explaining the key feature selection step shown in FIG. 4.
FIGS. 8A and 8B are diagrams for explaining a method of selecting key features using different sampling ratios.
FIGS. 9 to 12 are diagrams for explaining a method of determining, from a plurality of candidate abnormal transaction detection models, the abnormal transaction detection model to be used for abnormal transaction detection.
FIG. 13 is a detailed flowchart of the abnormal transaction determination step that may be referred to in some embodiments of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention belongs are fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meaning commonly understood by those of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined. The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless specifically stated otherwise.

As used in this specification, "comprises" and/or "comprising" means that the stated component, step, operation, and/or element does not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

Hereinafter, the present invention is described in more detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of an abnormal transaction detection system according to one embodiment of the present invention.

The abnormal transaction detection system according to this embodiment monitors the transaction data exchanged between user terminals and a server that provides an electronic financial service, and detects, in real time or non-real time, whether the electronic financial transaction indicated by the transaction data is an abnormal transaction. For convenience of explanation, electronic payment is used below as the example among the various kinds of electronic financial transactions. That is, the payment data exchanged between the plurality of user terminals 300 and the electronic payment server 200 is described as an example of the transaction data. Note, however, that the technical idea of the present invention can be applied not only to electronic payment but also to various other kinds of electronic financial transactions, such as Internet banking and smartphone banking.

The abnormal transaction detection system according to this embodiment may include an abnormal transaction detection apparatus 100 and an electronic payment server 200. However, this is only a preferred embodiment for achieving the object of the present invention, and some components may of course be added or removed as needed.

The abnormal transaction detection apparatus 100 is a computing device that detects, based on the payment data received from the electronic payment server 200, whether the electronic financial transaction requested by a user terminal is an abnormal transaction. The computing device may be a notebook, a desktop, a laptop, or the like, but is not limited to these and may be any kind of device equipped with computing and communication functions. However, to receive a large volume of payment data and detect abnormal transactions in real time, it is preferably built as a server with high-performance computing capability.

Depending on the embodiment, the abnormal transaction detection apparatus 100 may receive payment data in a reverse proxy arrangement between the electronic payment server 200 and the plurality of user terminals 300. In this case, the abnormal transaction detection apparatus 100 may block the payment if the received payment data corresponds to an abnormal transaction, and forward the payment data to the electronic payment server 200 only if it corresponds to a normal transaction.
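The block-or-forward behavior of such a reverse proxy arrangement can be sketched as below. The names `make_gateway`, `is_abnormal`, and `forward` are hypothetical, the detector is passed in as a plain callable, and no real networking is shown; this is an illustration of the control flow only.

```python
from typing import Callable, Dict

Payment = Dict[str, object]  # hypothetical representation of one payment record


def make_gateway(
    is_abnormal: Callable[[Payment], bool],
    forward: Callable[[Payment], None],
) -> Callable[[Payment], str]:
    """Return a handler that sits in front of the payment server."""

    def handle(payment: Payment) -> str:
        if is_abnormal(payment):
            # Abnormal transaction: the payment never reaches the server.
            return "blocked"
        # Normal transaction: pass the payment data on to the server.
        forward(payment)
        return "forwarded"

    return handle
```

For example, `make_gateway(detector, send_to_server)` could wrap any detection function, so the detection model can be replaced without touching the forwarding path.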

To detect abnormal transactions, the abnormal transaction detection apparatus 100 builds an abnormal transaction detection model by applying unsupervised machine learning to previously stored payment data. When new payment data is obtained, the model is used to determine whether the electronic financial transaction indicated by that payment data is an abnormal transaction. How the abnormal transaction detection apparatus 100 builds the abnormal transaction detection model and determines whether a transaction is abnormal is described in detail later with reference to FIGS. 4 to 13. Unsupervised learning is a widely known concept in the art, so its description is omitted.

The electronic payment server 200 is a server that receives payment data from the plurality of user terminals 300 and provides an electronic payment service. The electronic payment server 200 may be, for example, a server that provides a card payment service over the Internet. However, it is not limited to this and may be any kind of server that provides electronic payment. Depending on the embodiment, the abnormal transaction detection system may further include, in addition to the electronic payment server 200, a server that provides banking services such as Internet banking and smartphone banking.

The plurality of user terminals 300 are devices that carry out electronic financial transactions by receiving payment data from users and transmitting it to the electronic payment server 200. A user terminal may be a notebook, a desktop, a laptop, a smartphone, or the like, but is not limited to these.

The plurality of user terminals 300 and the electronic payment server 200 can communicate over a network. The network may be implemented as any kind of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a mobile radio communication network, or WiBro (Wireless Broadband Internet).

For reference, FIG. 1 shows the abnormal transaction detection apparatus 100 and the electronic payment server 200 as physically independent devices, but depending on the embodiment they may be implemented as different pieces of logic within the same device.

The abnormal transaction detection system according to one embodiment of the present invention has been described so far with reference to FIG. 1. Next, the abnormal transaction detection apparatus according to embodiments of the present invention is described with reference to FIGS. 2 and 3.

First, FIG. 2A is a functional block diagram of the abnormal transaction detection apparatus 100 according to another embodiment of the present invention.

Referring to FIG. 2A, the abnormal transaction detection apparatus 100 may include a data acquisition unit 110, a feature selection unit 120, a clustering unit 130, and an abnormal transaction detection unit 140. FIG. 2A shows only the components related to the embodiments of the present invention. A person of ordinary skill in the art to which the present invention belongs will therefore understand that other general-purpose components may be included in addition to those shown in FIG. 2A.

Looking at each component, the data acquisition unit 110 obtains the training dataset 250 used to build the abnormal transaction detection model, the new payment data 260 that is subject to abnormal transaction detection (hereinafter, 'detection-target payment data'), and so on. For example, the data acquisition unit 110 may obtain payment data from a previously built database, or receive the detection-target payment data 260 from the electronic payment server 200 or the plurality of user terminals 300.

To describe the training dataset 250 briefly with reference to FIG. 2B, the training dataset 250 may include payment data 251 that contains a classification result 251a indicating a normal or abnormal transaction (hereinafter, 'first payment data') and payment data 252 that does not contain the classification result (hereinafter, 'second payment data').

The first payment data and the second payment data may each be composed of a plurality of features. For example, they may include features such as the payment amount, payment date and time, payment IP, and purchased item, but are not limited to these. For examples of the features that make up payment data, see FIG. 7A. Note that a feature is a component of the payment data; the term may be used interchangeably in the art with terms such as field and attribute, all with the same meaning.
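One possible in-memory representation of this training dataset is sketched below. The class name `Payment`, the label strings, and the helper `split_training_set` are assumptions made for illustration; the text only specifies that the first payment data carries a classification result and the second does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Payment:
    """One payment record: named feature values plus an optional classification result."""

    features: Dict[str, float]
    label: Optional[str] = None  # "normal", "abnormal", or None when unclassified


def split_training_set(dataset: List[Payment]) -> Tuple[List[Payment], List[Payment]]:
    """Separate labeled records (first payment data) from unlabeled ones (second)."""
    first = [p for p in dataset if p.label is not None]
    second = [p for p in dataset if p.label is None]
    return first, second
```

With this layout, the labeled subset can feed feature selection while the usually much larger unlabeled subset feeds clustering.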

Next, the feature selection unit 120 may determine, among the plurality of features of the first payment data, the key features with strong discriminative power, in order to prevent the various side effects that arise when machine learning is performed on an unnecessarily large number of features. One such side effect is the curse of dimensionality, which arises when learning from data composed of many dimensions (that is, many features). By performing machine learning only on the key features, problems that grow with the dimensionality of the training data, such as the larger amount of training data required, longer training time, over-fitting, and degraded performance, can be prevented.

In addition, when detection-target payment data 260 that does not include a classification result is obtained, the feature selection unit 120 extracts the previously selected key features from the detection-target payment data 260.
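Extracting the previously selected key features from a new record amounts to a projection onto those features. A minimal sketch follows, assuming a payment is a dictionary keyed by feature name; the function name and the feature names in the usage note are hypothetical.

```python
def extract_key_features(payment, key_features):
    """Return the key-feature values of one payment, in the order chosen at training time."""
    missing = [name for name in key_features if name not in payment]
    if missing:
        # Fail loudly rather than silently scoring an incomplete record.
        raise KeyError(f"payment lacks key features: {missing}")
    return [payment[name] for name in key_features]
```

For instance, `extract_key_features({"amount": 120.0, "hour": 3, "ip_risk": 0.7}, ["amount", "ip_risk"])` returns `[120.0, 0.7]`, which is the vector a clustering-based model would then evaluate.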

In one embodiment, the feature selection unit 120 may use a filter-method-based feature selection algorithm, which is well known in the art, to select the key features from among the plurality of features composing the first payment data.
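A filter method scores each feature on its own, without training a downstream model, and ranks features by that score. The sketch below uses the absolute Pearson correlation between a feature and the 0/1 classification result as the score. That choice of metric, like the function names, is an assumption for illustration; the text does not say which filter metric is used.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_features(rows, labels, names):
    """Rank features by |correlation with the label| (1 = abnormal, 0 = normal)."""
    scores = {
        name: abs(pearson([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because each feature is scored independently, the cost grows linearly with the number of features, which is one reason filter methods suit wide transaction records.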

Alternatively, in one embodiment, the feature selection unit 120 may use a dimensionality reduction algorithm such as Principal Component Analysis (PCA) to select key features from among the plurality of features constituting the first payment data.

Alternatively, in one embodiment, the feature selection unit 120 may extract key features from among the plurality of features constituting the first payment data by combining principal component analysis with a feature selection algorithm. How the feature selection unit 120 selects key features is described in detail later with reference to FIGS. 5 to 8.

Next, the clustering unit 130 builds an abnormal transaction detection model by performing unsupervised clustering on the second payment data. More specifically, it performs unsupervised clustering on the key features among the plurality of features constituting the second payment data, and builds the abnormal transaction detection model from the clustering result. Any unsupervised learning algorithm widely known in the art may be used for this clustering. For example, a k-means clustering algorithm, an x-means clustering algorithm, or a farthest-first clustering algorithm may be used, but the choice is not limited to these. How the clustering unit 130 builds the abnormal transaction detection model through clustering is described in detail later with reference to FIG. 4 and FIGS. 9 to 12.

For reference, the abnormal transaction detection model can be understood as a model that, given key features as input, outputs a classification result of normal transaction or abnormal transaction by using the clusters generated by the unsupervised clustering algorithm.

The abnormal transaction determination unit 140 inputs the key features of the detection-target payment data 260 into the previously built abnormal transaction detection model, and uses the classification result output by the model to determine whether the electronic payment indicated by the detection-target payment data is an abnormal transaction. How the abnormal transaction determination unit 140 makes this determination is described in detail later with reference to FIG. 13.
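The determination step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the model consists of cluster centers with a normal/abnormal label attached to each, and that a record is classified by its nearest center; the text defers the actual procedure to FIG. 13, so the function name `classify` and the nearest-center rule are assumptions rather than the patented method.

```python
import math

def classify(key_features, centers, center_labels):
    """Return the label of the representative cluster whose center is nearest.

    Assumption: the detection model is a list of cluster centers plus one
    label per center. The nearest-center rule is illustrative only.
    """
    nearest = min(range(len(centers)),
                  key=lambda i: math.dist(key_features, centers[i]))
    return center_labels[nearest]

# Hypothetical two-cluster model over two numeric key features.
centers = [(0.0, 0.0), (5.0, 5.0)]
center_labels = ["normal", "abnormal"]
result = classify((4.8, 5.2), centers, center_labels)
```

In practice the key features would need to be scaled consistently before distances are compared, since raw amounts and encoded identifiers live on very different scales.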

Each component in FIG. 2a may be software, or hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application-Specific Integrated Circuit). However, the components are not limited to software or hardware; they may be configured to reside in an addressable storage medium, or configured to run on one or more processors. The functions provided by the components may be implemented by further subdivided components, or several components may be combined into a single component that performs a specific function.

Next, the configuration and operation of the abnormal transaction detection device 100 according to still another embodiment of the present invention are described with reference to FIG. 3.

Referring to FIG. 3, the abnormal transaction detection device 100 may include one or more processors 101, a bus 105, a network interface 107, a memory 103 into which a computer program executed by the processor 101 is loaded, and a storage 109 that stores abnormal transaction detection software 109a. FIG. 3 shows only the components relevant to the embodiment of the present invention. A person of ordinary skill in the art to which the present invention pertains will therefore recognize that other general-purpose components may be included in addition to those shown in FIG. 3.

The processor 101 controls the overall operation of each component of the abnormal transaction detection device 100. The processor 101 may include a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), a GPU (Graphic Processing Unit), or any other type of processor well known in the technical field of the present invention. The processor 101 may also perform computations for at least one application or program for executing the method according to embodiments of the present invention. The abnormal transaction detection device 100 may have one or more processors.

The memory 103 stores various data, instructions, and/or information. The memory 103 may load one or more programs 109a from the storage 109 in order to execute the abnormal transaction detection method according to embodiments of the present invention. In FIG. 3, RAM is shown as an example of the memory 103.

The bus 105 provides communication between the components of the abnormal transaction detection device 100. The bus 105 may be implemented as various types of bus, such as an address bus, a data bus, and a control bus.

The network interface 107 supports wired and wireless Internet communication for the abnormal transaction detection device 100. The network interface 107 may also support communication methods other than Internet communication. To this end, the network interface 107 may include a communication module well known in the technical field of the present invention.

The storage 109 may store the one or more programs 109a non-transitorily. In FIG. 3, the abnormal transaction detection software 109a is shown as an example of the one or more programs 109a.

The storage 109 may include a non-volatile memory such as a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), or a flash memory, a hard disk, a removable disk, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

According to an embodiment of the present invention, the abnormal transaction detection software 109a may build an abnormal transaction detection model using an unsupervised machine learning algorithm, and may use the built model to determine whether detection-target payment data corresponds to an abnormal transaction.

Specifically, the abnormal transaction detection software 109a may include: an operation of obtaining first transaction data that includes a classification result indicating an abnormal transaction or a normal transaction, and second transaction data that does not include the classification result, where each of the first transaction data and the second transaction data is composed of a plurality of features; an operation of determining key features from among the plurality of features constituting the first transaction data by using a filter method-based feature selection algorithm; and an operation of performing unsupervised learning-based clustering on the key-feature data of the second transaction data and building an abnormal transaction detection model from the clustering result.

So far, the configuration and operation of the abnormal transaction detection device 100 according to an embodiment of the present invention have been described with reference to FIGS. 2a to 3.

Next, an abnormal transaction detection method according to still another embodiment of the present invention is described in detail with reference to FIGS. 4 to 14. Each step of the abnormal transaction detection method according to this embodiment may be performed by a computing device. The computing device may be, for example, the abnormal transaction detection device 100. For ease of understanding, the subject performing each operation of the method may be omitted. For reference, each step of the method may be an operation performed in the abnormal transaction detection device 100 as the abnormal transaction detection software 109a is executed by the processor 101.

The abnormal transaction detection method according to the present invention includes a step of building an unsupervised learning-based abnormal transaction detection model using previously stored payment data, and a step of determining whether a transaction is abnormal using the built model. The step of building the abnormal transaction detection model is described first, with reference to FIGS. 4 to 12.

FIG. 4 is a detailed flowchart of the step of building the abnormal transaction detection model.

Referring to FIG. 4, first payment data that includes a classification result indicating a normal transaction or an abnormal transaction, and second payment data that does not include the classification result, are obtained (S100). For example, payment data comprising the first payment data and the second payment data may be obtained from a previously stored database or the like. The second payment data may contain far more records than the first payment data, because payment data that carries a normal/abnormal classification result is generally scarce.

Next, at least one key feature to be used for unsupervised learning is determined from among the plurality of features constituting the first payment data, using a filter method-based feature selection algorithm (S120). As described above, key features are determined because using every feature of multidimensional payment data for learning can cause various problems such as over-fitting and degraded performance.

To aid understanding, feature selection algorithms are briefly described here. In the art, feature selection algorithms are classified, according to how and when features are evaluated, into algorithms based on the wrapper method, the filter method, and the embedded method.

A wrapper method-based feature selection algorithm searches for candidate features, trains on the features found, and then evaluates the selected features by learning performance (e.g., classification accuracy). Because it uses learning performance itself as the evaluation criterion, its feature selection can be highly accurate. However, because training must be carried out for every feature, the computational cost grows sharply as the number of features increases, which can make the approach impractical.

An embedded method-based feature selection algorithm performs feature selection at the same time as learning. Its computational cost can be lower than that of a wrapper method-based algorithm, but because learning performance is still used as the evaluation criterion, it remains computationally expensive.

A filter method-based feature selection algorithm searches for candidate features and then evaluates the importance of the features found based on properties of the data that are independent of the learning algorithm. For example, IG (Information Gain) or chi-square may be used as the evaluation metric. Because no training is required for feature selection, the cost is relatively low. For further details on feature selection algorithms, see Wikipedia (https://en.wikipedia.org/wiki/Feature_selection).
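As a minimal sketch of a filter-style metric, the following computes information gain for a discrete feature against the normal/abnormal label, with no model training involved. The sample values and the feature names `carrier` and `channel` are hypothetical and chosen only to show the two extremes; this is not the evaluation code used in the patent.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """IG of one discrete feature: H(labels) - H(labels | feature)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g)
                      for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical labeled ("first") payment data, made up for illustration.
labels = ["normal", "normal", "fraud", "fraud"]
carrier = ["A", "A", "B", "B"]          # separates the classes perfectly
channel = ["web", "app", "web", "app"]  # carries no class information

ig_carrier = information_gain(carrier, labels)
ig_channel = information_gain(channel, labels)
```

Here `ig_carrier` comes out at one bit and `ig_channel` at zero, so a ranking by IG would prefer `carrier`. Continuous features such as a payment amount would need to be binned first, which this sketch leaves out.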

To minimize the computing cost of feature selection, the key features may be determined using the filter method-based algorithm among the feature selection algorithms described above. Depending on the implementation, however, a wrapper or embedded method-based feature selection algorithm may be used instead. This step is described further below with reference to FIGS. 5 to 9.

After the key features are determined in step S120, the abnormal transaction detection model is built by performing unsupervised learning on the key features among the plurality of features constituting the second payment data (S140). More specifically, the model may be built by clustering the key features of the second payment data with an unsupervised learning-based algorithm and generating representative clusters that indicate normal transactions and/or abnormal transactions.

Using the classification results of the first payment data, the representative clusters may be divided into representative clusters indicating normal transactions and representative clusters indicating abnormal transactions. For example, the abnormal transaction detection device 100 may set a representative cluster that contains first payment data classified as normal as a representative cluster indicating normal transactions, and set a representative cluster that contains first payment data classified as abnormal as a representative cluster indicating abnormal transactions.

Alternatively, because most of the second payment data generally represents normal transactions, the densities of the data in the representative clusters may be compared, and a high-density cluster may be set as a representative cluster indicating normal transactions while a low-density cluster is set as a representative cluster indicating abnormal transactions.
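The two labeling strategies can be sketched as follows. Both functions are illustrative assumptions: the first resolves mixed clusters by majority vote, which the text does not specify, and the second uses record count as a simple stand-in for density.

```python
from collections import Counter

def label_by_known_members(cluster_of_labeled, labels):
    """Give each cluster the majority label of the labeled records in it.

    cluster_of_labeled[i] is the cluster id assigned to the i-th labeled
    (first payment data) record. Majority voting is an assumption.
    """
    votes = {}
    for cluster_id, label in zip(cluster_of_labeled, labels):
        votes.setdefault(cluster_id, Counter())[label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

def label_by_size(cluster_sizes):
    """Fallback: the most populated cluster is normal, the rest abnormal.

    Record count is used here as a rough proxy for density.
    """
    biggest = max(cluster_sizes, key=cluster_sizes.get)
    return {cid: ("normal" if cid == biggest else "abnormal")
            for cid in cluster_sizes}

by_members = label_by_known_members([0, 0, 1], ["normal", "normal", "abnormal"])
by_size = label_by_size({0: 950, 1: 50})
```

A cluster that contains no labeled record receives no label from the first function, which is one reason a size- or density-based fallback is useful.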

To build the abnormal transaction detection model, any one of the unsupervised learning algorithms listed in Table 1 below may be used, for example. The algorithms in Table 1 are merely examples for embodying the present invention; other unsupervised clustering algorithms widely known in the art may also be used.

Table 1
Algorithm | Description
EM (expectation-maximization) clustering | Performs clustering by alternating between an expectation (E) step, which computes the expected log likelihood using the current parameter estimates, and a maximization (M) step, which finds the parameter values that maximize that expectation.
k-means clustering | Groups the data into k clusters, operating so as to minimize the variance of the distances to each cluster.
X-means clustering | An extension of the k-means clustering algorithm that automatically determines the number of clusters according to the BIC (Bayesian Information Criterion) score and then performs k-means clustering.
Farthest-first | A variant of the k-means algorithm that operates in a cluster-center selection step and a cluster assignment step; when selecting centers, the next center is chosen at the point farthest from the existing cluster centers.
MakeDensity-Based clustering | Forms clusters by linking data of the same density with data located in the same region.
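The farthest-first entry of Table 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the first center is simply the first point (implementations commonly pick it at random), distances are Euclidean, and the sample points are made up.

```python
import math

def farthest_first_centers(points, k):
    """Pick k centers: start from the first point, then repeatedly take the
    point whose distance to its nearest already-chosen center is largest."""
    centers = [points[0]]  # assumption: deterministic start for the sketch
    while len(centers) < k:
        nxt = max(points,
                  key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers

def assign(points, centers):
    """Assign each point to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]

# Hypothetical two-dimensional key-feature vectors.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (0.2, 0.1)]
centers = farthest_first_centers(points, 2)
clusters = assign(points, centers)
```

Unlike k-means, this sketch never moves the centers after choosing them, which keeps it to a single pass over the data per center.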

So far, the step of building the abnormal transaction detection model has been described with reference to FIG. 4. As described above, deriving discriminative key features with a feature selection algorithm and building the model from those key features alone can improve not only the training speed of the abnormal transaction detection model but also its detection performance.

Next, the key feature selection step (S120) described above is explained in more detail with reference to FIGS. 5 to 9.

FIG. 5 is a detailed flowchart of the key feature selection step (S120). This is merely a preferred embodiment for achieving the object of the present invention, and steps may of course be added or removed as needed.

Referring to FIG. 5, a filter method-based feature selection algorithm is used to search for candidate features among the plurality of features constituting the first payment data (S122), a per-feature score is calculated for each candidate feature found (S124), and a preset number of key features are selected based on the per-feature scores (S126).

To aid understanding, this is described in more detail with reference to the example shown in FIG. 6. In FIG. 6, it is assumed that the first payment data consists of features A through Z (410) and that three key features (470) are to be selected.

Referring to FIG. 6, filter method-based feature selection is performed by using a search algorithm 430 to find candidate features among the plurality of features 410, and using a ranking algorithm 450 to calculate a per-feature score for each candidate. As shown in FIG. 6, the search algorithm 430 and the ranking algorithm 450 may be algorithms widely known in the art, and swapping one algorithm for another amounts to no more than a difference in implementation.

For example, the abnormal transaction detection device 100 may perform the search by selecting candidate features sequentially until the preset number of key features has been selected (sequential search), or by selecting arbitrary features as candidates (random search). Alternatively, the device may select all features as candidates (exhaustive search), select candidates using a genetic algorithm (genetic search), or select candidates using various kinds of greedy algorithms (best first search, greedy stepwise). Each search algorithm 430 is widely known in the art, so a detailed description is omitted.

Once candidate features have been selected through the search, a per-feature score is calculated for each candidate according to the evaluation metric of the preset ranking algorithm 450. The per-feature score can be understood as a quantification of how much each candidate feature contributes to distinguishing normal transactions from abnormal transactions.

For example, at least one of the ranking algorithms 450 may be used, such as a ranking algorithm that calculates per-feature scores with chi-square as the evaluation metric (chi-square attribute evaluation), one that uses IG (Information Gain) as the metric (Information Gain attribute evaluation), or one that uses RF (Relief-F) as the metric (Relief-F attribute evaluation). Each ranking algorithm 450 is also widely known in the art, so a detailed description is omitted.

Once per-feature scores have been calculated for the candidate features, the three features with the highest scores are selected as the key features (470). For example, when scores are calculated on the basis of IG, the three features with the highest IG are selected. When multiple evaluation metrics such as chi-square, IG, and RF are used, the key features may be selected using the arithmetic mean or weighted mean of the per-feature scores calculated under each metric. Depending on the implementation, however, all features whose score is at or above a preset value may be selected as key features.
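Combining scores from several metrics and keeping the top-ranked features can be sketched as below. The function name `select_key_features`, the feature names, and the score values are all hypothetical; the scores are assumed to be normalized to a common scale beforehand, since raw chi-square and IG values are not directly comparable.

```python
def select_key_features(scores_by_metric, k, weights=None):
    """Average each feature's scores across metrics and return the k best names.

    scores_by_metric maps metric name -> {feature name -> score}.
    weights (optional) maps metric name -> weight; equal weights by default,
    which reduces the weighted mean to the arithmetic mean.
    """
    metrics = list(scores_by_metric)
    weights = weights or {m: 1.0 for m in metrics}
    total_weight = sum(weights[m] for m in metrics)
    features = scores_by_metric[metrics[0]].keys()
    combined = {
        f: sum(weights[m] * scores_by_metric[m][f] for m in metrics) / total_weight
        for f in features
    }
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Hypothetical, already-normalized scores (not taken from FIG. 7d).
scores = {
    "IG":         {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.1},
    "chi_square": {"A": 0.8, "B": 0.3, "C": 0.7, "D": 0.2},
}
top = select_key_features(scores, k=2)
```

The threshold variant mentioned in the text would replace the final slice with a filter on `combined[f] >= threshold`.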

Next, to further aid understanding, an example of selecting key features from actual mobile payment data is described with reference to FIGS. 7a to 7d.

FIG. 7a shows the plurality of features that make up mobile payment data. Referring to FIG. 7a, mobile payment data includes many features, such as transaction type, authentication time, and transaction amount. To avoid the various problems caused by the curse of dimensionality, it is therefore necessary to pick out the key features that contribute most to distinguishing normal transactions from abnormal transactions.

FIG. 7b shows the features that remain after excluding, from the mobile payment data, features whose value is automatically assigned as a unique value and features that take only a single value. In an actual implementation, such features are excluded because they have no effect on distinguishing normal transactions from abnormal transactions. For reference, the features to be excluded may be set in advance.
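A minimal sketch of this exclusion step follows. It detects uninformative features from the data itself, whereas the text notes that the excluded features may simply be preset; the field names `TX_ID` and `CURRENCY` and all sample values are hypothetical.

```python
def drop_uninformative(records):
    """Return the feature names worth keeping.

    A feature is dropped if it takes a single value across all records, or a
    distinct value in every record (for example, an auto-assigned identifier).
    """
    n = len(records)
    keep = []
    for name in records[0]:
        distinct = len({r[name] for r in records})
        if 1 < distinct < n:
            keep.append(name)
    return keep

# Hypothetical records; only COMM_ID appears in the source text.
records = [
    {"TX_ID": "t1", "CURRENCY": "KRW", "COMM_ID": "SKT"},
    {"TX_ID": "t2", "CURRENCY": "KRW", "COMM_ID": "KT"},
    {"TX_ID": "t3", "CURRENCY": "KRW", "COMM_ID": "SKT"},
]
kept = drop_uninformative(records)
```

On a small sample a genuinely useful feature can look all-distinct by chance, so a data-driven check like this is best run on a reasonably large batch.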

Next, FIG. 7c illustrates a preprocessing step that converts the value of each feature into numeric form. In an actual implementation, preprocessing such as the format conversion shown in FIG. 7c may be performed. In FIG. 7c, features such as the transaction amount are already entered as numeric values and need no conversion, but features with string values, such as the carrier, need to be converted into suitable numeric values. The way strings are converted to numbers may vary with the implementation, and any method may be used.
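One simple conversion is integer coding, sketched below. The function name `build_codebook` and the carrier values are hypothetical, and the text leaves the actual conversion method open.

```python
def build_codebook(values):
    """Map each distinct string to an integer code in order of first appearance."""
    codebook = {}
    for v in values:
        # setdefault only inserts when v is new, so codes stay stable.
        codebook.setdefault(v, len(codebook))
    return codebook

# Hypothetical carrier values for illustration.
carriers = ["SKT", "KT", "SKT", "LGU"]
codebook = build_codebook(carriers)
encoded = [codebook[v] for v in carriers]
```

Integer codes impose an artificial ordering on categories, which distance-based clustering will treat as meaningful; one-hot encoding is a common alternative when that matters.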

Finally, FIG. 7d shows an example in which all features are searched as candidate features (exhaustive search), per-feature scores are calculated for the candidates, and key features are selected. In FIG. 7d, One-R, IG, SU (Symmetrical Uncertainty), and chi-square were used as the evaluation metrics for calculating the per-feature scores.

Referring to FIG. 7d, when per-feature scores were calculated with IG, SU, and chi-square as the evaluation metrics, the carrier (COMM_ID), transaction amount (PRDT_PRICE), and service ID (SCV_ID) recorded high scores. When scores were calculated with One-R as the evaluation metric, all candidate features other than the authentication client version (A_VER) and the service ID (SCV_ID) received high scores. When the per-feature scores under each metric are averaged arithmetically, the carrier (COMM_ID), transaction amount (PRDT_PRICE), and service ID (SCV_ID) score highest, so these three may be chosen as the key features. Depending on the implementation, however, the key features may instead be chosen by taking a weighted average of the per-feature scores, or by using a rank order derived from the per-feature scores rather than the scores themselves.

Meanwhile, in one embodiment, a dimensionality reduction technique may be used to reduce the number of features used for machine learning. For example, when the second payment data consists of m features (m dimensions), principal component analysis may be performed to extract n key features (n dimensions) from the m features, where n is a positive number smaller than m. An abnormal transaction detection model may then be built by performing unsupervised clustering on the n key features.
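The reduction from m to n dimensions can be sketched as below. This is a minimal plain-Python illustration that finds principal components by power iteration with deflation; it is a stand-in chosen to keep the example self-contained, not the method specified by the document, and a practical implementation would more likely rely on a numerical library. All function names and the toy data are assumptions.

```python
import math

def covariance(rows):
    """Return (centered rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centered, cov

def top_eigenpair(matrix, iterations=500):
    """Power iteration: approximate the dominant eigenvalue and eigenvector."""
    d = len(matrix)
    # Asymmetric start vector; the iteration fails only if the start is
    # exactly orthogonal to the dominant eigenvector.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v

def pca(rows, n_components):
    """Project m-dimensional rows onto their first n principal components."""
    centered, cov = covariance(rows)
    d = len(cov)
    components = []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    projected = [[sum(c[j] * v[j] for j in range(d)) for v in components]
                 for c in centered]
    return components, projected

# Toy data with m = 3 features that vary along a single direction (1, 2, 0).
rows = [(1.0, 2.0, 0.0), (2.0, 4.0, 0.0), (3.0, 6.0, 0.0), (4.0, 8.0, 0.0)]
components, reduced = pca(rows, n_components=1)  # n = 1
```

The `reduced` rows are what would be passed to the clustering step in place of the original m-dimensional data.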

In one embodiment, the key features may also be determined by combining a dimensionality reduction technique such as principal component analysis with a feature selection algorithm. For example, m first key features may be selected through the key feature selection step (S120) described above, and principal component analysis may then be performed to extract n second key features, further reducing the number of key features. Conversely, principal component analysis may be performed first to extract n first key features from m features, and a preset number of second key features may then be selected from the n first key features through the key feature selection step (S120). In either case, n is a positive number smaller than m.

So far, the key feature selection step (S120) has been described in detail with reference to FIGS. 5 to 7. As described above, training on only the highly discriminative key features among the plurality of features constituting the payment data reduces the training time and the computing cost required for training, and can improve the performance of abnormal transaction detection.

Next, an embodiment of the present invention for improving the accuracy of key feature selection is described.

Key feature selection is an important step that determines the performance (e.g., accuracy) of the abnormal transaction detection model. Therefore, according to an embodiment of the present invention, the first payment data may be sampled at different ratios, per-feature scores may be calculated for each sampled data set, and the key features may then be selected using, for example, the average of the per-feature scores.

Hereinafter, this embodiment is described in detail with reference to FIGS. 8A and 8B.

Referring to FIGS. 8A and 8B, the first payment data is sampled so that the payment data indicating normal and abnormal transactions are in a preset ratio (S121). For example, the first payment data may be sampled so that the ratio of data indicating normal transactions to data indicating abnormal transactions is a first ratio (e.g., 50:50). The sampling method may vary by implementation, and any method may be used.

Next, candidate features are searched for in the data sampled at the first ratio (hereinafter, "first sampling data"), and per-feature scores are calculated for the candidate features (S122, S124). Steps S122 and S124 are not described again here to avoid repetition.

Next, if additional sampling is required (S127), sampling may be performed again so that the data indicating normal and abnormal transactions in the first payment data are in a second ratio different from the first ratio (S121). Per-feature scores are calculated in the same manner for the data sampled at the second ratio (hereinafter, "second sampling data") (S122, S124).

Here, whether additional sampling is required may be determined, for example, by a preset number of sampling rounds. The first ratio and the second ratio may be implemented in various ways, such as being preset or being determined automatically as arbitrary ratios. For example, as shown in FIG. 8B, when the preset number of sampling rounds is three, sampling may be performed three times at different ratios such as 50:50, 70:30, and 90:10 from first payment data consisting of 1,823 normal transactions and 177 abnormal transactions.

When no additional sampling is required, an average value, such as an arithmetic mean or a weighted average, of the per-feature scores across the sampled data sets is calculated, and the key features are selected based on that average (S128). For example, a preset number of key features may be selected based on the per-feature average of the scores for the first sampling data and the scores for the second sampling data.
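The loop over steps S121 to S128 might be sketched as follows. This is a simplified illustration under stated assumptions: a single evaluation metric (information gain) stands in for the set of metrics, rows are dictionaries with a hypothetical `label` key, and the synthetic data only mimics the 1,823 normal and 177 abnormal transactions mentioned above. None of the names come from the document.

```python
import math
import random
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """IG of a categorical feature with respect to the class label."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def sample_at_ratio(normal, abnormal, normal_share, size, rng):
    """S121: draw `size` rows with the requested share of normal rows."""
    n_normal = round(size * normal_share)
    return rng.sample(normal, n_normal) + rng.sample(abnormal, size - n_normal)

def averaged_scores(normal, abnormal, shares, size, features, seed=0):
    """S122 to S128: score each sample, then average per feature."""
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    for share in shares:
        rows = sample_at_ratio(normal, abnormal, share, size, rng)
        labels = [r["label"] for r in rows]
        for f in features:
            totals[f] += information_gain([r[f] for r in rows], labels)
    return {f: t / len(shares) for f, t in totals.items()}

# Synthetic data: COMM_ID separates the classes, A_VER carries no information.
normal = [{"COMM_ID": "A" if i % 2 else "B", "A_VER": "1.0", "label": "normal"}
          for i in range(1823)]
abnormal = [{"COMM_ID": "X", "A_VER": "1.0", "label": "abnormal"}
            for _ in range(177)]

scores = averaged_scores(normal, abnormal, shares=(0.5, 0.7, 0.9),
                         size=200, features=("COMM_ID", "A_VER"))
```

Averaging over several class ratios keeps a feature from being ranked highly only because of one particular class balance.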

So far, an embodiment of the present invention for improving the accuracy of key feature selection has been described with reference to FIGS. 8A and 8B. In summary, sampling is repeated with different ratios of data indicating normal and abnormal transactions, per-feature scores are calculated for each sampled data set, and the key features are selected based on the average of those scores, so that highly discriminative key features can be selected accurately. The performance of the abnormal transaction detection model can be further improved as a result.

Next, an embodiment of the present invention in which a plurality of candidate abnormal transaction detection models are built and one of them is determined through model verification is described with reference to FIGS. 9 to 12.

As described above, various clustering algorithms such as EM and farthest-first may be used to build the abnormal transaction detection model. Because the accuracy of the model may vary with the clustering algorithm, a plurality of candidate abnormal transaction detection models may be built using the respective unsupervised clustering algorithms, and at least one abnormal transaction detection model to be used for actual detection may then be determined through model verification.

More specifically, referring to the flowchart shown in FIG. 9, a plurality of candidate abnormal transaction detection models are built by applying a plurality of unsupervised clustering algorithms to the key features of the second payment data (S142). For example, the abnormal transaction detection device (100) may build a first candidate abnormal transaction detection model using the EM clustering algorithm and a second candidate abnormal transaction detection model using the farthest-first clustering algorithm.
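One such candidate can be sketched with a farthest-first traversal, shown below as a minimal illustration rather than the implementation used in the document. Labeling the largest cluster as normal is an assumption based on the cluster-size comparison recited in the claims; the function names and the two-dimensional toy points are hypothetical.

```python
import math
import random
from collections import Counter

def farthest_first_centers(points, k, seed=0):
    """Pick k centers: a random first point, then repeatedly the point
    farthest from all centers chosen so far."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    # Distance from every point to its nearest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centers

def assign(points, centers):
    """Assign each point to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def label_by_size(assignments):
    """Assumption: the most populated cluster represents normal transactions."""
    counts = Counter(assignments)
    largest = max(counts, key=counts.get)
    return {c: "normal" if c == largest else "abnormal" for c in counts}

# Toy key-feature vectors: three points near the origin, two far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
centers = farthest_first_centers(points, k=2)
assignments = assign(points, centers)
labels = label_by_size(assignments)
```

A second candidate would be produced the same way with a different algorithm, such as EM, applied to the same key-feature data.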

Next, the plurality of candidate abnormal transaction detection models are verified against a preset verification metric (S144). Specifically, verification may be performed by comparing the classification results included in the first payment data with the classification results output by each candidate abnormal transaction detection model.

Here, at least one of precision, recall, or F-measure (e.g., F₁-measure) may be used as the verification metric. In one embodiment, when it is important to reduce false negative (FN) errors, in which an abnormal transaction goes undetected and is judged to be a normal transaction, recall or the F_0.5-measure may be used as the verification metric.

Alternatively, in one embodiment, when it is important to reduce false positive (FP) errors, in which a normal transaction is judged to be abnormal and the payment is blocked, precision or the F₂-measure may be used as the verification metric.

Alternatively, in one embodiment, when it is important to reduce both FP and FN errors, the F₁-measure may be used as the verification metric.

For the formulas for precision, recall, and F-measure, refer to Equations 1 to 4 below and the confusion matrix shown in FIG. 10. These verification metrics are well known in the art, and further description is omitted.
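For orientation, the commonly used textbook definitions of these metrics can be written as the short Python sketch below. It assumes the standard F_β formula, (1 + β²)·P·R / (β²·P + R); whether the document's Equations 1 to 4 use exactly this form cannot be confirmed from the text here, so the pairing of β values with error types should be read against those equations. Under the standard formula, β greater than 1 weights recall more heavily and β less than 1 weights precision more heavily. The confusion-matrix counts are hypothetical.

```python
def precision(tp, fp):
    """Share of transactions flagged abnormal that really are abnormal."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of truly abnormal transactions that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-beta score computed from confusion-matrix counts."""
    p, r = precision(tp, fp), recall(tp, fn)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
tp, fp, fn = 8, 2, 8
p = precision(tp, fp)   # 0.8
r = recall(tp, fn)      # 0.5
f1 = f_measure(tp, fp, fn, beta=1.0)
f2 = f_measure(tp, fp, fn, beta=2.0)
f05 = f_measure(tp, fp, fn, beta=0.5)
```

With recall lower than precision in this example, the standard formula gives `f2 < f1 < f05`, which shows the direction in which β shifts the balance.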

Next, at least one abnormal transaction detection model to be used for actual abnormal transaction detection is determined from among the plurality of candidate abnormal transaction detection models using the performance figures produced by the verification (S146). For example, when the verification metric is the F₁-measure, at least one candidate model whose measured F₁-measure is equal to or greater than a preset value may be determined as the abnormal transaction detection model, or the candidate model with the highest measured F₁-measure may be determined as the abnormal transaction detection model.
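The two selection rules of step S146 can be illustrated with the following sketch. The F₁ values and the function name `select_models` are hypothetical and are not taken from the verification results in the figures.

```python
def select_models(f1_by_model, threshold=None):
    """Return every model at or above the threshold, or else the single best."""
    if threshold is not None:
        return [name for name, score in f1_by_model.items()
                if score >= threshold]
    return [max(f1_by_model, key=f1_by_model.get)]

# Hypothetical F1 measurements for three candidate models.
f1_scores = {"EM": 0.62, "k-means": 0.58, "farthest-first": 0.81}

above_threshold = select_models(f1_scores, threshold=0.6)
single_best = select_models(f1_scores)
```

The threshold rule can return several models, which matches the statement that at least one model may be determined.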

To aid understanding, a further explanation is given with reference to the verification results shown in FIGS. 11A and 11B.

FIG. 11 shows the results of verifying the candidate abnormal transaction detection models built using each of the unsupervised clustering algorithms listed in Table 1. Specifically, FIG. 11A shows the verification results measured by precision, and FIG. 11B shows the verification results measured by F₁-measure. In both FIGS. 11A and 11B, the left graph shows the results of verification with verification data sampled at 90:10, and the right graph shows the results with verification data sampled at 99:1.

Referring to FIGS. 11A and 11B, when verification is performed with precision or F₁-measure as the verification metric, a performance figure is calculated for each verification metric, as shown in the figures. When a plurality of verification metrics are used, the abnormal transaction detection model may be determined using an arithmetic mean or a weighted average of the performance figures under each metric. For example, when reducing FN errors is important, a larger weight may be assigned to recall when computing the weighted average. When reducing FP errors is important, a larger weight may be assigned to precision.
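A weighted combination of several verification metrics might look like the sketch below. The performance figures and weights are hypothetical values chosen only to show how the weighting can change which candidate is preferred; they are not the values plotted in FIGS. 11A and 11B.

```python
def weighted_score(metrics, weights):
    """Weighted average of the verification metrics of one candidate model."""
    total = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total

# Hypothetical performance figures for two candidate models.
candidates = {
    "EM":             {"precision": 0.90, "recall": 0.60},
    "farthest-first": {"precision": 0.80, "recall": 0.85},
}

recall_heavy = {"precision": 1.0, "recall": 3.0}      # FN errors matter more
precision_heavy = {"precision": 3.0, "recall": 1.0}   # FP errors matter more

best_for_fn = max(candidates,
                  key=lambda m: weighted_score(candidates[m], recall_heavy))
best_for_fp = max(candidates,
                  key=lambda m: weighted_score(candidates[m], precision_heavy))
```

Setting all weights equal reduces the function to the arithmetic mean mentioned in the text.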

In FIGS. 11A and 11B, the candidate abnormal transaction detection model built with the farthest-first clustering algorithm shows strong performance on each verification metric. According to these verification results, therefore, the farthest-first-based candidate model may be determined as the abnormal transaction detection model. Depending on the implementation, however, a plurality of abnormal transaction detection models may be selected.

For reference, as shown in FIGS. 11A and 11B, when the verification data is also sampled from the first payment data at different ratios and verification is performed for each sampled verification data set, the abnormal transaction detection model may be selected using the average of the performance figures across the verification data sets.

According to the method for determining an abnormal transaction detection model described so far with reference to FIGS. 9 to 11, different abnormal transaction detection models may be built for different types of data. That is, because the learning performance of each unsupervised learning algorithm may vary with the type of transaction data, dynamically determining the abnormal transaction detection model from among a plurality of candidate models through model verification allows the optimal model to be determined for each type of transaction data.

For example, as shown in FIG. 12, an abnormal transaction detection model (530) built using a first clustering algorithm may be selected for first transaction data (510), and an abnormal transaction detection model (510) built using a second clustering algorithm may be selected as the model actually used for second transaction data (530). Here, the first transaction data may be, for example, Internet banking data and the second transaction data may be payment data, and each of the first and second clustering algorithms may be any one of the algorithms listed in Table 1.

Depending on the embodiment, different abnormal transaction detection models may be determined even for the same payment data according to the payment method, such as card, mobile phone, or account transfer. In this case, the first transaction data (510) may be, for example, payment data for card payments, and the second transaction data (530) may be payment data for mobile phone payments. In addition, when transaction data is classified by various criteria such as payment region, payment time, or the age of the terminal owner, abnormal transaction detection models built using clustering algorithms that differ at least in part may be selected for the respective classes of transaction data.

So far, the step of building the abnormal transaction detection model included in the abnormal transaction detection method, and various related embodiments, have been described with reference to FIGS. 4 to 12. Next, the step of determining whether a transaction is abnormal using the built abnormal transaction detection model is described with reference to FIG. 13.

FIG. 13 is a detailed flowchart of the step of determining whether a transaction is abnormal according to an embodiment of the present invention.

Referring to FIG. 13, detection target payment data that does not include a classification result indicating a normal or abnormal transaction is obtained (S200). For example, the abnormal transaction detection device (100) may receive, in real time through the electronic payment server (200), data on electronic payments requested by a plurality of user terminals (300). Alternatively, the abnormal transaction detection device (100) may receive the payment data in real time by operating as a reverse proxy between the plurality of user terminals (300) and the electronic payment server (200).

Next, the previously selected key features are extracted from among the plurality of features constituting the detection target payment data (S220). The key features of the detection target payment data are then input to the previously built abnormal transaction detection model, which outputs a classification result, and whether the transaction is abnormal is determined according to that classification result (S240).

To aid understanding, the operation of the abnormal transaction detection model is explained further. The model may, for example, map the input key features into the data space using the clustering algorithm and, when the distance (Euclidean distance) between the mapped point and the center of a representative cluster is within the radius of the representative cluster, determine that the point belongs to the representative cluster. The model may then output a classification result of normal transaction when the representative cluster is a cluster indicating normal transactions, and a classification result of abnormal transaction when the representative cluster is a cluster indicating abnormal transactions.

Alternatively, depending on the embodiment, the point may be determined to belong to the representative cluster when the distance lies between the maximum distance and the minimum distance of the representative cluster. Here, the maximum distance is the distance between the center of the representative cluster and the point located farthest from it, and the minimum distance may be the distance between the center of the representative cluster and the point located closest to it.
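Both membership rules can be sketched as follows. This is an illustration under several assumptions that the text does not state: the cluster radius is taken to be the distance from the center to the farthest training member, the representative cluster is tried in order of nearest center, and `None` is returned when no cluster matches. The dictionary layout and the toy clusters are hypothetical.

```python
import math

def distance_range(cluster):
    """Smallest and largest distance from the center to any member point."""
    dists = [math.dist(cluster["center"], m) for m in cluster["members"]]
    return min(dists), max(dists)

def classify(point, clusters, rule="radius"):
    """rule="radius": distance must be within the cluster radius.
    rule="band":   distance must lie between the minimum and maximum distance."""
    # Assumption: candidate clusters are tried nearest-center first.
    ordered = sorted(clusters, key=lambda c: math.dist(point, c["center"]))
    for cluster in ordered:
        nearest, farthest = distance_range(cluster)
        lower = nearest if rule == "band" else 0.0
        if lower <= math.dist(point, cluster["center"]) <= farthest:
            return cluster["label"]
    return None  # behavior for unmatched points is not specified in the text

clusters = [
    {"center": (0.0, 0.0), "label": "normal",
     "members": [(0.0, 1.0), (1.0, 0.0), (0.0, -2.0)]},
    {"center": (10.0, 10.0), "label": "abnormal",
     "members": [(10.0, 11.0), (11.0, 10.0)]},
]

inside_normal = classify((0.5, 0.5), clusters)
inside_abnormal = classify((10.0, 10.5), clusters)
unmatched = classify((5.0, 5.0), clusters)
too_close_for_band = classify((0.5, 0.5), clusters, rule="band")
within_band = classify((0.0, 1.5), clusters, rule="band")
```

The band rule is stricter than the radius rule: a point closer to the center than every training member is rejected, as the `too_close_for_band` case shows.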

So far, the step of determining whether detection target payment data corresponds to an abnormal transaction based on the previously built abnormal transaction detection model has been described with reference to FIG. 13. In summary, whether the payment data corresponds to an abnormal transaction may be determined using an abnormal transaction detection model built with an unsupervised clustering algorithm.

The concepts of the present invention described so far with reference to FIGS. 1 to 13 may be implemented as computer-readable code on a computer-readable medium. The computer-readable recording medium may be, for example, a removable recording medium (CD, DVD, Blu-ray disc, USB storage device, removable hard disk) or a fixed recording medium (ROM, RAM, a hard disk built into a computer). The computer program recorded on the computer-readable recording medium may be transmitted to another computing device over a network such as the Internet and installed on that device, and may thereby be used on it.

Although operations are shown in a particular order in the drawings, this should not be understood to mean that the operations must be performed in the particular order shown or in sequential order, or that all of the illustrated operations must be performed, to obtain a desired result. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various components in the embodiments described above should not be understood as requiring such separation; the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, a person of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

Claims

An abnormal transaction detection method comprising:
obtaining, by an abnormal transaction detection device, first transaction data that includes a classification result indicating an abnormal transaction or a normal transaction and second transaction data that does not include the classification result, wherein each of the first transaction data and the second transaction data consists of a plurality of features;
determining, by the abnormal transaction detection device, a key feature among the plurality of features constituting the first transaction data using a filter-method-based feature selection algorithm; and
building, by the abnormal transaction detection device, an abnormal transaction detection model by performing unsupervised-learning-based clustering on data of the key feature of the second transaction data and using a result of the clustering,
wherein building the abnormal transaction detection model comprises:
obtaining a plurality of clusters including a first cluster and a second cluster as the result of the clustering;
comparing the number of transaction data items belonging to the first cluster with the number of transaction data items belonging to the second cluster; and
designating, based on a result of the comparison, the first cluster as a cluster indicating normal transactions and the second cluster as a cluster indicating abnormal transactions.

The method of claim 1,
wherein determining the key feature comprises:
calculating, for each of the plurality of features constituting the first transaction data, a per-feature score according to a preset evaluation metric; and
selecting the key feature from among the plurality of features based on the per-feature scores.

The method of claim 2,
further comprising, when the number of selected key features is k (where k is a natural number of 2 or more), performing principal component analysis based on the k key features to calculate m key features (where m is a natural number of 1 or more and less than k),
wherein building the abnormal transaction detection model comprises:
building the abnormal transaction detection model using a result of clustering performed on data of the m key features of the second transaction data.

The method of claim 2,
wherein the preset evaluation metric includes a first evaluation metric and a second evaluation metric,
wherein calculating the per-feature score comprises:
calculating, for each of the plurality of features constituting the first transaction data, a per-feature score according to the first evaluation metric; and
calculating, for each of the plurality of features constituting the first transaction data, a per-feature score according to the second evaluation metric, and
wherein selecting the preset number of key features comprises:
selecting the preset number of key features based on an average value of the per-feature score according to the first evaluation metric and the per-feature score according to the second evaluation metric.

The method of claim 4,
wherein the preset evaluation metric
includes chi-square, IG (Information Gain), One-R, and SU (Symmetrical Uncertainty).

The method of claim 1,
wherein determining the key feature comprises:
sampling the first transaction data so that the ratio of transaction data indicating normal transactions to transaction data indicating abnormal transactions is a first ratio;
sampling the first transaction data so that the ratio of the transaction data is a second ratio different from the first ratio;
calculating a per-feature score according to a preset evaluation metric for each of first sampling data sampled at the first ratio and second sampling data sampled at the second ratio; and
selecting the preset number of key features using an average value of the per-feature score for the first sampling data and the per-feature score for the first sampling data.

7. The method of claim 1,
wherein the unsupervised learning-based clustering is performed using a farthest-first algorithm.
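
As a rough illustration of the farthest-first algorithm named in claim 7, the sketch below uses the standard farthest-first traversal: each new center is the point farthest from all centers chosen so far, and every point is then assigned to its nearest center. The choice of the first center (index 0 here) and the Euclidean distance are assumptions; the claim does not specify either.

```python
import math

def farthest_first_centers(points, k, first=0):
    """Pick k centers; each new center is the point farthest from the current ones."""
    centers = [points[first]]
    # Distance from every point to its nearest chosen center.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx])) for p, d in zip(points, nearest)]
    return centers

def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for p in points
    ]

# Hypothetical key-feature vectors: four similar points and two outliers.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11)]
centers = farthest_first_centers(points, k=2)
assignments = assign(points, centers)
```

Because the traversal deliberately seeks out distant points, outlying transactions tend to become centers of their own clusters, which suits anomaly detection.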

8. The method of claim 1,
wherein constructing the abnormal transaction detection model comprises:
constructing a first candidate abnormal transaction detection model by performing a first unsupervised learning-based clustering algorithm;
constructing a second candidate abnormal transaction detection model by performing a second unsupervised learning-based clustering algorithm, the second clustering algorithm being different from the first clustering algorithm;
verifying the performance of the first candidate abnormal transaction detection model and the second candidate abnormal transaction detection model against a predetermined verification metric; and
determining the abnormal transaction detection model from among the first candidate abnormal transaction detection model and the second candidate abnormal transaction detection model, using performance values calculated through the verification.

9. The method of claim 8,
wherein the predetermined verification metric includes at least one of precision, recall, and F-measure.
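
The verification and selection of claims 8 and 9 can be sketched as follows. The sketch assumes that each candidate model has already produced predicted labels for a labeled validation set and that the candidate with the highest F-measure is chosen; the claims only require that the computed performance values be used, so the selection rule, the candidate names, and the data are hypothetical.

```python
def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall and F-measure for the positive (abnormal) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def pick_model(candidates, y_true):
    """Return the name of the candidate whose predictions give the best F-measure."""
    return max(candidates, key=lambda name: precision_recall_f(y_true, candidates[name])[2])

# Hypothetical validation labels (1 = abnormal) and candidate predictions.
y_true = [1, 1, 0, 0, 0, 1]
candidates = {
    "farthest_first": [1, 1, 0, 0, 0, 0],
    "k_means":        [1, 0, 1, 1, 0, 0],
}
best = pick_model(candidates, y_true)
```

F-measure is a common single criterion for fraud data because it balances missed abnormal transactions against false alarms.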

10. The method of claim 8,
wherein each of the first clustering algorithm and the second clustering algorithm is any one of an EM algorithm, a k-means clustering algorithm, a farthest-first algorithm, an X-means clustering algorithm, and a make density algorithm.

11. The method of claim 1, further comprising:
obtaining detection target transaction data that does not include the classification result;
extracting the key feature from among a plurality of features constituting the detection target transaction data; and
inputting data of the key feature for the detection target transaction data into the abnormal transaction detection model and outputting a classification result for the detection target transaction data.
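
The detection step of claim 11 might be sketched as below, assuming that the constructed model is represented by cluster centers over the key features plus a normal/abnormal label per cluster, and that a new transaction receives the label of its nearest center. That representation, the feature names, and the values are assumptions made for illustration.

```python
import math

def classify(transaction, key_features, centers, cluster_labels):
    """Project a transaction onto the key features and label it by nearest center."""
    vector = [transaction[name] for name in key_features]
    nearest = min(range(len(centers)), key=lambda c: math.dist(vector, centers[c]))
    return cluster_labels[nearest]

# Hypothetical model: two cluster centers over two key features.
key_features = ["amount_z", "new_device"]
centers = [(0.0, 0.0), (3.0, 1.0)]
cluster_labels = ["normal", "abnormal"]

# The unlabeled transaction carries extra features that are ignored.
target = {"amount_z": 2.8, "new_device": 1.0, "channel": "mobile"}
result = classify(target, key_features, centers, cluster_labels)
```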

12. An abnormal transaction detection apparatus comprising:
one or more processors;
a network interface;
a memory into which a computer program executed by the processor is loaded; and
a storage that stores the computer program,
wherein the computer program comprises:
an operation of obtaining first transaction data including a classification result indicating an abnormal transaction or a normal transaction and second transaction data not including the classification result, each of the first transaction data and the second transaction data being composed of a plurality of features;
an operation of determining a key feature among the plurality of features composing the first transaction data by using a filter method-based feature selection algorithm; and
an operation of performing unsupervised learning-based clustering on data of the key feature for the second transaction data and constructing an abnormal transaction detection model by using a result of the clustering,
wherein the operation of constructing the abnormal transaction detection model comprises:
an operation of obtaining, as the result of the clustering, a plurality of clusters including a first cluster and a second cluster;
an operation of comparing the number of transaction data items belonging to the first cluster with the number of transaction data items belonging to the second cluster; and
an operation of designating, based on a result of the comparison, the first cluster as a cluster indicating normal transactions and the second cluster as a cluster indicating abnormal transactions.
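
The cluster comparison recited in the model-construction operations can be illustrated with the sketch below. The claim states only that the designation is based on comparing cluster sizes; the sketch assumes that the largest cluster is treated as normal and every other cluster as abnormal, on the premise that abnormal transactions are rare. The function name is hypothetical.

```python
from collections import Counter

def label_clusters_by_size(assignments):
    """Label the most populated cluster 'normal' and all others 'abnormal'."""
    counts = Counter(assignments)
    normal_cluster = counts.most_common(1)[0][0]
    return {
        cluster: "normal" if cluster == normal_cluster else "abnormal"
        for cluster in counts
    }

# Hypothetical clustering result: cluster index assigned to each transaction.
assignments = [0, 0, 0, 0, 1, 1]
cluster_labels = label_clusters_by_size(assignments)
```

This step is what lets an unsupervised clustering result act as a classifier: no labels are needed for the second transaction data, only the relative sizes of the clusters.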

13. A computer program stored on a computer-readable recording medium which, in combination with a computing device, causes execution of:
obtaining first transaction data including a classification result indicating an abnormal transaction or a normal transaction and second transaction data not including the classification result, each of the first transaction data and the second transaction data being composed of a plurality of features;
determining a key feature among the plurality of features composing the first transaction data by using a filter method-based feature selection algorithm; and
performing unsupervised learning-based clustering on data of the key feature for the second transaction data and constructing an abnormal transaction detection model by using a result of the clustering,
wherein constructing the abnormal transaction detection model comprises:
obtaining, as the result of the clustering, a plurality of clusters including a first cluster and a second cluster;
comparing the number of transaction data items belonging to the first cluster with the number of transaction data items belonging to the second cluster; and
designating, based on a result of the comparison, the first cluster as a cluster indicating normal transactions and the second cluster as a cluster indicating abnormal transactions.