KR20190070702A

KR20190070702A - System and method for automatically verifying security events based on text mining

Info

Publication number: KR20190070702A
Application number: KR1020170171485A
Authority: KR
Inventors: 박동훈; 김종연; 조승연; 김병훈
Original assignee: 주식회사 한류에이아이센터
Priority date: 2017-12-13
Filing date: 2017-12-13
Publication date: 2019-06-21
Also published as: KR102221492B1

Abstract

The present invention relates to a verification system and method that preprocess collected security events serving as training data using text mining, train on the preprocessed data to generate verification models, evaluate the validity of the generated models and select the models to apply, and verify newly generated security events by feeding them into the selected verification models.

Description

SYSTEM AND METHOD FOR AUTOMATICALLY VERIFYING SECURITY EVENTS BASED ON TEXT MINING

The embodiments below relate to an automatic verification system and method, and more particularly to techniques for analyzing and verifying text-based security events.

Much research has been conducted on improving the efficiency of detecting and analyzing large volumes of security events. Most existing work, however, relies on basic information such as IP address, port, protocol, and event name, and aims to identify cyber-threat trends and reduce the number of security events indirectly through statistical analysis and visualization. Determining whether a security event reflects an actual attack therefore still requires additional analysis during security monitoring work.

In existing security monitoring systems, text-based security events are analyzed and handled by security monitoring personnel. Their analysis workload therefore grows in proportion to the rapidly increasing number of detected events, and much of their time is spent on frequently occurring events. As a result, not every security event can be verified as a true positive or a false positive, and the events that are verified are skewed toward particular types. In addition, the quality of the results can vary with the analysis skill of the individual analyst.

The present invention was devised to solve the above problems of the prior art, and its object is to provide a system and method for automatically verifying security events based on text mining.

Specifically, the present invention relates to a system that analyzes and verifies text-based security events using text mining within a security monitoring system. Events similar to previously analyzed events are automatically classified as true positives or false positives, while events of a newly detected type are routed immediately to security monitoring personnel for detailed analysis. The aim is to broaden the range of security events that are analyzed and to improve the efficiency of security monitoring work.

To achieve the above object, a verification system that verifies security events based on text mining according to an embodiment of the present invention includes: an event type classification unit that classifies the type of a security event and encodes the security event; a necessary data extraction unit that extracts data from the encoded security event using a regular expression corresponding to the classified type; a text data preprocessing unit that tokenizes the extracted data; a token weight processing unit that computes and maps token weights for the tokenized data; an algorithm model learning unit that generates verification models by applying a plurality of learning algorithms and selects an odd number, at least three, of automatic verification models; a validity verification unit that checks whether the verification models satisfy the validity criterion; and an automatic verification unit that verifies newly generated security events by applying the validated automatic verification models.

A method of verifying security events based on text mining according to an embodiment of the present invention includes: preprocessing collected security events that serve as training data, based on text mining; training on the preprocessed data to generate verification models; evaluating the validity of the generated verification models and selecting an odd number, at least three, of verification models to apply; and verifying a newly generated security event by feeding it into the selected verification models.

Here, preprocessing the collected security events includes: extracting data using a regular expression that corresponds to the type of the semi-structured, text-format security event; tokenizing the extracted data, then computing and mapping token weights per type; and generating preprocessed data in the form of a matrix of token weights.

Here, the extracted data may be tokenized by applying a security event English dictionary, a security event Korean dictionary, and a security event data dictionary to it, in that order.

Here, the token weight may be the number of occurrences of the token in a given type divided by the number of occurrences of the token across all events.

Here, the preprocessed data of the collected security events may include a result value indicating whether each event is a true positive or a false positive.

Here, extracting data using the regular expression may include, if no regular expression exists for the type of the semi-structured, text-format security event, creating one through a user interface.

Here, training on the preprocessed data to generate the verification models may proceed with at least three classification algorithms chosen from among decision tree, support vector machine (SVM), neural network analysis, logistic regression, and Bayesian methods.

Here, evaluating the validity of the generated verification models and selecting an odd number, at least three, of verification models to apply may include: performing cross validation on each generated verification model to check whether it falls within a preset tolerance; if the generated verification model is below the preset tolerance, comparing its accuracy with that of the currently applied verification model and selecting the more accurate models, an odd number of at least three, as the automatic verification models; and if the generated verification model is at or above the preset tolerance, either changing the settings of each algorithm and retraining, or keeping the currently applied verification model as the automatic verification model.

Here, verifying the newly generated security event by feeding it into the selected verification models may include: tokenizing the newly generated security event in the same way as in preprocessing; mapping weights onto the tokenized event using a weight database; and feeding the newly generated security event, now in matrix form, into each of the selected verification models to determine, per event, whether it is a true positive or a false positive.

Here, feeding the matrix-form security event into each of the selected verification models may consist of feeding it into each of the odd number, at least three, of selected verification models, checking whether each model reports a true positive or a false positive, and taking the more frequent result as the verdict.

Here, verifying the newly generated security event by feeding it into the selected verification models may further include: providing the verification result to the user; having security monitoring personnel confirm the verification result produced by the selected verification models; and adding the security event confirmed by the security monitoring personnel to the training data and retraining.

The present invention relates to a verification system and method that preprocess collected security events serving as training data based on text mining, train on the preprocessed data to generate verification models, evaluate the validity of the generated models and select the models to apply, and verify newly generated security events by feeding them into the selected models. Events similar to previously analyzed events can thereby be classified automatically as true positives or false positives.

FIG. 1 is a diagram showing the configuration of a verification system that verifies security events based on text mining according to an embodiment of the present invention.
FIG. 2 is a flowchart outlining the process of verifying security events based on text mining according to an embodiment of the present invention.
FIG. 3 is a flowchart showing the process of preprocessing training data according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the process of selecting verification models according to an embodiment of the present invention.
FIG. 5 is a flowchart showing the process of verifying a newly generated security event by feeding it into the selected verification models according to an embodiment of the present invention.

The specific structural and functional descriptions of the embodiments disclosed in this specification are given only to illustrate embodiments according to the concept of the present invention. Such embodiments may be implemented in various forms and are not limited to the embodiments described here.

Because embodiments according to the concept of the present invention can be modified in various ways and can take various forms, embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments to the specific forms disclosed; they include all changes, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms serve only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and likewise a second component may be called a first component.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that component, but intervening components may also be present. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, no intervening components are present. Other expressions describing relationships between components, such as "between" versus "directly between" or "adjacent to" versus "directly adjacent to", are to be interpreted in the same way.

The terminology used in this specification serves only to describe particular embodiments and is not intended to limit the present invention. Singular expressions include the plural unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used here, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the relevant art and, unless expressly so defined in this specification, are not to be interpreted in an idealized or overly formal sense.

Embodiments are described in detail below with reference to the accompanying drawings. The scope of the patent application is not, however, restricted or limited by these embodiments. Like reference numerals in the drawings denote like members.

A system and method for automatically verifying security events based on text mining according to an embodiment of the present invention are described in detail below with reference to the accompanying FIGS. 1 to 5.

FIG. 1 is a diagram showing the configuration of a verification system that verifies security events based on text mining according to an embodiment of the present invention.

Referring to FIG. 1, the verification system of the present invention includes a preprocessing device (120) and a learning and classification device (130), with which it verifies security events.

The preprocessing device (120) performs preprocessing using an event type classification unit (121), a necessary data extraction unit (122), a user interface (123), a text data preprocessing unit (124), a token weight processing unit (125), and a weight database (126).

Because an organization's internal security system environment differs from one deployment to another, a verification system that automatically verifies security events must be trained on the analysis results of the existing monitoring personnel. The previously verified security events (110), which are the data for initial training, contain security events analyzed by the existing security monitoring personnel together with true-positive and false-positive labels for training.

On receiving previously verified security events (110) or newly generated security events (140), the event type classification unit (121) classifies the type of each security event and encodes the event.

The event type classification unit (121) may use information such as critical IP addresses and ports to check whether they appear in a given security event. That is, it converts the payload data of network traffic into a human-readable form so that the data can be analyzed as text. The previously verified security events (110), which are the training data for building the initial automatic verification model, must be classified by type using basic information such as the security system they were collected from, blacklisted IP addresses, ports, and event IDs.
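The text does not specify how payloads are encoded or decoded. As a minimal sketch only, assuming the payload arrives as a hex string and may contain URL-encoded characters (both assumptions, as is the function name `decode_payload`), the conversion to readable text could look like this:

```python
from urllib.parse import unquote


def decode_payload(hex_payload):
    """Turn a hex-encoded payload into readable text (assumed input format)."""
    raw = bytes.fromhex(hex_payload)
    # Undecodable bytes are replaced rather than raising, so every event yields text.
    text = raw.decode("utf-8", errors="replace")
    # URL-encoded sequences such as %27 are expanded so later tokenization sees the real characters.
    return unquote(text)
```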

The necessary data extraction unit (122) extracts data from the encoded security event using the regular expression that corresponds to the classified type. If no corresponding regular expression exists, one can be created through the user interface (123).
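The type-specific extraction can be sketched as a lookup from event type to a compiled pattern. The type name `web_attack`, the pattern itself, and the function name `extract_fields` are hypothetical and chosen only for illustration; the source does not list its actual expressions.

```python
import re

# Hypothetical registry: one regular expression per event type.
PATTERNS = {
    "web_attack": re.compile(r"(?P<method>GET|POST)\s+(?P<uri>\S+)"),
}


def extract_fields(event_type, decoded_text, patterns=PATTERNS):
    """Return the named groups matched by the pattern registered for event_type."""
    pattern = patterns.get(event_type)
    if pattern is None:
        # In the described system this is where an operator would be asked
        # to create a pattern through the user interface.
        raise LookupError(f"no regular expression registered for type {event_type!r}")
    match = pattern.search(decoded_text)
    return match.groupdict() if match else {}
```

Raising on an unregistered type keeps the "no pattern yet" case distinct from "pattern exists but did not match", which returns an empty dictionary.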

The text data preprocessing unit (124) converts the text data extracted from each security event into a form that can be analyzed. More specifically, it uses stop words, such as particular words or delimiters, to split the data extracted by the necessary data extraction unit (122) and returns it as tokens.

The text data preprocessing unit (124) can tokenize the extracted data using three security event dictionaries. The three dictionaries may be built in advance by a data analyst through analysis of the security events used for training. They are the security event English dictionary, the security event Korean dictionary, and the security event data dictionary.

The security event English dictionary is compiled from data in security events classified as English, and the security event Korean dictionary from data classified as Korean. The security event data dictionary is compiled by analyzing raw data in security events that is classified as neither Korean nor English.

The text data preprocessing unit (124) tokenizes a security event by applying the security event English dictionary, the security event Korean dictionary, and the security event data dictionary in that order. Any residual items not covered by this tokenization procedure can be analyzed and added to the appropriate dictionary.
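A minimal sketch of this ordered dictionary lookup follows. The dictionary contents, the delimiter set, and the function name `tokenize` are illustrative assumptions; the source does not publish its dictionaries or delimiters.

```python
import re

# Illustrative dictionary contents only.
DICTIONARIES = [
    ("english", {"select", "union", "script"}),
    ("korean", {"관리자"}),
    ("data", {"%27", "0x90"}),
]

# Assumed delimiter set used to split the extracted text into candidate pieces.
DELIMITERS = re.compile(r"[\s=&?/<>();,]+")


def tokenize(text):
    """Return (tokens, residual): matched (piece, dictionary) pairs and unmatched pieces."""
    tokens, residual = [], []
    for piece in DELIMITERS.split(text.lower()):
        if not piece:
            continue
        # Dictionaries are consulted in the stated order: English, Korean, data.
        for name, words in DICTIONARIES:
            if piece in words:
                tokens.append((piece, name))
                break
        else:
            # Nothing matched: keep the piece for later analysis and dictionary updates.
            residual.append(piece)
    return tokens, residual
```

The residual list corresponds to the leftover items that an analyst would review before extending a dictionary.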

The token weight processing unit (125) computes token weights for the tokenized data, stores the weight for each token in the weight database (126), and maps weights onto the tokens of new security events to generate preprocessed data in the form of a matrix of token weights. The preprocessed data of the collected security events that serve as training data includes a result value indicating whether each event is a true positive or a false positive.

The token weight represents the importance of a token within the group of security events of a particular type. It can be computed as the number of occurrences of the token in the given type divided by the number of occurrences of the token across all events.

That is, the token weight is computed separately for each type and takes a value in the range 0 ≤ Weight ≤ 1.
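The ratio described above can be sketched directly with counters. The input shape, a list of (event type, token list) pairs, and the function name `token_weights` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def token_weights(events):
    """Compute per-type token weights.

    events: iterable of (event_type, tokens) pairs.
    Returns {event_type: {token: count_in_type / count_in_all_events}}.
    """
    total = Counter()
    per_type = defaultdict(Counter)
    for event_type, tokens in events:
        total.update(tokens)
        per_type[event_type].update(tokens)
    return {
        event_type: {tok: n / total[tok] for tok, n in counts.items()}
        for event_type, counts in per_type.items()
    }
```

Because a token's count within one type can never exceed its count over all events, every weight falls between 0 and 1, and a weight of 1 means the token occurs only in that type.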

A security event is converted into preprocessed data in the form of a matrix of token weights, as shown in Table 1 below.

[Table 1]

Table 1 shows the input data for the i-th security event type. As the table shows, a weight is assigned to a cell only when Token_k appears in the given event.
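Since the table itself is not reproduced here, the following sketch shows one way such rows could be built: each event becomes a row, each column is one token, and a cell holds the token's weight only if the token occurs in that event. The function name `to_matrix`, the fixed `vocabulary` column order, and the use of 0.0 for absent tokens are assumptions.

```python
def to_matrix(token_lists, type_weights, vocabulary):
    """Build one row of weights per event for a single event type.

    token_lists: list of token lists, one per event.
    type_weights: {token: weight} for this event type.
    vocabulary: ordered list of tokens defining the columns.
    """
    rows = []
    for tokens in token_lists:
        present = set(tokens)
        # A cell is non-zero only when the column's token occurs in the event.
        rows.append([type_weights.get(tok, 0.0) if tok in present else 0.0
                     for tok in vocabulary])
    return rows
```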

The learning and classification device (130) learns from and classifies security events using an algorithm model learning unit (131), a validity verification unit (132), and an automatic verification unit (133).

The algorithm model learning unit (131) uses the token-weight data of each security event together with the monitoring personnel's verification result (true positive or false positive), and trains a plurality of classifier algorithms following the text mining methodology to generate verification models. At least three classification algorithms, chosen from among decision tree, support vector machine (SVM), neural network analysis, logistic regression, Bayesian methods, and the like, are used to generate at least three verification models.

The algorithm model learning unit (131) checks validity through the validity verification unit (132) and selects an odd number, at least three, of automatic verification models. The validity verification unit (132) can check validity by performing cross validation on each generated verification model and checking whether the result falls within a preset tolerance.

If the validity check shows that a generated verification model is below the preset tolerance, the algorithm model learning unit (131) compares its accuracy with that of the currently applied verification model and selects the more accurate model as an automatic verification model. If the generated verification model is at or above the preset tolerance, the algorithm model learning unit (131) can either change the settings of the algorithm and retrain, or keep the currently applied verification model as the automatic verification model.
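The validation and selection rule can be sketched as follows, reading the tolerance as an upper bound on the cross-validated error rate (an interpretation, since the source does not define the measured quantity). The interleaved fold split, the `fit` callback contract, and both function names are likewise assumptions.

```python
def k_fold_error(fit, rows, labels, k=5):
    """Mean misclassification rate over k folds.

    fit(train_rows, train_labels) must return a predict(row) callable.
    Requires len(rows) >= k so that no fold is empty.
    """
    n = len(rows)
    fold_errors = []
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th sample forms one fold
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predict = fit(train_rows, train_labels)
        wrong = sum(predict(rows[i]) != labels[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k


def choose_model(candidate_error, tolerance, candidate_accuracy, applied_accuracy):
    """Apply the selection rule to one newly trained candidate model."""
    if candidate_error >= tolerance:
        # Validation failed: change settings and retrain, or keep the applied model.
        return "retrain_or_keep_applied"
    # Validation passed: keep whichever model is more accurate.
    return "candidate" if candidate_accuracy > applied_accuracy else "applied"
```

In a full system this check would run once per algorithm, and the surviving models would then be trimmed to an odd count of at least three.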

The automatic verification unit (133) verifies newly generated security events (140) by applying the odd number, at least three, of validated automatic verification models. More specifically, it uses the preprocessing device (120) to tokenize the newly generated security event in the same way as in preprocessing, maps weights onto the tokenized event using the weight database (126), and feeds the event, now in matrix form, into the selected verification models to determine, per event, whether it is a true positive or a false positive.

For each automatically verified security event, the automatic verification unit (133) returns the result on which the majority of models agree. For example, if three verification models (learning algorithms) are applied and two of them classify an event as a false positive while one classifies it as a true positive, the event is finally classified as a false positive by majority rule. As another example, if five verification models are applied and three classify an event as a true positive while two classify it as a false positive, the event is finally classified as a true positive.
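The majority rule in these two examples can be sketched in a few lines. The label strings "TP" and "FP" and the function name `majority_vote` are illustrative choices, not taken from the source.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label reported by most models.

    predictions: one label per model, e.g. "TP" or "FP".
    An odd count of at least three guarantees that two labels cannot tie.
    """
    if len(predictions) < 3 or len(predictions) % 2 == 0:
        raise ValueError("expected an odd number of at least three predictions")
    return Counter(predictions).most_common(1)[0][0]
```

Rejecting even or too-small inputs mirrors the requirement that the number of selected models be odd and at least three.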

The automatic verification unit (133) can provide the verification result to the user through the user UI (150). It can also have security monitoring personnel confirm the verification results produced by the selected verification models, add the security events they confirm to the training data, and trigger retraining.

The verification system of the present invention must meet the following performance requirements.

The original data (security events) must not be altered during preprocessing steps such as data extraction and tokenization.

The applied verification models and the token-weight dictionary must not change except during model training that uses verified security events.

Newly generated security events must be classified as true positives or false positives based on the applied verification models.

During type classification, events of a new type to which no existing regular expression applies must be grouped by similarity and presented in the user interface so that the user can review them and create and register a regular expression.

Data extracted by regular expression must be used to build, or be mapped against, the token-weight dictionary following the text mining methodology, and must be returned in matrix form.

For a model that fails validation, the user must be able to change the settings of each algorithm, and retraining triggered by newly added verified data must also reflect those settings.

If a model retrained on newly added verified data is less accurate than the previous model, either the settings must be changed or the previous model must be kept.

Hereinafter, a method for verifying security events based on text mining according to the present invention, configured as described above, is described with reference to the drawings.

FIG. 2 is a flowchart outlining a process of verifying security events based on text mining according to an embodiment of the present invention.

Referring to FIG. 2, the verification system of the present invention preprocesses collected security events, which serve as learning data, based on text mining (210). The preprocessing of the collected security events is described in more detail below with reference to FIG. 3.

The verification system then trains on the preprocessed data to generate verification models (220). Here, the verification system may generate the verification models by training with at least three classification algorithms chosen from among decision tree, support vector machine (SVM), neural network analysis, logistic regression, Bayesian methods, and the like.

The verification system then evaluates the validity of the generated verification models and selects the verification models to apply (230). This evaluation and selection is described in more detail below with reference to FIG. 4.

The verification system then verifies a newly generated security event by feeding it to the selected verification models (240). The automatic verification of newly generated security events is described in more detail below with reference to FIG. 5.

FIG. 3 is a flowchart illustrating a process of preprocessing learning data according to an embodiment of the present invention.

Referring to FIG. 3, to preprocess the learning data, the verification system of the present invention extracts data from semi-structured, text-format security events using a regular expression corresponding to the event type (310). If no regular expression corresponding to the type of a semi-structured, text-format security event exists in step 310, a regular expression may be created through the user interface.
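As a non-limiting illustration, step 310 might be sketched as follows. The event type name, the pattern, and the functions `extract` and `register_pattern` are hypothetical and do not appear in this specification; returning `None` stands in for routing the event to the user interface.

```python
import re

# Hypothetical registry: event type -> compiled regular expression.
PATTERNS = {
    "sql_injection": re.compile(r"GET\s+(?P<payload>\S+)\s+HTTP"),
}

def extract(event_type, raw_event):
    """Return the extracted payload, or None if no regex is registered
    for the type or the regex does not match the event text."""
    pattern = PATTERNS.get(event_type)
    if pattern is None:
        return None  # no regex yet: the user would create one via the UI
    match = pattern.search(raw_event)
    return match.group("payload") if match else None

def register_pattern(event_type, regex):
    """Stand-in for the user creating and registering a new regex."""
    PATTERNS[event_type] = re.compile(regex)
```

The raw event string is only read, never modified, which is consistent with the requirement that the original data remain intact during preprocessing.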

The verification system then tokenizes the extracted data and computes and maps a token weight for each type (320). The token weight is the number of occurrences of the token in a preset type divided by the number of occurrences of the token across all events.
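The token weight defined above can be sketched as follows. This is a minimal illustration that assumes events have already been tokenized and that "type" means the event type; the function name `token_weights` and the sample data are hypothetical.

```python
from collections import Counter

def token_weights(events, event_type):
    """events: list of (type, tokens) pairs.
    Returns {token: weight} where weight is the count of the token in
    events of event_type divided by its count across all events."""
    in_type = Counter()
    overall = Counter()
    for etype, tokens in events:
        overall.update(tokens)
        if etype == event_type:
            in_type.update(tokens)
    return {tok: in_type[tok] / overall[tok] for tok in in_type}

events = [
    ("sqli", ["select", "union", "id"]),
    ("sqli", ["select", "id"]),
    ("xss", ["script", "id"]),
]
weights = token_weights(events, "sqli")
```

In this sample, `select` occurs only in `sqli` events and so receives a weight of 1.0, while `id` occurs in both types and receives 2/3.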

The verification system then generates preprocessed data in the form of a matrix of token weights (330). The preprocessed data of the collected security events, which are the learning data, include a result value indicating whether each event is a true positive or a false positive.
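One possible layout of the matrix in step 330 is sketched below. It assumes one row per event, one column per vocabulary token, and a weight of 0.0 for tokens that are absent from the event or missing from the weight mapping; none of these layout choices is specified here, and `to_matrix` is a hypothetical name.

```python
def to_matrix(events, weights, vocabulary):
    """events: list of (tokens, label) pairs, label 1 = true positive,
    0 = false positive. Returns (rows, labels) with one row per event."""
    rows, labels = [], []
    for tokens, label in events:
        present = set(tokens)
        rows.append([weights.get(tok, 0.0) if tok in present else 0.0
                     for tok in vocabulary])
        labels.append(label)
    return rows, labels

vocabulary = ["select", "union", "id"]
weights = {"select": 1.0, "union": 1.0, "id": 0.5}
rows, labels = to_matrix([(["select", "id"], 1), (["id"], 0)],
                         weights, vocabulary)
```

The same function could be applied to a newly generated event by passing a placeholder label, since only the rows are needed at verification time.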

FIG. 4 is a flowchart illustrating a process of selecting verification models according to an embodiment of the present invention.

Referring to FIG. 4, the verification system of the present invention performs cross-validation on the generated verification model (410).
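The cross-validation in step 410 could take many forms. The sketch below assumes k-fold cross-validation with the misclassification rate as the error measure; the fold assignment, the function names, and the toy threshold classifier are illustrative assumptions rather than details given here.

```python
def k_fold_error(fit, rows, labels, k=5):
    """Mean misclassification rate over k folds.
    fit(rows, labels) must return a function predict(row) -> label."""
    n = len(rows)
    if n < k:
        raise ValueError("need at least k samples")
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample, offset by fold
        train_x = [rows[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        wrong = sum(predict(rows[i]) != labels[i] for i in test_idx)
        errors.append(wrong / len(test_idx))
    return sum(errors) / k

def fit_threshold(rows, labels):
    """Toy classifier for demonstration: ignores the training data."""
    return lambda row: 1 if row[0] > 0.5 else 0

rows = [[0.9], [0.1], [0.8], [0.2], [0.7], [0.3]]
labels = [1, 0, 1, 0, 1, 0]
error = k_fold_error(fit_threshold, rows, labels, k=3)
```

The resulting error rate is the quantity that would then be compared against the preset tolerance.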

The verification system then checks whether the generated verification model falls within a preset tolerance range (420).

If the check in step 420 shows that the generated verification model is below the preset tolerance range, the verification system compares its accuracy with that of the currently applied verification model and selects the models with better accuracy as automatic verification models, choosing an odd number of at least three (430).

If the check in step 420 shows that the generated verification model is at or above the preset tolerance range, the verification system either changes the settings of each algorithm and retrains, or keeps the currently applied verification model as the automatic verification model (440).
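One possible reading of steps 420 to 440 is sketched below. It assumes each algorithm has a cross-validation error rate for its newly trained model and, where one exists, for its currently applied model; the per-algorithm comparison, the ranking by error, and the function name `select_models` are interpretations, not details stated here.

```python
def select_models(new_errors, applied_errors, tolerance):
    """Both arguments map algorithm name -> cross-validation error rate.
    Returns {algorithm: "new" or "applied"} for an odd number (>= 3) of
    models, or None when fewer than three usable models exist."""
    best = {}
    for name in set(new_errors) | set(applied_errors):
        new = new_errors.get(name)
        old = applied_errors.get(name)
        new_ok = new is not None and new < tolerance  # step 420
        if new_ok and (old is None or new < old):
            best[name] = ("new", new)        # step 430: new model is better
        elif old is not None:
            best[name] = ("applied", old)    # step 440: keep applied model
    ranked = sorted(best, key=lambda n: (best[n][1], n))
    count = len(ranked) if len(ranked) % 2 else len(ranked) - 1
    if count < 3:
        return None  # settings would be changed and the models retrained
    return {name: best[name][0] for name in ranked[:count]}

chosen = select_models(
    {"tree": 0.04, "svm": 0.12, "nn": 0.03, "logit": 0.06},
    {"tree": 0.05, "svm": 0.07, "nn": 0.02},
    tolerance=0.10,
)
```

Dropping the weakest model when the pool has an even size keeps the count odd, so that a later majority vote cannot tie.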

FIG. 5 is a flowchart illustrating a process of verifying a newly generated security event by feeding it to the selected verification models according to an embodiment of the present invention.

Referring to FIG. 5, the verification system of the present invention tokenizes a newly generated security event in the same manner as in the preprocessing process (510).

The verification system then maps weights onto the tokenized newly generated security event using the weight database (520).

The verification system then feeds the newly generated security event, in matrix form, to each of the selected verification models and determines for each event whether it is a true positive or a false positive (530).

More specifically, in step 530 the verification system may feed the newly generated security event, in matrix form, to each of the selected verification models, of which there are an odd number of at least three, check whether each model classifies the event as a true positive or a false positive, and adopt the more frequent result as the verdict.
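The majority decision described above can be sketched as follows. The labels `"TP"` and `"FP"` and the function name `majority_vote` are illustrative; with two possible labels and an odd number of models, a tie cannot occur.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one label per selected verification model.
    Returns the label predicted by the most models."""
    if len(predictions) < 3 or len(predictions) % 2 == 0:
        raise ValueError("expected an odd number (>= 3) of model outputs")
    return Counter(predictions).most_common(1)[0][0]

verdict = majority_vote(["TP", "FP", "TP"])
```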

The verification system then provides the verification result to the user (540).

The verification system then has the verification results of the selected verification models confirmed by a security control agent (550).

The verification system may then add the security events confirmed by the security control agent to the learning data and retrain (560). Steps 540 to 560 above are optional.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

110: Previously verified security events
120: Preprocessing device
121: Event type classification unit
122: Necessary data extraction unit
123: User interface
124: Text data preprocessing unit
125: Token weight processing unit
126: Weight database
130: Learning and classification device
131: Algorithm model learning unit
132: Validity verification unit
133: Automatic verification unit
140: Newly generated security event
150: User UI

Claims

A verification system for verifying security events based on text mining, comprising:
an event type classification unit that classifies the type of a security event and encodes the security event;
a necessary data extraction unit that extracts data from the encoded security event based on a regular expression corresponding to the classified type;
a text data preprocessing unit that tokenizes the extracted data;
a token weight processing unit that computes and maps token weights of the tokenized data;
an algorithm model learning unit that generates verification models by applying a plurality of learning algorithms and selects an odd number, at least three, of automatic verification models;
a validity verification unit that checks whether the validity of the verification models is satisfied; and
an automatic verification unit that verifies a newly generated security event by applying the automatic verification models whose validity has been confirmed.

A method for verifying security events based on text mining, comprising:
preprocessing collected security events corresponding to learning data based on text mining;
generating verification models by training on the preprocessed data;
evaluating the validity of the generated verification models and selecting an odd number, at least three, of verification models to apply; and
verifying a newly generated security event by feeding it to the selected verification models.

The method of claim 2,
wherein preprocessing the collected security events comprises:
extracting data based on a regular expression corresponding to the type of a semi-structured, text-format security event;
tokenizing the extracted data, and computing and mapping a token weight for each type; and
generating preprocessed data in the form of a matrix of token weights.

The method of claim 3,
wherein tokenizing the extracted data comprises
performing tokenization by applying a security event English dictionary, a security event Korean dictionary, and a security event data dictionary, in that order, to the extracted data.

The method of claim 3,
wherein the token weight is
the number of occurrences of the token in a preset type divided by the number of occurrences of the token across all events.

The method of claim 3,
wherein the preprocessed data of the collected security events
include a result value indicating whether each event is a true positive or a false positive.

The method of claim 3,
wherein extracting data based on the regular expression comprises
creating a regular expression through a user interface if no regular expression corresponding to the type of the semi-structured, text-format security event exists.

The method of claim 2,
wherein generating the verification models by training on the preprocessed data comprises
training with at least three classification algorithms chosen from among decision tree, support vector machine (SVM), neural network analysis, logistic regression, and Bayesian methods.

The method of claim 2,
wherein evaluating the validity of the generated verification models and selecting an odd number, at least three, of verification models to apply comprises:
performing cross-validation on the generated verification model and checking whether it falls within a preset tolerance range;
if the check shows that the generated verification model is below the preset tolerance range, comparing its accuracy with that of the currently applied verification model and selecting the models with better accuracy as automatic verification models, choosing an odd number of at least three; and
if the check shows that the generated verification model is at or above the preset tolerance range, changing the settings of each algorithm and retraining, or keeping the currently applied verification model as the automatic verification model.

The method of claim 2,
wherein verifying the newly generated security event by feeding it to the selected verification models comprises:
tokenizing the newly generated security event in the same manner as in the preprocessing;
mapping weights onto the tokenized newly generated security event using a weight database; and
feeding the newly generated security event, in matrix form, to each of the selected verification models and determining for each event whether it is a true positive or a false positive.

The method of claim 10,
wherein feeding the newly generated security event, in matrix form, to each of the selected verification models and determining for each event whether it is a true positive or a false positive comprises
feeding the newly generated security event, in matrix form, to each of the odd number, at least three, of selected verification models, checking whether each selected verification model classifies the event as a true positive or a false positive, and adopting the more frequent result as the verdict.

The method of claim 10,
wherein verifying the newly generated security event by feeding it to the selected verification models further comprises:
providing the verification result to a user;
having the verification results of the selected verification models confirmed by a security control agent; and
adding the security events confirmed by the security control agent to the learning data and retraining.

A computer-readable recording medium on which a program for executing the method of any one of claims 2 to 11 is recorded.