KR20210083936A

KR20210083936A - System for collecting cyber threat information

Info

Publication number: KR20210083936A
Application number: KR1020190176736A
Authority: KR
Inventors: 강지훈; 김민석; 윤건수
Original assignee: 주식회사 디플랫폼
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2021-07-07

Abstract

Provided is a cyber threat information collection system including: an information collection unit that monitors, in real time, a plurality of collection channels providing cyber threat information, and collects situation propagation notices, software vulnerability information, intrusion incident information, malicious code information, hacking domain information, information on emails sent from zombie IPs, vulnerability analysis reports, malicious code analysis reports, and intrusion incident analysis reports; and an integrated management unit that applies natural language processing to the collected information to extract indicators of compromise including IPs, domains, hashes, emails, and security vulnerability codes, and classifies the types of the collected information based on the indicators of compromise. The system can thereby respond flexibly to newly added or deleted collection channels.

Description

SYSTEM FOR COLLECTING CYBER THREAT INFORMATION

The present invention relates to a cyber threat information collection system and, more particularly, to a cyber threat information collection system that uses crawling and APIs.

Cyber threat information (CTI) is data used as an indicator for identifying potentially malicious activity amid intrusion incidents that are steadily increasing in number and sophistication.

Cyber threat intelligence refers to collecting and analyzing threats that arise in external as well as internal organizations, and integrating and applying the results to internal security systems. Domestic and international cyber threats are steadily increasing and growing more sophisticated, and are evolving into systematic, organized threats.

Accordingly, responding to such threats calls for a technology that can reliably collect data from a variety of collection channels and respond flexibly when collection channels are added or deleted.

The technical problem addressed by the present invention is to provide a cyber threat information collection system that can reliably collect data from a variety of collection channels for cyber threat intelligence and respond flexibly when collection channels are added or deleted.

According to an embodiment, a system for collecting cyber threat information is provided. The cyber threat information collection system includes: an information collection unit that monitors, in real time, a plurality of collection channels providing the cyber threat information, and collects from the plurality of collection channels situation propagation notices, software vulnerability information, intrusion incident information, malicious code information, hacking domain information, malicious domain information, information on emails sent from zombie IPs, vulnerability analysis reports, malicious code analysis reports, and intrusion incident analysis reports; and an integrated management unit that applies natural language processing to the information collected through the information collection unit to extract indicators of compromise including IPs, domains, hashes, emails, and security vulnerability codes, and classifies the types of the collected information based on the indicators of compromise.

The information collection unit may include a general collection module that collects data from collection channels that provide cyber threat information without requiring input data, and a recursive query collection module that collects data from collection channels that, upon receiving input data, provide cyber threat information related to that input data.

The integrated management unit may include: a preprocessing module that extracts text from the information collected through the information collection unit, splits the text into sentences, extracts tokens from the sentences, performs text normalization using the tokens, and assigns part-of-speech information to the tokens; a named entity recognition module that classifies proper nouns based on the tokens; and a coreference analysis module that performs coreference analysis, which compares preceding words and phrases with the current word or phrase to determine whether they refer to the same entity, and performs dependency analysis between words.

The integrated management unit may further include a semantic analysis module that extracts, through the coreference analysis and the dependency analysis, the indicators of compromise and the attack tactics and tools, attack techniques, and attack procedures (TTPs), and that extracts metadata describing the indicators of compromise and the TTPs.

The integrated management unit may include an interface management module that manages network communication and internal and external interfaces, and a database model management module that manages the plurality of collection channels based on a pre-stored management table model and assigns preset configuration settings to the plurality of collection channels.

The integrated management unit may further include a collection channel control module that transmits to a user terminal the status of all collection channels registered in the integrated management unit, the status of currently available collection channels, the status of all idle collection channels, and whether each collection channel that has been assigned a task has completed it.

The integrated management unit may further include a scheduler that adds a collection channel's schedule to, or deletes it from, pre-stored scheduling events.

The integrated management unit may further include a process management module that initializes process information registered in the integrated management unit, collects errors occurring in the plurality of collection channels, and stores the error information in a database.

The interface management module may transmit to the user terminal statistics on the amount of information collected per period, per resource, or per channel.

Data from a variety of collection channels can be collected reliably for cyber threat intelligence, and the system can respond flexibly when collection channels are added or deleted.

FIG. 1 is a block diagram of a cyber threat information collection system according to an embodiment.
FIG. 2 is a block diagram of the information collection unit of a cyber threat information collection system according to an embodiment.
FIG. 3 is a block diagram of the integrated management unit of a cyber threat information collection system according to an embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Referring to FIG. 1, the cyber threat information collection system 100 according to an embodiment includes an information collection unit 110, an integrated management unit 120, and a database 130.

The information collection unit 110 monitors, in real time, a plurality of collection channels 10 that provide cyber threat information (CTI), and collects from the plurality of collection channels 10 situation propagation notices, software vulnerability information, intrusion incident information, malicious code information, hacking domain information, malicious domain information, information on emails sent from zombie IPs, vulnerability analysis reports, malicious code analysis reports, intrusion incident analysis reports, and open source intelligence (OSINT).

As an embodiment, the plurality of collection channels 10 may include at least one of MC Finder (Malicious Code-Finder), the KISA Internet Intrusion Response Center (KISC), Exploit-DB, the U.S. National Vulnerability Database (NVD), OTX (Open Threat Exchange), MISP (Malware Information Sharing Platform & Threat Sharing), C-share, DNSBL, VirusShare, zone-h.org, malwaredomainlist.com, Threat Crowd, Cymon, EmergingThreats, Talos Intelligence, and bambenek.

As an embodiment, the information collection unit 110 may: parse and store information provided by MC Finder; collect, parse, and store software (S/W) vulnerability information from Exploit-DB; parse and store software (S/W) vulnerability information and Common Vulnerability Scoring System (CVSS) information from NVD; collect OTX-based intrusion incident information from OTX; collect MISP-based intrusion incident information from MISP; collect, parse, and store intrusion incident information from C-share using the Export API; collect and store individual malware samples and bulk malware from VirusShare; collect, parse, and store hacking domain information from zone-h.org; collect, parse, and store malicious domain information from malwaredomainlist.com; collect and store information on emails sent from zombie IPs, from zombie IPs and botnet groups; collect information provided as feeds, such as malware hashes, domains, and IPs, from Threat Crowd; collect blacklist IP information and related IP information from Cymon; collect blacklist IP information from EmergingThreats; collect vulnerability analysis reports from Talos Intelligence; and collect C&C IP and domain information from bambenek.

As an embodiment, the information collection unit 110 may: collect IP and domain information associated with a queried domain from DNS and PTR (that is, look up DNS records); collect registration information for the queried domain from whois; request static behavior analysis of malware from a malware static behavior analysis channel and collect the analysis results from that channel; request similar-malware analysis from a similar-malware query channel, through integration with a malware similarity analysis system, and collect the results from that channel; look up malware in VirusTotal by its hash value and collect the related information; and collect hash-, IP-, and domain-based related information from OTX.

The information collection unit 110 collects information from the plurality of collection channels using crawling technology, APIs, and the like. The information collection unit 110 performs three stages: task target and order selection (Indexing), task creation (Payload), and collection execution (Parsing).

In the task target and order selection (Indexing) stage, the information collection unit 110 determines the task progress for data collection (in the units set in Events) and performs distributed processing according to the amount of work to be collected. In this stage, a task target may be the URL of a web page to be collected, the pagination and line number of a table within a web page, the path of a specific data file, and the like.

In the task creation (Payload) stage, the information collection unit 110 divides the index generated in the Indexing stage according to the workload.

In the collection execution (Parsing) stage, the information collection unit 110 parses the actual data through a defined collection engine (crawling, etc.). The information collection unit 110 assigns the previously created tasks to the task targets generated by sub-indexing during task creation and processes them. When a task error occurs, the information collection unit 110 records the error against the payload and index that were being processed. Each time it stores data in the database 130 in Indexing units (Events units), the information collection unit 110 generates Events, Correlation, and Meta information objects (JSON) for a single Event and stores them in the database (MongoDB, BSON).
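The three stages described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `Payload`, `build_index`, `build_payloads`, and `run_payload`, the paginated URL scheme, and the `fetch` callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Payload:
    """One unit of work: a slice (sub-index) of the full target index."""
    targets: List[str]
    errors: List[str] = field(default_factory=list)


def build_index(base_url: str, pages: int) -> List[str]:
    """Indexing stage: enumerate targets (here, hypothetical paginated URLs)."""
    return [f"{base_url}?page={n}" for n in range(1, pages + 1)]


def build_payloads(index: List[str], chunk_size: int) -> List[Payload]:
    """Payload stage: split the index into chunks according to the workload."""
    return [Payload(index[i:i + chunk_size])
            for i in range(0, len(index), chunk_size)]


def run_payload(payload: Payload, fetch: Callable[[str], Dict]) -> List[Dict]:
    """Parsing stage: collect each target and record failures on the payload."""
    events = []
    for target in payload.targets:
        try:
            events.append(fetch(target))
        except Exception as exc:  # keep going; the error stays with the payload
            payload.errors.append(f"{target}: {exc}")
    return events
```

Because each `Payload` carries its own error list, a failed target can be retried without re-running the whole index, which is one plausible reading of recording errors against the payload and index.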

Referring to FIG. 2, the information collection unit 110 may include a general collection module 111 and a recursive query collection module 112.

The general collection module 111 collects data from collection channels that provide cyber threat information without requiring input data.

As an embodiment, the general collection module 111 may collect data periodically or continuously, depending on the characteristics of the target channel. As an embodiment, the general collection module 111 may adjust whether parallel processing is used and the task cycle according to the user's settings.

The recursive query collection module 112 collects data from collection channels that, upon receiving input data, provide cyber threat information related to that input data.

As an embodiment, the recursive query collection module 112 may collect data periodically or continuously, depending on the characteristics of the target channel. As an embodiment, the recursive query collection module 112 may adjust whether parallel processing is used and the task cycle according to the user's settings. As an embodiment, the recursive query collection module 112 may adjust the collection cycle when a collection channel imposes restrictions (for example, a limit on the number of queries per day).
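One simple way to adjust the collection cycle to a per-day query limit is to spread the allowed queries evenly over 24 hours. The sketch below assumes exactly that policy; the function name `collection_interval_seconds` and the `safety_margin` parameter are hypothetical and do not come from the specification.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def collection_interval_seconds(daily_limit: int, safety_margin: float = 0.1) -> float:
    """Return the minimum delay between queries that stays under a daily quota.

    safety_margin reserves a fraction of the quota (assumed policy) so that
    retries do not push the channel over its limit.
    """
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be in [0, 1)")
    usable_queries = daily_limit * (1.0 - safety_margin)
    return SECONDS_PER_DAY / usable_queries
```

For example, a channel that allows 1,000 queries per day with no margin yields one query every 86.4 seconds.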

The integrated management unit 120 applies natural language processing to the information collected through the information collection unit 110 to extract indicators of compromise (IoCs) including IPs, domains, hashes, emails, and security vulnerability codes, and classifies the types of the collected information based on the indicators of compromise.
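As a rough illustration of what extracting these five IoC types from free text involves, the sketch below uses regular expressions. This is a deliberately simpler stand-in for the natural language processing the specification describes, and the pattern set, the `extract_iocs` name, and the assumption that security vulnerability codes look like CVE identifiers are all illustrative choices.

```python
import re
from typing import Dict, List

_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"

# Illustrative patterns only; real feeds need stricter validation.
IOC_PATTERNS = {
    "ip": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # MD5 (32), SHA-1 (40), or SHA-256 (64) hex digests
    "hash": re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b"),
    # Assumption: vulnerability codes are CVE identifiers
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    # Lookarounds keep the domain pattern from matching pieces of an email
    "domain": re.compile(
        r"(?<![@\w.-])(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"
        r"[A-Za-z]{2,}\b(?!@)"
    ),
}


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Return de-duplicated matches per IoC type, in order of appearance."""
    found: Dict[str, List[str]] = {}
    for kind, pattern in IOC_PATTERNS.items():
        values: List[str] = []
        for match in pattern.finditer(text):
            if match.group(0) not in values:
                values.append(match.group(0))
        found[kind] = values
    return found
```

Classifying a collected item by type can then be as simple as checking which keys of the result are non-empty.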

To build an extended cyber threat information database (MongoDB), the integrated management unit 120 defines detailed classifications by data type based on the indicators of compromise.

The integrated management unit 120 treats cyber threat information collected through OSINT as past history, because, relative to the time of the actual intrusion incident, such information is rarely published on OSINT sites in real time.

Past history information (Events) may be expressed as a bundle or group of IoC data collected by the system at the same time from the same OSINT site. As an embodiment, when a specific table on the web is crawled, each row of that table may be generated as one event.

Table 1 shows information collection.

'event_id' is used for management purposes in the database 130 that stores the past history information (Events), and is made unique based on the string information of each data item. 'ctime' and 'dtime' are date/time values related to data collection. For date/time data that carries no explicit UTC notation, the local time zone of the corresponding server is determined through a Whois query, and the value is applied in Korean standard time (UTC+09:00). 'channel' and 'origin' classify the OSINT site: a channel may be distinguished by the domain name of the OSINT site, and an origin may be defined by classifying events within the same channel according to collection method and event type. 'ioc_list' holds the IDs of the IoCs collected through the event; because a user querying the database 130 starts the search within the IoC data category, this information may be included for extensibility and convenience. Data having the same type and value is consistently referenced by the same 'collect_id' through the 'ioc_list' key of documents in Events, and the past collection history of a specific resource can be checked from the number of times the document whose 'label' value is 'resource' (in this case, mainly the resource being searched) has been referenced.
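The sketch below shows one way an event record with these fields could be built so that identical IoCs always receive the same ID. The use of SHA-256 over the type and value strings, and the helper names `make_id` and `make_event`, are assumptions; the specification says only that IDs are made unique from string information. 'dtime' is omitted because its exact meaning is not stated.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

KST = timezone(timedelta(hours=9))  # UTC+09:00, as named in the text


def make_id(*parts: str) -> str:
    """Derive a stable ID from string parts (assumed scheme: SHA-256 hex)."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def make_event(channel: str, origin: str,
               iocs: List[Tuple[str, str]], collected_at: datetime) -> Dict:
    """Build one Events document; iocs is a list of (type, value) pairs."""
    ioc_ids = [make_id(ioc_type, value) for ioc_type, value in iocs]
    return {
        "event_id": make_id(channel, origin, *sorted(ioc_ids)),
        "ctime": collected_at.astimezone(KST).isoformat(),
        "channel": channel,
        "origin": origin,
        "ioc_list": ioc_ids,
    }
```

Because the IoC ID depends only on type and value, counting how many events list a given ID gives the past collection history of that resource.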

Table 2 shows an example of the distinction between primary resources and attributes.

The main criterion for distinguishing a primary resource from an attribute is whether the resource can itself be regarded as malicious, like an indicator of compromise. On this basis, a Meta Collection storage structure may be defined on the database 130 in the form shown in Table 3.

For example, when a string representing a data item within a bundle of past history information (Events) is an IP address, the integrated management unit 120 may store 'ip' as the data type (Type) and the string itself, such as the arbitrary IP '192.168.0.1', as the value (Value).

The main purpose of the cyber threat information database storage structure is not only to store the string values of resources (Meta) and the past histories (Events) of a specific IoC, but also to create, as the IoC appears across multiple past histories (Events), associations between that IoC and new IoCs that were absent from earlier Events. This requires a representation of the associations between IoCs, for which a storage structure may be defined as shown in Table 4.
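One plausible way to derive such associations is to count how often two IoCs appear in the same event. The sketch below assumes that co-occurrence model and a hypothetical `correlate` helper; the actual structure is the one defined in Table 4, which may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def correlate(events: Iterable[Dict[str, List[str]]]) -> Counter:
    """Count co-occurrences of IoC IDs across events.

    Each event is expected to carry an 'ioc_list' of IoC IDs. Pairs are
    stored in sorted order so (a, b) and (b, a) are the same key.
    """
    pairs: Counter = Counter()
    for event in events:
        unique_ids = sorted(set(event["ioc_list"]))
        for pair in combinations(unique_ids, 2):
            pairs[pair] += 1
    return pairs
```

An IoC that first appears in a later event is linked to the existing IoCs as soon as that event is processed, which matches the goal of relating an IoC to ones absent from earlier events.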

Referring to FIG. 3, the integrated management unit 120 may include a preprocessing module 121, a named entity recognition module 122, a coreference analysis module 123, and a semantic analysis module 124, and may perform natural language processing through these modules.

The preprocessing module 121 extracts text from the information collected through the information collection unit 110, splits the text into sentences, extracts tokens from the sentences, performs text normalization using the tokens, and assigns part-of-speech information to the tokens.

Specifically, the preprocessing module 121 extracts text from document files collected in various formats (for example, PDF, DOC, HWP), splits the extracted text into sentences, extracts tokens from the sentences so that they are easier to analyze, performs text normalization, which reduces tokens to a common form so that fewer distinct words need to be analyzed and analysis becomes more efficient, and assigns part-of-speech information to the tokens using algorithms such as decision trees, hidden Markov models, and support vector machines.
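The first three preprocessing steps (sentence splitting, tokenization, and normalization) can be sketched with the standard library alone. The rules below are simplified assumptions chosen so that IP addresses and file names survive as single tokens; part-of-speech tagging is left out because it requires a trained model, and the function names are hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split on sentence-ending punctuation that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Extract tokens, keeping dotted or hyphenated items (IPs, file names) whole."""
    return re.findall(r"[A-Za-z0-9]+(?:[.\-_][A-Za-z0-9]+)*", sentence)


def normalize(tokens: List[str]) -> List[str]:
    """Reduce tokens to a common form; here, lowercasing only."""
    return [token.lower() for token in tokens]
```

For example, `normalize(tokenize("It then downloaded Payload.exe!"))` gives `["it", "then", "downloaded", "payload.exe"]`.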

The named entity recognition module 122 classifies proper nouns, such as names of people and places, based on the tokens.

The coreference analysis module 123 performs coreference analysis, which compares preceding words and phrases with the current word or phrase to determine whether they refer to the same entity, and performs dependency analysis between words.

The semantic analysis module 124 extracts, through the coreference analysis and the dependency analysis, indicators of compromise (IoCs) and attack tactics and tools, attack techniques, and attack procedures (TTPs), and extracts metadata describing them (for example, time values, attack group names, and descriptions). The semantic analysis module 124 generates an attack pattern by linking the extracted IoCs, TTPs, and metadata.

As an embodiment, the integrated management unit 120 may include an interface management module 125, a database model management module 126, a collection channel control module 127, a scheduler 128, and a process management module 129.

The interface management module 125 manages network communication and internal and external interfaces.

The interface management module 125 transmits to the user terminal 300 statistics on the amount of information collected per period, per resource, or per channel.

The database model management module 126 manages the plurality of collection channels 10 based on a pre-stored management table model and assigns preset configuration settings to the plurality of collection channels 10. The configuration settings may be set in advance by the user, and the integrated management unit 120 receives them from the user terminal 300.

The collection channel control module 127 transmits to the user terminal 300 the status of all collection channels registered in the integrated management unit 120, the status of currently available collection channels, the status of all idle collection channels, and whether each collection channel that has been assigned a task has completed it.

The scheduler 128 adds a collection channel's schedule to, or deletes it from, pre-stored scheduling events.
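Adding and deleting channel schedules can be modeled with a small registry keyed by channel name. The sketch below is an assumed design, not the scheduler of the embodiment: the class `CollectionScheduler`, its interval-based schedules, and the `due` method are hypothetical.

```python
from typing import Dict, List


class CollectionScheduler:
    """Keeps one collection interval (in seconds) per channel."""

    def __init__(self) -> None:
        self._schedules: Dict[str, int] = {}

    def add(self, channel: str, interval_seconds: int) -> None:
        """Register or replace the schedule of a collection channel."""
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        self._schedules[channel] = interval_seconds

    def remove(self, channel: str) -> None:
        """Delete a channel's schedule; unknown channels are ignored."""
        self._schedules.pop(channel, None)

    def due(self, elapsed_seconds: int) -> List[str]:
        """Return the channels whose interval divides the elapsed time."""
        return sorted(name for name, interval in self._schedules.items()
                      if elapsed_seconds % interval == 0)
```

Because channels are plain entries in the registry, a newly added or deleted collection channel changes only one key and leaves the other schedules untouched.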

The process management module 129 initializes process information registered in the integrated management unit 120, collects errors occurring in the plurality of collection channels 10, and stores the error information in the database 130.

As an embodiment, the database 130 may store the information collected through the information collection unit 110, the cyber threat information classified by the integrated management unit 120, and the management table model.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention, as defined in the following claims, also fall within the scope of the present invention.

Claims

A system for collecting cyber threat information, comprising:
an information collection unit that monitors, in real time, a plurality of collection channels providing the cyber threat information, and collects, from the plurality of collection channels, situation propagation messages, software vulnerability information, intrusion incident information, malicious code information, hacking domain information, malicious domain information, information on emails sent from zombie IPs, vulnerability analysis reports, malicious code analysis reports, and intrusion incident analysis reports; and
an integrated management unit that applies natural language processing to the various information collected through the information collection unit to extract indicators of compromise including an IP, a domain, a hash, an email, and a security vulnerability code, and classifies the types of the various information collected through the information collection unit based on the indicators of compromise.

The cyber threat information collection system of claim 1, wherein the information collection unit comprises:
a general collection module that collects data from a collection channel that provides cyber threat information without input data; and
a recursive query collection module that collects data from a collection channel that, when input data is received, provides cyber threat information related to the input data.

The cyber threat information collection system of claim 1, wherein the integrated management unit comprises:
a preprocessing module that extracts text from the various information collected through the information collection unit, splits the text into sentences, extracts tokens from the split sentences, performs text normalization using the tokens, and assigns part-of-speech information to the tokens;
a named entity recognition module that classifies proper nouns based on the tokens; and
a coreference analysis module that performs coreference analysis, in which preceding words and phrases are compared with current words and phrases to determine whether they refer to the same entity, and performs dependency analysis between words.

The cyber threat information collection system of claim 3, wherein the integrated management unit further comprises:
a semantic analysis module that extracts, through the coreference analysis and the dependency analysis, the indicators of compromise and the attack tactics and tools, attack techniques, and attack procedures (TTPs), and extracts metadata describing the indicators of compromise and the attack tactics and tools, attack techniques, and attack procedures.

The cyber threat information collection system of claim 1, wherein the integrated management unit comprises:
an interface management module that manages network communication and internal and external interfaces; and
a database model management module that manages the plurality of collection channels based on a pre-stored management table model and assigns preset configuration settings to the plurality of collection channels.

The cyber threat information collection system of claim 5, wherein the integrated management unit further comprises:
a collection channel control module that transmits, to a user terminal, the status of all collection channels registered in the integrated management unit, the status of currently available collection channels, the status of all idle collection channels, and whether each collection channel that has been assigned a job has completed that job.

The cyber threat information collection system of claim 6, wherein the integrated management unit further comprises:
a scheduler that adds a schedule of a collection channel to, or deletes it from, pre-stored scheduling events.

The cyber threat information collection system of claim 7, wherein the integrated management unit further comprises:
a process management module that initializes process information registered in the integrated management unit, collects errors occurring in the plurality of collection channels, and stores the error information in a database.

The cyber threat information collection system of claim 5, wherein the interface management module transmits, to a user terminal, statistics on the amount of information collected per period, per resource, or per channel.